We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Multi-allelic sites in VQSR

KousikKousik GermanyMember

I was wondering how GATK VQSR deals with multi-allelic sites.
I already know that -
i) VQSR treats them same way as bi-allelic sites (https://gatkforums.broadinstitute.org/gatk/discussion/7754/how-vqsr-deals-with-multiallelic-snps-and-indel)
ii) Split multi-allelic sites before VQSR (https://gatkforums.broadinstitute.org/gatk/discussion/23559/split-multiallelic-variants-before-vqsr-and-cnnscorevariants-gatk-team-opinion).
This mainly informs about mixed (SNP + INDEL) multi-allelic sites.

Summary questions:
1. Do you recommend split multi-allelic SNPs before VQSR? Will it be biased since site-level information/annotation would be multiple counted. I got different results in split and NOT split (performed relatively better)
2. If we don't split multi-allelic SNP sites then how Ti/Tv ratio is calculated.
For example:

chr1 123 A T,G
chr2 234 C *,A,T

In these above cases, which allele(s) is taken to calculate the Ti/Tv ratio in the tranche file. If VQSR takes the first allele then what to expect in 2nd case, where a star allele is at first position Or it is better to remove star alleles before VQSR?


Sign In or Register to comment.