SelectVariants AF with multiallelic variants
I'm raising, once again, the ugly issue of multiallelic variants when filtering on allele frequency. Previous discussions are at https://gatkforums.broadinstitute.org/discussion/1545/arithmeticexception-in-variantfiltration-and-invalid-jexl-expression-detected-in-selectvariants and http://gatkforums.broadinstitute.org/discussion/5567/selectvariants-on-af
I have a VCF file that I created by running GenotypeGVCFs on a couple of hundred human whole-exome gVCF files. I would like to filter it to include only those variants that have an allele frequency greater than 5%. This would normally be done using a JEXL filter like "AF > 0.05". However, this causes an error whenever a multiallelic variant is encountered, because the AF field in the VariantContext object is then a List of Doubles rather than a single Double. The command line therefore needs "--restrictAllelesTo BIALLELIC" added, so that only biallelic variants are processed. This is sub-optimal for me, because the multiallelic sites are dropped entirely.
What I would like is for multiallelic variants to be treated independently. That is, if I have a variant like this:
1 878906 . CTTTTT CT,CTT,C 1509.89 . AC=19,16,3;AF=0.132,0.111,0.021;AN=144;...
This means that there are three separate alternate alleles at the same locus. Two of them (CT and CTT, with AF 0.132 and 0.111) have an allele frequency greater than 0.05, and one (C, with AF 0.021) has an allele frequency less than 0.05. I would like to be able to filter the variants so that the output VCF file contains just the two qualifying alternate alleles, like this:
1 878906 . CTTTTT CT,CTT 1509.89 . AC=19,16;AF=0.132,0.111;AN=144;...
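To make the request concrete, here is a minimal Python sketch of the per-allele behaviour described above. It is not GATK code and does not use any GATK API; the function name `filter_alt_alleles` and the constant `PER_ALT_KEYS` are made up for illustration. It assumes a tab-separated, sites-only record and rewrites only the AC and AF INFO entries. Any other per-allele annotations, and the per-sample genotype columns (GT, AD, PL and so on), would also need updating in a real implementation and are not handled here.

```python
# Hypothetical helper, not part of GATK: drop ALT alleles whose AF is too low.
PER_ALT_KEYS = ("AC", "AF")  # assumption: the only per-ALT INFO keys rewritten


def filter_alt_alleles(vcf_line, min_af=0.05):
    """Return the record with only ALT alleles whose AF > min_af,
    or None if no ALT allele qualifies."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")

    # Parse INFO into an ordered mapping; flags (no '=') get an empty value.
    info = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value

    # One AF value per ALT allele, in the same order as the ALT column.
    afs = [float(x) for x in info["AF"].split(",")]
    keep = [i for i, af in enumerate(afs) if af > min_af]
    if not keep:
        return None

    fields[4] = ",".join(alts[i] for i in keep)
    for key in PER_ALT_KEYS:
        if key in info:
            values = info[key].split(",")
            # Values stay as strings, so the original formatting is preserved.
            info[key] = ",".join(values[i] for i in keep)

    fields[7] = ";".join(k if v == "" else f"{k}={v}" for k, v in info.items())
    return "\t".join(fields)


# The example record from this post, with the elided INFO entries left out.
record = "1\t878906\t.\tCTTTTT\tCT,CTT,C\t1509.89\t.\tAC=19,16,3;AF=0.132,0.111,0.021;AN=144"
print(filter_alt_alleles(record))
```

Applied to the example record, this keeps CT and CTT and drops C, leaving AN untouched as in the desired output above. A record in which no alternate allele passes the threshold is dropped (the function returns None).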
Are there any plans to enable this functionality?