
SelectVariants AF with multiallelic variants

mnw21cam · Exeter University · Member ✭✭

Hi all.

I'm raising the ugly issue of multiallelic variants when filtering on allele frequency again. Previous discussions are at https://gatkforums.broadinstitute.org/discussion/1545/arithmeticexception-in-variantfiltration-and-invalid-jexl-expression-detected-in-selectvariants and http://gatkforums.broadinstitute.org/discussion/5567/selectvariants-on-af.

I have a VCF file that I created by running GenotypeGVCFs on a couple of hundred human whole-exome GVCF files. I would like to filter it to include only those variants with an allele frequency greater than 5%. This would normally be done with a JEXL expression such as "AF > 0.05". However, this causes an error whenever a multiallelic variant is encountered, because at such a site the AF attribute of the VariantContext object is not a Double but a List of Doubles (one value per alternate allele). The command line therefore needs "--restrictAllelesTo BIALLELIC" added so that only biallelic variants are processed. This is sub-optimal for me, because every multiallelic site is then excluded from the output.
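To see why the scalar comparison breaks, here is a minimal Python sketch of how AF is laid out in the INFO column (this is not GATK code; the helpers parse_info and allele_frequencies are hypothetical names). GATK declares AF with Number=A in the VCF header, so a multiallelic site carries one comma-separated value per ALT allele. The biallelic INFO string is an invented example; the multiallelic one uses the AC, AF and AN values from the record below.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict; flags map to ''."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def allele_frequencies(info: str) -> list[float]:
    """AF is declared Number=A, so there is one comma-separated value per ALT allele."""
    return [float(x) for x in parse_info(info)["AF"].split(",")]


biallelic = allele_frequencies("AC=19;AF=0.132;AN=144")  # invented biallelic site
multiallelic = allele_frequencies("AC=19,16,3;AF=0.132,0.111,0.021;AN=144")
print(biallelic)      # [0.132]
print(multiallelic)   # [0.132, 0.111, 0.021]

# "AF > 0.05" is only well defined when there is exactly one value. For a
# multiallelic site, a site-level filter has to pick a rule, for example:
print(any(af > 0.05 for af in multiallelic))  # True: at least one ALT passes
print(all(af > 0.05 for af in multiallelic))  # False: the third ALT (0.021) fails
```

Whichever site-level rule is chosen, the record is still kept or dropped as a whole; neither rule removes only the failing allele.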

What I would like is for multiallelic variants to be treated independently. That is, if I have a variant like this:

1 878906 . CTTTTT CT,CTT,C 1509.89 . AC=19,16,3;AF=0.132,0.111,0.021;AN=144;...

This record represents three separate alternate alleles at the same locus. Two of them have an allele frequency greater than 0.05 and one has an allele frequency less than 0.05. I would like to be able to filter the variants so that the output VCF file contains just the two qualifying alternate alleles, like this:

1 878906 . CTTTTT CT,CTT 1509.89 . AC=19,16;AF=0.132,0.111;AN=144;...
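A minimal Python sketch of the per-allele filtering described above, operating on one tab-delimited VCF data line (the VCF format is tab-delimited even though the records above are shown with spaces). This is not an existing GATK feature, and filter_alt_alleles and PER_ALT_KEYS are hypothetical names. It assumes AC and AF are the only Number=A INFO fields, and it leaves AN, QUAL and the REF/ALT representation untouched, as in the desired output above. It does not remap sample genotypes or Number=R/G fields such as AD and PL, which a real implementation would also have to subset. The elided INFO fields ("...") are left out of the example record.

```python
from typing import Optional

# Hypothetical set of per-ALT (Number=A) INFO keys; assumed to be just AC and AF here.
PER_ALT_KEYS = {"AC", "AF"}


def filter_alt_alleles(line: str, min_af: float = 0.05) -> Optional[str]:
    """Drop ALT alleles with AF <= min_af from one tab-delimited VCF data line.

    Only the ALT and INFO columns are rewritten; sample columns are copied unchanged.
    Returns None when no ALT allele passes the threshold.
    """
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    # Split INFO into (key, "=", value) triples; flags such as DB give ("DB", "", "").
    info = [entry.partition("=") for entry in cols[7].split(";")]
    afs = [float(x) for key, _, value in info if key == "AF" for x in value.split(",")]
    if len(afs) != len(alts):
        raise ValueError("expected one AF value per ALT allele")

    # Indices of the ALT alleles that pass the threshold, in their original order.
    keep = [i for i, af in enumerate(afs) if af > min_af]
    if not keep:
        return None

    cols[4] = ",".join(alts[i] for i in keep)
    new_info = []
    for key, sep, value in info:
        if key in PER_ALT_KEYS:
            # Subset the original strings so numbers are not reformatted.
            parts = value.split(",")
            value = ",".join(parts[i] for i in keep)
        new_info.append(key + sep + value)
    cols[7] = ";".join(new_info)
    return "\t".join(cols)


record = "\t".join(["1", "878906", ".", "CTTTTT", "CT,CTT,C", "1509.89", ".",
                    "AC=19,16,3;AF=0.132,0.111,0.021;AN=144"])
print(filter_alt_alleles(record))
# prints (tab-separated): 1 878906 . CTTTTT CT,CTT 1509.89 . AC=19,16;AF=0.132,0.111;AN=144
```

Subsetting every Number=A field by the same list of indices keeps each surviving allele's AC and AF values aligned with its position in the ALT column.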

Are there any plans to enable this functionality?

Issue · GitHub, by Sheila


Best Answer


  • mnw21cam · Exeter University · Member ✭✭

    Yes, that is what I want to be able to do. However, I have also looked at the particulars of SelectVariants --discordance, and it also does not do what our pipeline needs at multiallelic sites. This is something we will need implemented, so I expect I will probably have a crack at it early next year.
