To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

Can't SelectVariants based on sample's GQ

averydavisaverydavis BostonMember, Broadie

I have consistently failed to SelectVariants from a gvcf containing only one sample with -select "GQ > 60.0" (or other variations, such as -select "GQ > 60"): the output vcf always ends up with a header but no entries, though there are many, many entries that meet this threshold. I've come up with a workaround (VariantsToTable, filtering with awk, then SelectVariands --keepIDs ), but it's memory inefficient and longterm I would like to be able to select sites (especially homozygous reference sites) based on GQ. This article about JEXL queries suggests what I've done should work fine (see section 4), but this forum post shows that others have had this problem a couple years ago.

I have been able to get SelectVariants to pull out homozygous reference calls by including -sn samplename -select 'vc.getGenotype("samplename").isHomRef()', but whenever I include any -select query based on GQ, the resulting file is blank (when this is included alone or in conjunction with other calls, including .isHomRef()).

Is there another trick I should try? Has anyone had success filtering on a sample's GQ? Other suggestions for pulling out high-quality homozygous reference sites from a gvcf containing data from just one sample?

Other information/detail that may or may not be relevant:
My gvcf is BP_RESOLUTION, not in block format.
I get no error messages from SelectVariants, just blank output.
No standard SelectVariants options give me trouble, just genotype-level filters (and not all of them - as I mentioned .isHomRef() works fine).

All thoughts welcome - thanks!



Sign In or Register to comment.