If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Filtering a variant on SOR that looks otherwise well supported
I need to apply hard filters to my data. In cases where I have lower coverage I plan to use the
Fisher Strand annotation, and in higher coverage variant calls,
SOR (using a JEXL expression to switch between them:
DP < 20 ? FS > 50.0 : SOR > 3).
The variant call below (some annotations snipped), which is from a genotyped gVCF from HaplotypeCaller (using a BQSR'ed BAM file), looks well supported (high
MQ0). However, there appears to be some strand bias (
788.77 . DP=34;FS=5.213;MQ=35.37;MQ0=0;QD=25.44;SOR=3.334 GT:AD:DP:GQ:PL 1/1:2,29:31:35:817,35,0
In this instance the filter example above would be applied.
Is this filtering out a true positive? And what kind of cut-offs should I be using for
The snipped annotations
BaseQRankSum=-0.8440 for this variant also indicate minor bias that the evidence to support this variant call also has some bias (the variant appears near the end of reads in low quality bases, compared to the reads supporting the reference allele).
This is part of a larger hard filter I'm applying to a set of genotyped gVCFs called from HaplotypeCaller.
HomRef positions using this JEXL filter:
vc.getGenotype("%sample%").isHomRef() ? ( vc.getGenotype("%sample%").getAD().size == 1 ? (DP < 10) : ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || MQRankSum > 3.2905 || ReadPosRankSum > 3.2905 || BaseQRankSum > 3.2905 ) ) : false
HomVar positions using this JEXL:
vc.getGenotype("%sample%").isHomVar() ? ( vc.getGenotype("%sample%").getAD().0 == 0 ? ( ((DP - MQ0) < 10) || ((MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || MQ < 30.0 ) : ( BaseQRankSum < -3.2905 || MQRankSum < -3.2905 || ReadPosRankSum < -3.2905 || (MQ0 / (1.0 * DP)) >= 0.1) || QD < 5.0 || (DP < 20 ? FS > 60.0 : SOR > 3.5) || MQ < 30.0 || QUAL < 100.0 ) ) : false
My goal is true positive variants only and I have high coverage data, so the filtering should be relatively stringent. Unfortunately I don't have a database I could use to apply VQSR, henceforth the comprehensive filtering strategy.