How to mark zero depth reference bases as failing filter without using --missingValuesInExpressionsS
Hello- I apologize if this is a frequently asked question. I could not locate a similar example when browsing the forum. I hope this all makes sense. Thank you for your help.
For background, below is my VariantFiltration (VF) command.
GenomeAnalysisTK.jar -T VariantFiltration -R genome.fasta -V raw.vcf -o filtered.Dels.5.MQ0.8.vcf --logging_level ERROR --filterExpression '((DP - MQ0) < 5) || ((MQ0 / (1.0 * DP)) >= 0.8) || (Dels > 0.5)' --filterName LowConfidence
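To make the logic of the filter expression above easier to follow, here is a rough Python sketch of the same arithmetic applied to a parsed INFO field. It is only an illustration, not how GATK evaluates the expression internally: the helper names `parse_info` and `low_confidence` and the `missing_fails` parameter are hypothetical, and `missing_fails` merely mimics the choice between treating a missing annotation as passing or as failing.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=94;MQ0=0' into a dict."""
    if info_field in (".", ""):
        return {}
    parsed = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True  # flag-style annotation with no value
            continue
        try:
            parsed[key] = float(value)
        except ValueError:
            parsed[key] = value  # keep non-numeric values as strings
    return parsed


def low_confidence(info, missing_fails=False):
    """Mirror of the filter expression; True means 'would be filtered'.

    Assumption: if an annotation needed by the expression is missing,
    the result is whatever `missing_fails` says (hypothetical parameter).
    """
    try:
        return (
            (info["DP"] - info["MQ0"]) < 5
            or (info["MQ0"] / (1.0 * info["DP"])) >= 0.8
            or info["Dels"] > 0.5
        )
    except KeyError:
        return missing_fails


snp_next_to_indel = parse_info("DP=76;Dels=0.96;MQ0=0")
indel_without_dels = parse_info("DP=94;MQ0=0")
zero_depth_site = parse_info(".")

print(low_confidence(snp_next_to_indel))
print(low_confidence(indel_without_dels))
print(low_confidence(zero_depth_site))
print(low_confidence(zero_depth_site, missing_fails=True))
```

In this sketch a record with no annotations at all and a record that only lacks Dels fall into the same "missing value" branch, so a single on/off switch cannot tell them apart.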
My question is:
How should I run VF (or UG followed by VF) to flag the following VCF line as a LowConfidence reference base? It should be LowConfidence because there is no read coverage at this position. The line carries no annotations, so there is nothing for a filter expression to act on. We would like to mark this reference base as LowConfidence in some way (or, if need be, with a second filterExpression/filterName pair that indicates zero coverage).
F11_mutated 1282158 . T . . PASS . GT ./.
I know that for this one example I could set --missingValuesInExpressionsShouldEvaluateAsFailing to true so that the site would no longer be marked PASS. But this would adversely affect other calls that are currently correct. The VCF line below represents a real in-del event, so we want it to PASS. It does not have a Dels annotation (which I use in the filter expression), so with --missingValuesInExpressionsShouldEvaluateAsFailing set to true it would not PASS.
F11_mutated 208038 . TG T 4072 PASS AC=1;AF=1.00;AN=1;DP=94;FS=0.000;HaplotypeScore=48.2051;MLEAC=1;MLEAF=1.00;MQ=57.76;MQ0=0;QD=43.32;SB=-1.589e+03 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:1,93:94:99:1:1.00:4072,0
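One possible workaround outside of GATK would be to post-process the filtered VCF and tag only the records that have an empty INFO field and a no-call genotype. The Python sketch below is an assumption about how that could look, not a GATK feature: the function `flag_zero_coverage` and the filter name `ZeroCoverage` are hypothetical, and the code assumes tab-separated VCF records with at least one sample column.

```python
def flag_zero_coverage(vcf_line, filter_name="ZeroCoverage"):
    """Return the line (without trailing newline) with FILTER updated
    when the record has no INFO annotations and only no-call genotypes.

    `filter_name` is a hypothetical label chosen for this sketch.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    fields = line.split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns, nothing to inspect
    info = fields[7]
    genotypes = [sample.split(":")[0] for sample in fields[9:]]
    # A genotype is a no-call when every allele is '.', e.g. './.' or '.'
    no_call = all(
        set(gt.replace("|", "/").split("/")) == {"."} for gt in genotypes
    )
    if info == "." and no_call:
        current = fields[6]
        if current in (".", "PASS"):
            fields[6] = filter_name
        else:
            fields[6] = current + ";" + filter_name
    return "\t".join(fields)


zero_depth = "F11_mutated\t1282158\t.\tT\t.\t.\tPASS\t.\tGT\t./."
real_indel = "F11_mutated\t208038\t.\tTG\tT\t4072\tPASS\tDP=94;MQ0=0\tGT:DP\t1:94"

print(flag_zero_coverage(zero_depth))
print(flag_zero_coverage(real_indel))
```

With this approach the in-del record keeps its PASS status because it still has INFO annotations and a called genotype. A matching ##FILTER header line for the new name would also need to be added for the output to remain a well-described VCF.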
(Not directly related to my question, but in case it helps to give context: the reason for using the Dels value in the filter is to mark as LowConfidence those SNPs that are immediately adjacent to true positive in-dels, where in-del realignment did not quite get everything right. Example below.)
F11_mutated 985691 . A T 30.01 LowConfidence ABHom=0.667;AC=1;AF=1.00;AN=1;DP=76;Dels=0.96;FS=0.000;HaplotypeScore=58.8888;MLEAC=1;MLEAF=1.00;MQ=40.57;MQ0=0;OND=0.974;QD=0.39;SB=-2.404e+01 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:1,2:3:60:1:1.00:60,0