
GATK4 VariantFiltration is unable to tag the variants properly with --genotype-filter-expression

I am trying to filter genotypes on the FORMAT annotation GQ, flagging any genotype with GQ < 20.

Two variants from the input VCF (APOE.recode.genotypeRefined.vcf) are shown below, with only 5 of the 95 samples included:

chr19 44907654 rs769451 T G 1131.52 PASS AC=3;AF=0.016;AN=190;BaseQRankSum=-4.920e-01;DB;DP=2959;ExcessHet=3.0798;FS=0.788;InbreedingCoeff=-0.0160;MLEAC=3;MLEAF=0.016;MQ=59.93;MQRankSum=0.00;PG=0,16,39;POSITIVE_TRAIN_SITE;QD=12.04;ReadPosRankSum=0.00;SOR=0.582;VQSLOD=9.43;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:35,0:35:99:0,99,1485:0,115,1524 0/0:27,0:27:85:0,69,1035:0,85,1074 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:25,0:25:82:0,66,990:0,82,1029

chr19 44908684 rs429358 T C 4672.69 PASS AC=24;AF=0.126;AN=190;BaseQRankSum=-2.469e+00;DB;DP=1958;ExcessHet=4.7065;FS=1.755;InbreedingCoeff=-0.0544;MLEAC=24;MLEAF=0.126;MQ=59.98;MQRankSum=0.00;PG=0,10,26;POSITIVE_TRAIN_SITE;QD=9.40;ReadPosRankSum=0.283;SOR=0.859;VQSLOD=8.99;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/1:27,4:31:28:38,0,909:28,0,925 0/0:29,0:29:94:0,84,1260:0,94,1286 0/0:15,0:15:55:0,45,563:0,55,589 0/0:19,0:19:67:0,57,694:0,67,720 0/0:10,0:10:40:0,30,302:0,40,328

I ran the following command:
/usr/bin/java -jar gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar VariantFiltration --variant APOE.recode.genotypeRefined.vcf --genotype-filter-expression "GQ < 20" --genotype-filter-name "lowGQ" --output APOE.recode.genotypeRefined.filteredoutLowGQ.vcf

The output for the above two variants is shown below:

chr19 44907654 rs769451 T G 1131.52 PASS AC=3;AF=0.016;AN=190;BaseQRankSum=-4.920e-01;DB;DP=2959;ExcessHet=3.0798;FS=0.788;InbreedingCoeff=-0.0160;MLEAC=3;MLEAF=0.016;MQ=59.93;MQRankSum=0.00;PG=0,16,39;POSITIVE_TRAIN_SITE;QD=12.04;ReadPosRankSum=0.00;SOR=0.582;VQSLOD=9.43;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:35,0:35:99:0,99,1485:0,115,1524 0/0:27,0:27:85:0,69,1035:0,85,1074 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:25,0:25:82:0,66,990:0,82,1029

chr19 44908684 rs429358 T C 4672.69 PASS AC=24;AF=0.126;AN=190;BaseQRankSum=-2.469e+00;DB;DP=1958;ExcessHet=4.7065;FS=1.755;InbreedingCoeff=-0.0544;MLEAC=24;MLEAF=0.126;MQ=59.98;MQRankSum=0.00;PG=0,10,26;POSITIVE_TRAIN_SITE;QD=9.40;ReadPosRankSum=0.283;SOR=0.859;VQSLOD=8.99;culprit=MQRankSum GT:AD:DP:FT:GQ:PL:PP 0/1:27,4:31:PASS:28:38,0,909:28,0,925 0/0:29,0:29:PASS:94:0,84,1260:0,94,1286 0/0:15,0:15:PASS:55:0,45,563:0,55,589 0/0:19,0:19:PASS:67:0,57,694:0,67,720 0/0:10,0:10:PASS:40:0,30,302:0,40,328

As you can see, the variant rs429358 has the genotype-level filter tag (FT) for all samples, while the variant rs769451 has no FT tag at all, which suggests the genotype filter was not applied to this variant.

Is something wrong or missing in the above command that would explain the missing FT tag? Or is this expected behavior that I am not aware of? I expected the genotype-level filter to be applied to every input variant.
Please help.
Thanks
Srikant
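
A quick way to see which genotypes the expression "GQ < 20" would actually catch in a given record is to read the GQ values directly. The following is a minimal Python sketch, not part of GATK: the function name `low_gq_samples` is hypothetical, it assumes whitespace-separated columns as pasted above (real VCF files are tab-delimited, which `split()` also handles), and the example record is rs769451 from this post with the INFO column abbreviated to "." and only two of the samples kept.

```python
def low_gq_samples(vcf_line, threshold=20):
    """Return (sample_index, GQ) pairs for samples whose GQ is below threshold."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")       # column 9 is FORMAT
    if "GQ" not in format_keys:
        return []
    gq_pos = format_keys.index("GQ")
    hits = []
    for i, sample in enumerate(fields[9:]):  # sample columns start at column 10
        values = sample.split(":")
        # Trailing FORMAT fields may be omitted, and a missing value is "."
        if gq_pos < len(values) and values[gq_pos] not in (".", ""):
            gq = int(values[gq_pos])
            if gq < threshold:
                hits.append((i, gq))
    return hits


# rs769451 from the post; INFO abbreviated to "." and two samples kept for brevity
record = (
    "chr19 44907654 rs769451 T G 1131.52 PASS . GT:AD:DP:GQ:PL:PP "
    "0/0:37,0:37:99:0,99,1485:0,115,1524 "
    "0/0:27,0:27:85:0,69,1035:0,85,1074"
)
print(low_gq_samples(record))  # prints []
```

An empty list means no sample in the record falls below the threshold, which is worth ruling out before concluding that the filter was skipped.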

Answers

  • srikant_verma (India, Member)

    Thanks @bhanuGandham !
    You are right! I checked and found that GQ > 20 for all samples at the first variant (rs769451), while 2 samples have GQ < 20 at the second variant (rs429358). So the FT tag is added to a record only when at least one sample fails the filter criteria. If all samples pass the filter, no FT tag is added.
    I am not sure whether this is documented. If not, I would request that you document it so that all users are aware of this behavior.
    Thanks again!

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Will do and thank you @srikant_verma
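
The behavior worked out in this thread (FT appears in a record's FORMAT column only when at least one genotype in that record fails the filter) can be checked on output records with a small script. This is a sketch under assumptions, not GATK's implementation: `ft_summary` is a hypothetical helper, columns are assumed to be whitespace-separated, and the two example records are taken from the output above with INFO abbreviated to "." and only two samples kept.

```python
from collections import Counter


def ft_summary(vcf_line):
    """Return None if FORMAT has no FT key, else a Counter of per-sample FT values."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")       # column 9 is FORMAT
    if "FT" not in format_keys:
        return None
    ft_pos = format_keys.index("FT")
    counts = Counter()
    for sample in fields[9:]:                # sample columns start at column 10
        values = sample.split(":")
        # Treat an omitted trailing field as missing (".")
        counts[values[ft_pos] if ft_pos < len(values) else "."] += 1
    return counts


# Output records from the thread; INFO abbreviated to "." and two samples kept
unfiltered = (
    "chr19 44907654 rs769451 T G 1131.52 PASS . GT:AD:DP:GQ:PL:PP "
    "0/0:37,0:37:99:0,99,1485:0,115,1524 "
    "0/0:27,0:27:85:0,69,1035:0,85,1074"
)
filtered = (
    "chr19 44908684 rs429358 T C 4672.69 PASS . GT:AD:DP:FT:GQ:PL:PP "
    "0/1:27,4:31:PASS:28:38,0,909:28,0,925 "
    "0/0:10,0:10:PASS:40:0,30,302:0,40,328"
)
print(ft_summary(unfiltered))  # prints None
print(ft_summary(filtered))    # prints Counter({'PASS': 2})
```

Note that the samples shown for rs429358 all carry FT=PASS; according to the reply above, the two genotypes with GQ < 20 are among the samples not displayed in the excerpt. A record returning None here simply had no failing genotype, so downstream code should treat a missing FT as "no genotype filter triggered" rather than as an error.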
