GATK4 VariantFiltration is unable to tag the variants properly with --genotype-filter-expression

I am trying to filter variants based on FORMAT annotation: GQ < 20.

Two variants from the input VCF (APOE.recode.genotypeRefined.vcf) are shown below, with only 5 of the 95 samples included:

chr19 44907654 rs769451 T G 1131.52 PASS AC=3;AF=0.016;AN=190;BaseQRankSum=-4.920e-01;DB;DP=2959;ExcessHet=3.0798;FS=0.788;InbreedingCoeff=-0.0160;MLEAC=3;MLEAF=0.016;MQ=59.93;MQRankSum=0.00;PG=0,16,39;POSITIVE_TRAIN_SITE;QD=12.04;ReadPosRankSum=0.00;SOR=0.582;VQSLOD=9.43;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:35,0:35:99:0,99,1485:0,115,1524 0/0:27,0:27:85:0,69,1035:0,85,1074 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:25,0:25:82:0,66,990:0,82,1029

chr19 44908684 rs429358 T C 4672.69 PASS AC=24;AF=0.126;AN=190;BaseQRankSum=-2.469e+00;DB;DP=1958;ExcessHet=4.7065;FS=1.755;InbreedingCoeff=-0.0544;MLEAC=24;MLEAF=0.126;MQ=59.98;MQRankSum=0.00;PG=0,10,26;POSITIVE_TRAIN_SITE;QD=9.40;ReadPosRankSum=0.283;SOR=0.859;VQSLOD=8.99;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/1:27,4:31:28:38,0,909:28,0,925 0/0:29,0:29:94:0,84,1260:0,94,1286 0/0:15,0:15:55:0,45,563:0,55,589 0/0:19,0:19:67:0,57,694:0,67,720 0/0:10,0:10:40:0,30,302:0,40,328

I ran the following command:
/usr/bin/java -jar gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar VariantFiltration --variant APOE.recode.genotypeRefined.vcf --genotype-filter-expression "GQ < 20" --genotype-filter-name "lowGQ" --output APOE.recode.genotypeRefined.filteredoutLowGQ.vcf

The output for the above two variants is shown below:

chr19 44907654 rs769451 T G 1131.52 PASS AC=3;AF=0.016;AN=190;BaseQRankSum=-4.920e-01;DB;DP=2959;ExcessHet=3.0798;FS=0.788;InbreedingCoeff=-0.0160;MLEAC=3;MLEAF=0.016;MQ=59.93;MQRankSum=0.00;PG=0,16,39;POSITIVE_TRAIN_SITE;QD=12.04;ReadPosRankSum=0.00;SOR=0.582;VQSLOD=9.43;culprit=MQRankSum GT:AD:DP:GQ:PL:PP 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:35,0:35:99:0,99,1485:0,115,1524 0/0:27,0:27:85:0,69,1035:0,85,1074 0/0:37,0:37:99:0,99,1485:0,115,1524 0/0:25,0:25:82:0,66,990:0,82,1029

chr19 44908684 rs429358 T C 4672.69 PASS AC=24;AF=0.126;AN=190;BaseQRankSum=-2.469e+00;DB;DP=1958;ExcessHet=4.7065;FS=1.755;InbreedingCoeff=-0.0544;MLEAC=24;MLEAF=0.126;MQ=59.98;MQRankSum=0.00;PG=0,10,26;POSITIVE_TRAIN_SITE;QD=9.40;ReadPosRankSum=0.283;SOR=0.859;VQSLOD=8.99;culprit=MQRankSum GT:AD:DP:FT:GQ:PL:PP 0/1:27,4:31:PASS:28:38,0,909:28,0,925 0/0:29,0:29:PASS:94:0,84,1260:0,94,1286 0/0:15,0:15:PASS:55:0,45,563:0,55,589 0/0:19,0:19:PASS:67:0,57,694:0,67,720 0/0:10,0:10:PASS:40:0,30,302:0,40,328

Note that the variant rs429358 now carries the genotype-level filter tag (FT) for all samples, while the variant rs769451 has no FT tag at all, which suggests the genotype filter was not applied to that variant.
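To check this across the whole output file rather than by eye, a small script along the following lines could list the records whose FORMAT column has no FT key. This is only a minimal sketch and not part of GATK: the function name `records_without_ft` is made up for illustration, the INFO column in the sample lines is shortened, and it assumes whitespace-separated data lines with FORMAT in the ninth column.

```python
def records_without_ft(vcf_lines):
    """Return the IDs of VCF data records whose FORMAT column lacks FT."""
    missing = []
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        format_keys = fields[8].split(":")  # FORMAT is the 9th column
        if "FT" not in format_keys:
            missing.append(fields[2])       # ID is the 3rd column
    return missing


# Shortened versions of the two output records from this post
# (INFO trimmed, first sample only).
example_lines = [
    "chr19 44907654 rs769451 T G 1131.52 PASS AC=3 GT:AD:DP:GQ:PL:PP "
    "0/0:37,0:37:99:0,99,1485:0,115,1524",
    "chr19 44908684 rs429358 T C 4672.69 PASS AC=24 GT:AD:DP:FT:GQ:PL:PP "
    "0/1:27,4:31:PASS:28:38,0,909:28,0,925",
]
print(records_without_ft(example_lines))  # prints ['rs769451']
```

Passing the lines of the real output VCF to the same function would report every record that was written without FT.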

Is something wrong or missing in the above command that could explain the missing FT tag? Or is this expected behavior that I am not aware of? I thought the genotype-level filter would be applied to all input variants.
Please help.
Thanks
Srikant

Answers

  • srikant_verma (India, Member)

    Thanks @bhanuGandham !
    You are right! I checked and found that GQ >= 20 for all samples at the first variant (rs769451), while 2 samples have GQ < 20 at the second variant (rs429358). So the FT tag is written only when at least one sample fails the filter criteria; when all samples pass, no FT tag is added to that record.
    I am not sure whether this is documented. If not, could you please document it so that all users are aware of it?
    Thanks again!

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Will do, and thank you @srikant_verma.
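For anyone who wants to confirm the behavior described in this thread on their own data, the check can be sketched in a few lines of Python: for each record of the input VCF, find the samples whose GQ is below the threshold. Going by what was observed here with GATK 4.1.0.0, records for which the list is empty are the ones written without an FT tag. This is a rough sketch rather than GATK's own logic; the function name `samples_failing_gq` is hypothetical, the INFO column in the example is shortened, and skipping missing GQ values is an assumption of this sketch.

```python
def samples_failing_gq(record_line, threshold=20):
    """Return 0-based indices of samples whose GQ is below the threshold."""
    fields = record_line.split()
    format_keys = fields[8].split(":")      # FORMAT is the 9th column
    if "GQ" not in format_keys:
        return []
    gq_index = format_keys.index("GQ")
    failing = []
    for i, sample in enumerate(fields[9:]): # sample columns follow FORMAT
        values = sample.split(":")
        # Assumption: a missing GQ (dropped trailing field or ".") is skipped.
        if gq_index >= len(values) or values[gq_index] == ".":
            continue
        if int(values[gq_index]) < threshold:
            failing.append(i)
    return failing


# Shortened rs769451 record from the question (INFO trimmed, 3 samples).
record = (
    "chr19 44907654 rs769451 T G 1131.52 PASS AC=3 GT:AD:DP:GQ:PL:PP "
    "0/0:37,0:37:99:0,99,1485:0,115,1524 "
    "0/0:35,0:35:99:0,99,1485:0,115,1524 "
    "0/0:27,0:27:85:0,69,1035:0,85,1074"
)
print(samples_failing_gq(record))  # prints []
```

An empty list for rs769451 matches the output in the question, where that record has no FT key in its FORMAT column.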
