If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Using Variant Filtration - RETIRED

delangeldelangel Broad InstituteMember ✭✭
edited May 2015 in Archive

This tool is not deprecated, but this article is now superseded by the method article on JEXL expressions.


For a complete, detailed argument reference, refer to the GATK document page here.

The documentation for Using JEXL expressions within the GATK contains very important information about limitations of the filtering that can be done; in particular please note the section on working with complex expressions.

Filtering Individual Genotypes

One can now filter individual samples/genotypes in a VCF based on information from the FORMAT field: Variant Filtration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record's FILTER tag). This is still a work in progress and isn't quite as flexible and powerful yet as we'd like it to be. For now, one can filter based on most fields as normal (e.g. GQ < 5.0), but the GT (genotype) field is an exception. We have put in convenience methods so that one can now filter out hets (isHet == 1), refs (isHomRef == 1), or homs (isHomVar == 1).

Post edited by Geraldine_VdAuwera on


  • isp2isp2 Member

    I'm having some problems marking heterozygous genotypes. I used the following command:

        java -Xmx12g -jar ${GATKPATH}/GenomeAnalysisTK.jar \
        -R reference.fa \
        -T VariantFiltration \
        -o markHET.vcf \
        --variant original.vcf \
        --genotypeFilterExpression "isHET == 1" \
        --genotypeFilterName "HET"

    But when I grep on "HET" I don't find anything. Looking at the vcf generated (markHET.vcf), for example the entry between '**' should have been marked:

        Chr1    1348    .       G       A       425.43  PASS    AC=3;AF=0.026;AN=116;BaseQRankSum=-0.841;DP=737;Dels=0.00;FS=3.899;HaplotypeScore=0.6335;InbreedingCoeff=0.5494;MLEAC=3;MLEAF=0.026;MQ=49.39;MQ0=0;MQRankSum=0.953;QD=23.64;ReadPosRankSum=-0.942;SB=-9.202e+01    GT:AD:DP:GQ:PL  0/0:11,0:11:33:0,33,414 0/0:8,0:8:15:0,15,183   0/0:9,0:9:27:0,27,326   0/0:18,0:18:51:0,51,640 0/0:7,0:7:21:0,21,264   0/0:13,0:13:39:0,39,479 0/0:14,0:15:42:0,42,527 0/0:18,1:19:23:0,23,522 **0/1:3,1:5:19:19,0,73**    0/0:19,0:19:57:0,57,678 [etc.]

    Is the command I'm using wrong?

    Thank you.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    Sorry for the response delay, I had to check since this is a tricky topic (I hate JEXL, it's so finicky). Our documentation makes the genotype evaluation syntax sound too simple; it's actually a little more complicated than that (I will put fixing the doc on my to-do list). There are two options; either you access individual sample genotypes using 'vc.getGenotype("sampleName").isHet()', or you can filter over all sample genotypes using something like 'GT == 0/1'. I hope this helps.

  • flescaiflescai Member ✭✭

    Hi there,
    I have a problem when using VariantFiltration: it seems not to change the FORMAT field for all variants, which is obviously a big issue when subsequently using the VCF file with any program for the analysis..

    I'm applying a filter to the genotype field as follows:

    java -Xmx8g -Xms6g -jar /com/extra/GATK/2.7-2/jar-bin/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /data/refseq/tooldata/GATKbundle/2.5/b37/human_g1k_v37.fasta \
    -o UnifiedGenotyper_filteredGenotype_DP20GQ30.vcf \
    --variant UnifiedGenotyper.recalibrated.vcf \
    --genotypeFilterExpression "DP <20 && GQ <30" \
    --genotypeFilterName "FILTQDP"

    but in the output VCF file I have lines like this for some variants:

    1 762273 rs3115849 G A 87315.72 PASS AC=47;AF=0.783;AN=60;BaseQRankSum=4.447;DB;DP=4287;Dels=0.00;FS=33.490;HaplotypeScore=4.6255;InbreedingCoeff= -0.2766;MLEAC=47;MLEAF=0.783;MQ=44.06;MQ0=214;MQRankSum=-25.916;QD=20.37;ReadPosRankSum=-1.274;VQSLOD=-2.860e+00;culprit=MQRankSum;set=snps GT:AD:DP:GQ:PL 1/1:0,150:150:99:4365 ,361,0 0/1:133,117:238:99:2646,0,3533 1/1:0,250:250:99:7128,587,0 1/1:1,249:243:99:6812,587,0 1/1:0,58:58:99:1806,150,0 0/1:143,106:237:99:2412,0,3790 0/1:129,121:2 44:99:2867,0,3411 1/1:1,248:242:99:6890,593,0 1/1:0,247:247:99:6998,575,0 1/1:0,125:125:99:3559,298,0 0/1:96,66:154:99:1465,0,2348 0/1:76,65:135:99:1391,0,20630/1:73,82:148:99:1631,0,1886 1/1:0,148:148:99:4319,364,0 1/1:1,185:185:99:5001,427,0 1/1:0,106:106:99:3221,274,0 0/1:57,42:95:99:1016,0,1572 1/1:1,121:118:99:3145 ,265,0 0/1:99,74:165:99:1702,0,2642 1/1:0,137:137:99:3993,343,0 0/1:54,36:86:99:909,0,1308 1/1:0,130:130:99:3784,322,0 0/1:39,46:81:99:893,0,1027 1/1:1,49:48:9 9:1399,120,0 0/1:32,40:70:99:955,0,877 1/1:0,75:74:99:2045,174,0 0/1:55,46:96:99:1076,0,1283 1/1:0,52:52:99:1457,126,0 0/1:19,31:48:99:580,0,504 1/1:0 ,67:67:99:1962,174,0

    and lines like this for others

    1 866319 rs9988021 G A 10799.89 PASS AC=60;AF=1.00;AN=60;DB;DP=355;Dels=0.00;FS=0.000;HaplotypeScore=0.4327;InbreedingCoeff=-0.0165;MLEAC=60;MLEAF =1.00;MQ=57.66;MQ0=0;QD=30.42;VQSLOD=7.16;culprit=FS;set=snps GT:AD:DP:FT:GQ:PL 1/1:0,7:7:FILTQDP:21:245,21,0 1/1:0,18:19:PASS:36:459,36,0 1/1:0,14:14:PASS:30:396,30,01/1:0,11:11:PASS:30:380,30,0 1/1:0,4:4:FILTQDP:12:145,12,0 1/1:0,7:7:FILTQDP:18:224,18,0 1/1:0,13:13:PASS:39:464,39,0 1/1:0,23:23:PASS:57:720,57,0 1/1:0,9:9:FILTQDP:27: 327,27,0 1/1:0,6:6:FILTQDP:15:182,15,0 1/1:0,4:4:FILTQDP:9:120,9,0 1/1:0,10:10:FILTQDP:21:268,21,0 1/1:0,15:15:PASS:33:414,33,0 1/1:0,2:2:FILTQDP:6:68,6,0 1/1:0 ,12:12:PASS:30:383,30,0 1/1:0,8:8:FILTQDP:15:198,15,0 1/1:0,9:9:FILTQDP:24:294,24,0 1/1:0,14:14:PASS:39:472,39,0 1/1:0,37:37:PASS:90:1102,90,0 1/1:0,19:19:PASS:48:603,48,01/1:0,28:28:PASS:69:865,69,0 1/1:0,38:38:PASS:96:1192,96,0 1/1:0,4:4:FILTQDP:12:133,12,0 1/1:0,5:5:FILTQDP:12:152,12,0 1/1:0,4:4:FILTQDP:6:77,6,0 1/1:0,5:5:FILTQDP:12: 130,12,0 1/1:0,12:12:FILTQDP:27:325,27,0 1/1:0,6:6:FILTQDP:15:169,15,0 1/1:0,5:5:FILTQDP:12:151,12,0 1/1:0,5:5:FILTQDP:15:165,15,0

    the first one has a FORMAT field GT:AD:DP:GQ:PL while the second one contains my filter GT:AD:DP:FT:GQ:PL.
    is there a specific reason why that happens?


  • ebanksebanks Broad InstituteMember, Broadie, Dev ✭✭✭✭

    When all genotypes within a record pass filters, it looks like as an optimization is just leaves out the PASS from each one (since it is implicit).

  • flescaiflescai Member ✭✭

    Hi Eric @ebanks,
    thanks for your answer, very valuable.
    I believe this efficiency, however, makes the output VCF less "rigorous" (if I may say that), because the sample field is not represented in a unique way for all the variants.
    While knowing that now I can manage my grep/filtering, it might create issues with other downstream applications.
    Although it's not a high priority, would be great if the output could be uniform.
    thanks, Francesco

  • metheusemetheuse ann arborMember
    edited October 2014

    So isHET == 1 works now? I tried it and it correctly labeled the hSNP. vc.getGenotype("sampleName").isHet() doesn't work for me.

This discussion has been closed.