If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

No filtering occuring with the VariantFiltration walker

agopalagopal CanadaMember


I have a file called rawSnps.vcf, which I've been trying to filter through the VariantFiltration Walker. The command I've been using is as follows:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V Papi_C02_1/rawSnps.vcf --filterExpression "QD < 2 || FS > 60 || MQ < 40 || MappingQualityRankSum < -12.5 || ReadPosRankSum < -8" --filterName "snpfilter" -o Papi_C02_1/filtered_snps.vcf -allowPotentiallyMisencodedQuals

I noticed that this wasn't working, so I switched to a more simple command, like this:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression "MQ < 30" --filterName "snpfilter" -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals

However, that isn't working either.

The output I get from GATK looks like this:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression "MQ < 30" --filterName "snpfilter" -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO 10:03:14,434 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:14,436 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO 10:03:14,437 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:14,437 HelpFormatter - For support and documentation go to
INFO 10:03:14,442 HelpFormatter - Program Args: -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression MQ < 30 --filterName snpfilter -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO 10:03:14,445 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.11.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 10:03:14,445 HelpFormatter - Date/Time: 2015/05/04 10:03:14
INFO 10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:15,216 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:15,308 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:03:15,439 GenomeAnalysisEngine - Preparing for traversal
INFO 10:03:15,440 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:15,441 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:15,441 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:03:17,682 ProgressMeter - done 7942.0 2.0 s 4.7 m 99.9% 2.0 s 0.0 s
INFO 10:03:17,682 ProgressMeter - Total runtime 2.24 secs, 0.04 min, 0.00 hours
INFO 10:03:18,554 GATKRunReport - Uploaded run statistics report to AWS S3


I.e., you there is no error report. However, when I manually look at filtered_snps_test.vcf, I can see that many SNPs that have a MQ value below 30 (or 40, if that's what I've used) have still been passed in the filter.

I've also done a raw line-count of rawSnps.vcf and filtered_snps_test.vcf, and starting at the #CHROM position, both line counts are the same.

I'm not sure what's going on here. You can see both files here:


Any help would be much appreciated.



  • agopalagopal CanadaMember

    Never mind, I figured this out.

    (1) Not having decimals (i.e., MQ > 40.0) makes VariantFiltration fail
    (2) I should be using SelectVariants, since FilterVariants doesn't actually allow me to separate PASS from FAIL.

    @Staff, the (1) is a really annoying idiosyncrasy that would be very useful if fixed. Is there a reason why non-decimal numbers aren't treated like numbers?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin


    1) I understand this is annoying, but MQ is specified as a float in the vcf. You can tell this by looking at the vcf header where it specified what type MQ is.
    2) SelectVariants will only output the variants which pass you criteria. FilterVariants puts PASS in the Filter field for the variants that passed your criteria. The variants that did not pass your criteria will be labeled with the filterName that they failed.


Sign In or Register to comment.