Attention:
The front line support team will be unavailable to answer questions until May 27th 2019 as we are celebrating Memorial Day. We will be back soon after. Thank you for your patience and we apologize for any inconvenience!

No filtering occuring with the VariantFiltration walker

agopalagopal CanadaMember

Hello,

I have a file called rawSnps.vcf, which I've been trying to filter through the VariantFiltration Walker. The command I've been using is as follows:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V Papi_C02_1/rawSnps.vcf --filterExpression "QD < 2 || FS > 60 || MQ < 40 || MappingQualityRankSum < -12.5 || ReadPosRankSum < -8" --filterName "snpfilter" -o Papi_C02_1/filtered_snps.vcf -allowPotentiallyMisencodedQuals

I noticed that this wasn't working, so I switched to a more simple command, like this:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression "MQ < 30" --filterName "snpfilter" -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals

However, that isn't working either.

The output I get from GATK looks like this:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression "MQ < 30" --filterName "snpfilter" -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO 10:03:14,434 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:14,436 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO 10:03:14,437 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:14,437 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:03:14,442 HelpFormatter - Program Args: -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression MQ < 30 --filterName snpfilter -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO 10:03:14,445 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.11.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 10:03:14,445 HelpFormatter - Date/Time: 2015/05/04 10:03:14
INFO 10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:03:15,216 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:15,308 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:03:15,439 GenomeAnalysisEngine - Preparing for traversal
INFO 10:03:15,440 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:15,440 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:03:15,441 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:15,441 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:03:17,682 ProgressMeter - done 7942.0 2.0 s 4.7 m 99.9% 2.0 s 0.0 s
INFO 10:03:17,682 ProgressMeter - Total runtime 2.24 secs, 0.04 min, 0.00 hours
INFO 10:03:18,554 GATKRunReport - Uploaded run statistics report to AWS S3

>

I.e., you there is no error report. However, when I manually look at filtered_snps_test.vcf, I can see that many SNPs that have a MQ value below 30 (or 40, if that's what I've used) have still been passed in the filter.

I've also done a raw line-count of rawSnps.vcf and filtered_snps_test.vcf, and starting at the #CHROM position, both line counts are the same.

I'm not sure what's going on here. You can see both files here:

rawSnps.vcf https://www.dropbox.com/s/868kkpna5n315r0/filtered_snps_test.vcf?dl=0
filtered_snps_test.vcf https://www.dropbox.com/s/868kkpna5n315r0/filtered_snps_test.vcf?dl=0

Any help would be much appreciated.

Thanks!

Answers

  • agopalagopal CanadaMember

    Never mind, I figured this out.

    (1) Not having decimals (i.e., MQ > 40.0) makes VariantFiltration fail
    (2) I should be using SelectVariants, since FilterVariants doesn't actually allow me to separate PASS from FAIL.

    @Staff, the (1) is a really annoying idiosyncrasy that would be very useful if fixed. Is there a reason why non-decimal numbers aren't treated like numbers?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @agopal
    Hi,

    1) I understand this is annoying, but MQ is specified as a float in the vcf. You can tell this by looking at the vcf header where it specified what type MQ is.
    2) SelectVariants will only output the variants which pass you criteria. FilterVariants puts PASS in the Filter field for the variants that passed your criteria. The variants that did not pass your criteria will be labeled with the filterName that they failed.

    -Sheila

Sign In or Register to comment.