Using Variant Filtration properly
I tried using VariantRecalibrator on my horse data, but it seems I don't have enough known SNPs for it to work properly, so I am going the VariantFiltration route instead. First I used vcftools to filter out the indels, leaving only SNPs. Then I ran VariantFiltration on my data using this command:
java -Xmx2g -jar /share/apps/gatk/GenomeAnalysisTK.jar -T VariantFiltration \
    -R /data/horse/reference/eqcab2/eqCab2.all_chr.fa \
    --variant horse.output.raw.snps_only.vcf.recode.vcf \
    -o horse.output.filtered.snps_only.vcf \
    --filterExpression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" \
    --filterName "horseSNPFilters"
After running it, I get this warning for every line:
WARN 16:56:11,946 Interpreter - ![38,52]: 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable HaplotypeScore
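The warning suggests that the whole combined expression is abandoned for a record as soon as one of the annotations it names is missing. Below is a simplified Python model of that behavior, not GATK's implementation: the assumption that a failed expression leaves the record unfiltered is inferred from the warnings, the split filter names (QD2, MQ40, and so on) are hypothetical, and the record values are made up. It compares the single OR-combined filter from the command with the same thresholds split into one filter per annotation.

```python
# Simplified, hypothetical model of the "undefined variable" symptom; NOT GATK code.
# Assumption (inferred from the warnings): if any annotation an expression references
# is missing from a record, the expression is skipped and no filter is applied.

# Each filter: (name, annotations it references, predicate over the INFO values).
COMBINED = [
    ("horseSNPFilters",
     ["QD", "MQ", "FS", "HaplotypeScore", "MQRankSum", "ReadPosRankSum"],
     lambda v: (v["QD"] < 2.0 or v["MQ"] < 40.0 or v["FS"] > 60.0
                or v["HaplotypeScore"] > 13.0 or v["MQRankSum"] < -12.5
                or v["ReadPosRankSum"] < -8.0)),
]

# The same thresholds split into one hypothetical filter per annotation.
SPLIT = [
    ("QD2", ["QD"], lambda v: v["QD"] < 2.0),
    ("MQ40", ["MQ"], lambda v: v["MQ"] < 40.0),
    ("FS60", ["FS"], lambda v: v["FS"] > 60.0),
    ("HaplotypeScore13", ["HaplotypeScore"], lambda v: v["HaplotypeScore"] > 13.0),
    ("MQRankSum-12.5", ["MQRankSum"], lambda v: v["MQRankSum"] < -12.5),
    ("ReadPosRankSum-8", ["ReadPosRankSum"], lambda v: v["ReadPosRankSum"] < -8.0),
]

def apply_filters(info, filters):
    """Return (names of filters the record fails, warnings) for one INFO dict."""
    failed, warnings = [], []
    for name, keys, predicate in filters:
        missing = [k for k in keys if k not in info]
        if missing:
            # Modeled warning: the expression is skipped, not treated as failing.
            warnings.append(f"{name}: undefined variable {missing[0]}")
            continue
        if predicate(info):
            failed.append(name)
    return failed, warnings

# A made-up record with a low QD but no HaplotypeScore annotation.
record = {"QD": 1.5, "MQ": 55.0, "FS": 3.0, "MQRankSum": 0.2, "ReadPosRankSum": 0.4}

print(apply_filters(record, COMBINED))  # ([], ['horseSNPFilters: undefined variable HaplotypeScore'])
print(apply_filters(record, SPLIT))     # (['QD2'], ['HaplotypeScore13: undefined variable HaplotypeScore'])
```

Under this model, splitting the thresholds means a missing HaplotypeScore disables only its own check instead of all of them. VariantFiltration accepts several --filterExpression/--filterName pairs in one run, which is the command-line analogue of the split list.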
So I tried various things, such as adding the HaplotypeScore annotation, but that didn't work. I then removed that part of the filter and got this warning for every line:
WARN 16:56:30,115 Interpreter - ![38,47]: 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable MQRankSum
... which is very odd, since the file clearly has an MQRankSum value in the INFO field of every line. I got the same warning for ReadPosRankSum as well. When I was finally down to "QD < 2.0 || MQ < 40.0 || FS > 60.0", it ran without warnings, but it didn't seem to filter anything. I then tried just "QD < 2.0", which did nothing, and then "QD < 50", which also did nothing, even though there are clearly SNPs with QD < 50.

So my questions are: what am I doing wrong, and why isn't any filtering happening? Also, how do I add the HaplotypeScore annotation to every line so that I can filter on it? Finally, this is a WGS experiment, so I would also appreciate advice on the filtration parameters, since according to the website the values above are meant for exome sequencing, which my data is not.

Any help is highly appreciated! Thanks!
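One way to check these observations directly is to scan the VCF: count how many records actually lack each annotation used in the expression, and tally the FILTER column, since VariantFiltration labels records in FILTER (PASS or the filter name) rather than removing them from the output. A minimal Python sketch follows; `summarize_vcf` and the two-record VCF are made up for illustration, and it assumes an uncompressed, tab-delimited VCF. Rank-sum annotations such as MQRankSum and ReadPosRankSum compare reads supporting the reference and alternate alleles, so they are typically absent wherever one of those groups is empty.

```python
from collections import Counter

# Annotations referenced by the filter expression in the command above.
ANNOTATIONS = ["QD", "MQ", "FS", "HaplotypeScore", "MQRankSum", "ReadPosRankSum"]

def summarize_vcf(lines, annotations=ANNOTATIONS):
    """Hypothetical helper: count records missing each annotation and tally FILTER values.

    `lines` is any iterable of VCF text lines (e.g. an open, uncompressed file).
    """
    missing = Counter()
    filters = Counter()
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line, and blank lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        filters[fields[6]] += 1  # column 7 is FILTER: PASS, ".", or filter names
        # Column 8 is INFO: semicolon-separated KEY=VALUE pairs or bare flags.
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        for ann in annotations:
            if ann not in info_keys:
                missing[ann] += 1
    return total, missing, filters

# Made-up two-record VCF, for illustration only.
EXAMPLE = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\tQD=20.1;MQ=59.0;FS=1.2;MQRankSum=0.3;ReadPosRankSum=0.1
chr1\t200\t.\tC\tT\t30\thorseSNPFilters\tQD=1.1;MQ=58.0;FS=0.0
"""

total, missing, filters = summarize_vcf(EXAMPLE.splitlines(keepends=True))
print(total, dict(missing), dict(filters))
# 2 {'HaplotypeScore': 2, 'MQRankSum': 1, 'ReadPosRankSum': 1} {'PASS': 1, 'horseSNPFilters': 1}
```

For the real data, `summarize_vcf(open("horse.output.filtered.snps_only.vcf"))` would report the same counts; a bgzipped file would need `gzip.open(path, "rt")` instead.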