Using Variant Filtration properly
I tried using VariantRecalibrator on my horse data, but it seems I don't have enough known SNPs for it to work properly, so I am going the VariantFiltration route instead. First, I used vcftools to filter out the indels, leaving only SNPs. Then I ran VariantFiltration on my data with this command:
java -Xmx2g -jar /share/apps/gatk/GenomeAnalysisTK.jar -T VariantFiltration -R /data/horse/reference/eqcab2/eqCab2.all_chr.fa --variant horse.output.raw.snps_only.vcf.recode.vcf -o horse.output.filtered.snps_only.vcf --filterExpression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "horseSNPFilters"
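For comparison, the thresholds in the command above can be applied one annotation at a time, rather than through a single OR-ed expression that names every annotation. The minimal Python sketch below parses a record's INFO field and tests each threshold independently, so a record that lacks one annotation (HaplotypeScore, say) is still checked against the others and the absent keys are reported. The helper names and the per-filter names such as `QD2` are hypothetical, and this is an illustration only, not how GATK's JEXL engine works internally. In VariantFiltration terms it corresponds roughly to passing several `--filterExpression`/`--filterName` pairs instead of one combined expression.

```python
# Hypothetical per-annotation hard filter. Thresholds are copied from the
# --filterExpression in the command above; the filter names are made up.
# This is an illustration, not GATK's JEXL-based implementation.
THRESHOLDS = {
    # annotation: (comparison, cutoff, filter name)
    "QD": ("<", 2.0, "QD2"),
    "MQ": ("<", 40.0, "MQ40"),
    "FS": (">", 60.0, "FS60"),
    "HaplotypeScore": (">", 13.0, "HaplotypeScore13"),
    "MQRankSum": ("<", -12.5, "MQRankSum-12.5"),
    "ReadPosRankSum": ("<", -8.0, "ReadPosRankSum-8"),
}


def parse_info(info: str) -> dict:
    """Turn 'QD=1.5;DB;MQ=60.0' into {'QD': '1.5', 'DB': '', 'MQ': '60.0'}."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def apply_filters(info: str) -> tuple:
    """Return (names of failed filters, annotations missing from the record)."""
    values = parse_info(info)
    failed, missing = [], []
    for key, (op, cutoff, name) in THRESHOLDS.items():
        raw = values.get(key, "")
        if raw == "":
            # A missing annotation skips only this test, not the others.
            missing.append(key)
            continue
        x = float(raw)
        if (op == "<" and x < cutoff) or (op == ">" and x > cutoff):
            failed.append(name)
    return failed, missing


if __name__ == "__main__":
    # Example INFO field carrying only QD, MQ and FS.
    print(apply_filters("QD=1.2;MQ=59.8;FS=3.1"))
```

Running it prints `(['QD2'], ['HaplotypeScore', 'MQRankSum', 'ReadPosRankSum'])`: the record fails the QD threshold, and the three absent annotations are listed without preventing the remaining tests.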
After running that command, I get this warning for every line:
WARN 16:56:11,946 Interpreter - ![38,52]: 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable HaplotypeScore
I tried various things, such as adding the HaplotypeScore annotation, but that didn't work. I then removed that part of the filter and got this warning for every line:
WARN 16:56:30,115 Interpreter - ![38,47]: 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable MQRankSum
... which is very odd, since the file clearly has an MQRankSum value in the INFO field of every line. I got the same warning for ReadPosRankSum as well. When I was finally down to "QD < 2.0 || MQ < 40.0 || FS > 60.0", it ran without warnings, but it didn't seem to filter anything. I then tried just "QD < 2.0", which did nothing, and then "QD < 50", which also did nothing, even though there are clearly SNPs with QD < 50.

So my questions are: what am I doing wrong, and why isn't any filtering happening? Also, how do I add the HaplotypeScore annotation to every line so that I can filter on it? Finally, this is a WGS experiment, so I would also appreciate advice on the filtration parameters: according to the website, the filter values above seem to be intended for exome sequencing, which my data is not.

Any help is highly appreciated! Thanks!
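Two of the observations above can be checked directly in the VCF files: whether every record really carries MQRankSum and ReadPosRankSum, and whether VariantFiltration changed any FILTER values. GATK's rank-sum annotations are generally only computed when both reference and alternate reads are present, so homozygous-variant records may lack them; counting each key across records shows whether that is happening here. The minimal Python sketch below tallies both. It assumes a plain or gzip-compressed VCF with the standard tab-separated columns (CHROM POS ID REF ALT QUAL FILTER INFO ...); the function names and the script name `summarize_vcf.py` are hypothetical.

```python
import gzip
import sys
from collections import Counter

# Annotations referenced by the filter expression in the post.
FILTER_KEYS = ("QD", "MQ", "FS", "HaplotypeScore", "MQRankSum", "ReadPosRankSum")


def summarize_vcf(lines):
    """Count records, how many records carry each INFO key, and FILTER values."""
    info_keys = Counter()
    filters = Counter()
    n_records = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip ## meta lines, the #CHROM header and blank lines
        cols = line.rstrip("\r\n").split("\t")
        # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        filters[cols[6]] += 1
        if cols[7] != ".":
            for item in cols[7].split(";"):
                info_keys[item.split("=", 1)[0]] += 1
        n_records += 1
    return n_records, info_keys, filters


def open_vcf(path):
    """Open a plain or gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        n, info_keys, filters = summarize_vcf(handle)
    print(f"{n} records")
    for key in FILTER_KEYS:
        print(f"{key}: present in {info_keys[key]} of {n} records")
    for value, count in filters.most_common():
        print(f"FILTER {value}: {count}")
```

For example, `python summarize_vcf.py horse.output.filtered.snps_only.vcf` prints the record count, how many records carry each annotation used in the filter, and the count of each FILTER value. If nothing other than `PASS` or `.` appears in the FILTER column, no filter was triggered.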