VariantFiltration bug with filtering QualByDepth
My version info:
/home/sn/software/java/bin/java -jar /home/sn/software/GATK/GenomeAnalysisTK.jar --version
I've been using GATK HaplotypeCaller -> GenotypeGVCFs -> VariantAnnotator -> SelectVariants -> VariantFiltration for variant calling and annotation, as the Best Practices suggest. At the GenotypeGVCFs and VariantAnnotator steps I tried to keep all the annotations that can be used for filtering, such as QD, FS, etc., and at SelectVariants I separated the SNPs and indels. Finally I ran VariantFiltration on both, which is where something strange happens.
/home/sn/software/java/bin/java -jar /home/sn/software/GATK/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /home/yangjy/16T/GEN0ME/test/db/GCF_000006825.1_ASM682v1_genomic.fna \
    -V /home/yangjy/16T/GEN0ME/test/_tmp_combined_calling/raw_snp.vcf \
    --filterName filter_snp \
    --filterExpression "QD < 2.0" \
    -jdk_deflater -jdk_inflater \
    -o /home/yangjy/16T/GEN0ME/test/_tmp_combined_calling/filtered_snp.vcf \
    2>> /home/yangjy/16T/GEN0ME/test/_tmp_combined_calling/_log_call_30091258.txt
And I got a warning like this:
WARN 09:13:43,288 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
Yet when I check the output VCF file, I can still see that the filter has been applied correctly, like this:
##FILTER=<ID=filter_snp,Description="QD < 2.0">
NC_002663.1 86042 . C T 417.87 filter_snp ABHom=0.667;AC=2;AF=0.667;AN=3;BaseQRankSum=-3.530e-01;ClippingRankSum=0.00;DP=469;FS=3.973;GC=44.55;GQ_MEAN=81.00;GQ_STDDEV=31.18;HRun=0;MLEAC=2;MLEAF=0.667;MQ=35.84;MQRankSum=-1.087e+01;NCC=0;OND=0.333;QD=1.34;ReadPosRankSum=-1.397e+00;SNPEFF_AMINO_ACID_CHANGE=Y3757;SNPEFF_CODON_CHANGE=taC/taT;SNPEFF_EFFECT=SYNONYMOUS_CODING;SNPEFF_EXON_ID=1;SNPEFF_FUNCTIONAL_CLASS=SILENT;SNPEFF_GENE_BIOTYPE=protein_coding;SNPEFF_GENE_NAME=PM_RS00295;SNPEFF_IMPACT=LOW;SNPEFF_TRANSCRIPT_ID=TRANSCRIPT_gene61;SOR=0.505;Samples=GX-PM-1,GX-PM-3;VariantType=SNP GT:AD:DP:GQ:PL 1:79,85:164:99:405,0 0:158,0:158:99:0,294 1:77,70:147:45:45,0
Notice that the FILTER field is "filter_snp" and QD = 1.34, so the expression was evaluated for this record despite the warning. For indels the QD filter seems to work correctly with no warnings. So I think it's a harmless bug.
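For anyone who wants to check their own output the same way, here is a minimal Python sketch. It is not part of GATK, and the function names are made up for illustration. It counts records that have no QD key in INFO, which is one possible reason for an "undefined variable" warning (an assumption; I have not confirmed that this is the cause here). It also lists any record whose FILTER column disagrees with the "QD < 2.0" expression.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def audit_qd_filter(lines, filter_name="filter_snp", threshold=2.0):
    """Return (records_without_qd, mismatches) for an iterable of VCF lines.

    A mismatch is a record where the presence of `filter_name` in FILTER
    does not agree with QD < threshold. Records without QD are only counted,
    not compared.
    """
    missing_qd = 0
    mismatches = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # VCF is tab-delimited; split() also tolerates records pasted with
        # spaces (assumes no whitespace inside INFO values).
        cols = line.split()
        if len(cols) < 8:
            continue  # not a complete record
        chrom, pos, filters, info = cols[0], int(cols[1]), cols[6], cols[7]
        annotations = parse_info(info)
        if "QD" not in annotations:
            missing_qd += 1
            continue
        flagged = filter_name in filters.split(";")
        should_flag = float(annotations["QD"]) < threshold
        if flagged != should_flag:
            mismatches.append((chrom, pos))
    return missing_qd, mismatches
```

Usage would be something like `with open("filtered_snp.vcf") as fh: print(audit_qd_filter(fh))`. A non-zero first number means some records carry no QD annotation at all, and an empty list means the filter was applied consistently wherever QD is present.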
I didn't check whether this has already been reported or fixed; if so, please let me know. Thanks for reading my post. This is a great, useful tool set you've been making, and I've learned a lot.