We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Poor VQSR filtering

rmfrmf Member
edited January 2018 in Ask the GATK team

I ran VQSR on my vcf file from joint genotyping. I used dnSNP as training. The plots generated during VQSR don't seem to separate the pos and neg very well. Below are the plots for one sample.

I use --ts_filter_level 99.0 during recalibration. And this is an example of the applied score for example;

##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -39616.7976">
##FILTER=<ID=VQSRTrancheSNP99.90to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -39616.7976 <= x < -6.9367">

Of the 26 million SNPs, only 32,000 are filtered out by VQSR, so I am not sure if this is working.
I was wondering what would be the expert opinion looking at these plots. Are the VQSLOD scores usable?

To get an idea of the distribution of VQSLOD values, I plotted a histogram of around 10,000 scores sampled from the first 1 million variants in the vcf file. Shown for SNPs and INDELs separately.



It looks like there are three peaks. Any ideas on that? Could that be used for filtering?

Also, I am working on Zebrafish and not Human.


Sign In or Register to comment.