Poor VQSR filtering
I ran VQSR on my VCF file from joint genotyping, using dbSNP as the training set. The plots generated during VQSR don't seem to separate the positive and negative training variants very well. Below are the plots for one sample.
I used --ts_filter_level 99.0 during recalibration. Here is an example of the resulting tranche filters in the VCF header:
##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -39616.7976">
##FILTER=<ID=VQSRTrancheSNP99.90to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -39616.7976 <= x < -6.9367">
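To check which VQSLOD range each tranche filter covers, the FILTER header lines can be parsed. This is a minimal Python sketch that assumes the headers follow exactly the two forms quoted above. parse_tranche_headers and tranche_for are hypothetical helper names, not part of GATK.

```python
import re

# Hypothetical helpers (not part of GATK). The regex covers the two header forms quoted above:
#   "... at VQS Lod < X"            -> open lower bound
#   "... at VQS Lod: LO <= x < HI"  -> half-open interval
TRANCHE_RE = re.compile(
    r'##FILTER=<ID=(?P<id>[^,]+),Description="[^"]*VQS Lod'
    r'(?:\s*<\s*(?P<below>-?[\d.]+)'
    r'|:\s*(?P<lo>-?[\d.]+)\s*<=\s*x\s*<\s*(?P<hi>-?[\d.]+))'
)


def parse_tranche_headers(lines):
    """Return (filter_id, lower, upper) VQSLOD intervals from ##FILTER header lines."""
    tranches = []
    for line in lines:
        m = TRANCHE_RE.search(line)
        if m is None:
            continue  # not a VQSR tranche header in the assumed format
        if m.group("below") is not None:
            lower, upper = float("-inf"), float(m.group("below"))
        else:
            lower, upper = float(m.group("lo")), float(m.group("hi"))
        tranches.append((m.group("id"), lower, upper))
    return tranches


def tranche_for(vqslod, tranches):
    """Return the filter ID whose interval contains vqslod, or None if no parsed interval does."""
    for filter_id, lower, upper in tranches:
        if lower <= vqslod < upper:
            return filter_id
    return None
```

Note that None only means the value falls outside the parsed intervals; with --ts_filter_level 99.0 there are further tranches below the 99.90 level that are not shown above.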
Of the 26 million SNPs, only 32,000 are filtered out by VQSR, so I am not sure if this is working.
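For scale, 32,000 out of 26 million is roughly 0.12% of SNPs. One way to double-check that count is to tally the FILTER column over SNP records. This is a minimal Python sketch; the helper names and the simple SNP rule (REF and every ALT allele a single base) are assumptions, and the file name in the usage comment is a placeholder.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_snp(ref, alt_field):
    """Simplified rule (an assumption): REF and every ALT are single bases; '*' and '.' are ignored."""
    alts = [a for a in alt_field.split(",") if a not in ("*", ".")]
    return len(ref) == 1 and bool(alts) and all(len(a) == 1 for a in alts)


def count_filters(lines):
    """Tally FILTER column values over SNP records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if is_snp(fields[3], fields[4]):
            counts[fields[6]] += 1
    return counts


def filtered_fraction(counts):
    """Fraction of records whose FILTER is anything other than PASS."""
    total = sum(counts.values())
    return (total - counts["PASS"]) / total if total else 0.0


# Usage (the file name is a placeholder):
# with open_vcf("recalibrated.vcf.gz") as fh:
#     counts = count_filters(fh)
# print(counts.most_common(), filtered_fraction(counts))
```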
I would appreciate an expert opinion on these plots. Are the VQSLOD scores usable?
To get an idea of the distribution of VQSLOD values, I plotted histograms of about 10,000 scores sampled from the first 1 million variants in the VCF file, shown for SNPs and INDELs separately.
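A sampling step like this can be sketched with reservoir sampling (Algorithm R), which draws a uniform sample without holding every value in memory. This is a minimal Python sketch under stated assumptions: VQSLOD is read from the INFO column, everything that is not a SNP is lumped into "INDEL" as a simplification, and the helper names are hypothetical. It is not necessarily how the histograms above were produced.

```python
import random


def parse_vqslod(info):
    """Return VQSLOD from an INFO string as a float, or None if absent or not numeric."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "VQSLOD":
            try:
                return float(value)
            except ValueError:  # e.g. "." or an empty value
                return None
    return None


def variant_class(ref, alt_field):
    """Simplification (an assumption): SNP if REF and all ALTs are single bases, else INDEL."""
    alts = alt_field.split(",")
    return "SNP" if len(ref) == 1 and all(len(a) == 1 for a in alts) else "INDEL"


def sample_vqslod(lines, max_records=1_000_000, k=10_000, seed=0):
    """Reservoir-sample up to k VQSLOD values per class from the first max_records records."""
    rng = random.Random(seed)
    reservoirs = {"SNP": [], "INDEL": []}
    seen = {"SNP": 0, "INDEL": 0}
    n_records = 0
    for line in lines:
        if line.startswith("#"):
            continue
        if n_records >= max_records:
            break
        n_records += 1
        fields = line.rstrip("\n").split("\t")
        value = parse_vqslod(fields[7])
        if value is None:
            continue
        kind = variant_class(fields[3], fields[4])
        seen[kind] += 1
        res = reservoirs[kind]
        if len(res) < k:
            res.append(value)  # fill the reservoir first
        else:
            j = rng.randrange(seen[kind])  # uniform in [0, seen)
            if j < k:
                res[j] = value  # keep the new value with probability k / seen
    return reservoirs
```

The returned lists can then be passed to any histogram routine, for example matplotlib's pyplot.hist.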
There appear to be three peaks. Any ideas what causes them? Could they be used for filtering?
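If the peaks were to be used for filtering, one heuristic is to place a candidate cutoff at the lowest point of the lightly smoothed histogram between neighboring peaks. The Python sketch below does only that; the bin range, smoothing window and helper names are assumptions, it is not part of VQSR, and a trough in the VQSLOD distribution is not by itself evidence that the variants on either side are true or false.

```python
def histogram(values, bins, lo, hi):
    """Equal-width bin counts over [lo, hi); values outside the range are dropped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding up to `bins`
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def smooth(counts, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(counts)):
        seg = counts[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def find_peaks(counts):
    """Local maxima: higher than the left neighbor and not lower than the right one."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]]


def valleys_between(counts, peaks):
    """Index of the lowest count between each pair of neighboring peaks."""
    # Two peaks found by find_peaks are always at least two bins apart, so the range is non-empty.
    return [min(range(a + 1, b), key=lambda i: counts[i])
            for a, b in zip(peaks, peaks[1:])]


def candidate_cutoffs(values, bins=100, lo=-20.0, hi=20.0, window=5):
    """VQSLOD bin centers at the troughs between histogram peaks (a heuristic)."""
    width = (hi - lo) / bins
    counts = smooth(histogram(values, bins, lo, hi), window)
    return [lo + (i + 0.5) * width for i in valleys_between(counts, find_peaks(counts))]
```

Peaks are searched on smoothed counts because a histogram of a 10,000-value sample tends to be noisy, and raw bins would yield many spurious local maxima; the window size controls that trade-off.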
Also, note that I am working on zebrafish, not human.