Weird, disordered tranche plots generated by VariantRecalibrator
I have run GATK4 VariantRecalibrator on a VCF file from C. elegans data:
GATK VariantRecalibrator -R c_elegans.PRJNA13758.WS263.genomic.fa -V GGVCF.vcf --resource cendr,known=false,training=true,truth=true,prior=15.0:all.vcf.gz -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP --output output.recal --tranches-file output.tranche --rscript-file output.plots.R
GGVCF.vcf was output by GATK GenotypeGVCFs in a previous step. all.vcf.gz is a set of short variants that appear in natural isolates of C. elegans.
The attached plots are not ordered by % truth. Also, the bar for 90% truth should be the one with solid boxes, without cumulative TPs or FPs. The plots also show that I've got more FPs than TPs. However, my data is deeply sequenced (> 100X), and 95% of variants have DP > 10 and QUAL > 30. Can I trust these truth results? The total number of variants in GGVCF.vcf is 22,000; all.vcf.gz contains 2,427,507 variants.
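For anyone who wants to reproduce the DP/QUAL figure quoted above, a check along these lines should do it. This is only a minimal sketch: it assumes an uncompressed VCF with site-level depth stored as `DP` in the INFO column, and the function name `fraction_passing` is made up for illustration, not part of GATK.

```python
def fraction_passing(vcf_lines, min_dp=10, min_qual=30.0):
    """Return the fraction of VCF records with INFO DP > min_dp and QUAL > min_qual.

    Assumption: site-level depth is stored as DP in the INFO column (column 8).
    Records with a missing QUAL or DP are counted in the total but not as passing.
    """
    total = 0
    passing = 0
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        total += 1
        # Parse the INFO column into a key -> value mapping.
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        try:
            qual = float(fields[5])
            depth = int(info["DP"])
        except (KeyError, ValueError):
            continue  # QUAL is "." or DP is absent
        if depth > min_dp and qual > min_qual:
            passing += 1
    return passing / total if total else 0.0
```

It can be run on the callset with something like `with open("GGVCF.vcf") as handle: print(fraction_passing(handle))`. The thresholds are strict inequalities to match the "DP > 10 and QUAL > 30" wording.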