Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Weird, disordered tranche plots generated by VariantRecalibrator
I have run GATK4 VariantRecalibrator on a VCF file from C. elegans data:
GATK VariantRecalibrator -R c_elegans.PRJNA13758.WS263.genomic.fa -V GGVCF.vcf --resource cendr,known=false,training=true,truth=true,prior=15.0:all.vcf.gz -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP --output output.recal --tranches-file output.tranche --rscript-file output.plots.R
GGVCF.vcf was output by
GATK GenotypeGVCFs in a previous step. all.vcf.gz is a set of short variants that appear in natural isolates of C. elegans.
The plots attached are not ordered by % truth. Also, the bar with 90% truth should be the one with solid boxes, not having cumulative TPs or FPs. Also, these plots how I've got more FPs than TPs. However, my data is deeply sequenced (> 100X) and 95% variants have DP > 10 and QUAL > 30. Can I trust these truth results? The total number of variants in GGVCF.vcf is 22,000. all.vcf.gz contain 2,427,507 variants.