Weird, disordered tranche plots generated by VariantRecalibrator

cr517 (Cambridge, Member)

Hi,

I have run GATK4 VariantRecalibrator on a VCF file from C. elegans data:

GATK VariantRecalibrator -R c_elegans.PRJNA13758.WS263.genomic.fa -V GGVCF.vcf --resource cendr,known=false,training=true,truth=true,prior=15.0:all.vcf.gz -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP --output output.recal --tranches-file output.tranche --rscript-file output.plots.R

GGVCF.vcf was output by GATK GenotypeGVCFs in a previous step. all.vcf.gz is a set of short variants that appear in natural isolates of C. elegans.
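VQSR can only learn from, and report truth sensitivity on, resource sites that actually occur in the callset. So with 22,000 calls against a 2.4-million-site resource, it may help to check how many callset alleles the resource contains. Below is a minimal sketch, assuming plain or gzip/bgzip-compressed VCFs. It matches on exact (CHROM, POS, REF, ALT), so differently normalized indels will be missed. The function names are hypothetical helpers, not part of GATK.

```python
import gzip
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, ALT) key for one alternate allele
Site = Tuple[str, int, str, str]


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # bgzip output is readable as multi-member gzip
    return open(path, "r")


def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect site keys from VCF lines, splitting multi-allelic ALT fields."""
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def overlap(callset: Set[Site], resource: Set[Site]) -> int:
    """Number of callset alleles that also appear in the resource."""
    return len(callset & resource)


def count_shared(callset_path: str, resource_path: str) -> Tuple[int, int]:
    """Return (shared alleles, total callset alleles) for two VCF files."""
    with open_vcf(callset_path) as fh:
        calls = read_sites(fh)
    with open_vcf(resource_path) as fh:
        resource = read_sites(fh)
    return overlap(calls, resource), len(calls)
```

For example, `count_shared("GGVCF.vcf", "all.vcf.gz")` returns the number of shared alleles and the callset total. Loading the whole resource into a set keeps the code simple; a few million tuples fit comfortably in memory on a typical workstation.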

The attached plots are not ordered by % truth. Also, the 90% truth bar should be the one drawn with solid boxes, not the one showing cumulative TPs or FPs. The plots also show more FPs than TPs. However, my data is deeply sequenced (> 100X), and 95% of variants have DP > 10 and QUAL > 30. Can I trust these truth results? GGVCF.vcf contains 22,000 variants in total, and all.vcf.gz contains 2,427,507 variants.
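One way to check the ordering without the plot is to read the tranches file directly and sort it by target truth sensitivity. The sketch below assumes the comma-separated layout with `#` comment lines and a header containing `targetTruthSensitivity` and `minVQSLod` columns. Check the header of your own `output.tranche`, because the exact columns may differ between GATK versions. In a well-behaved run, the minimum VQSLOD cutoff should fall as the target sensitivity rises, since each higher tranche admits more variants. The helper names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

Tranche = Dict[str, str]


def read_tranches(lines: Iterable[str]) -> List[Tranche]:
    """Parse a comma-separated tranches file, skipping '#' comment lines."""
    data = [line for line in lines if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(data))  # first remaining line is the header


def sort_by_target(tranches: List[Tranche]) -> List[Tranche]:
    """Order tranches by increasing target truth sensitivity."""
    return sorted(tranches, key=lambda t: float(t["targetTruthSensitivity"]))


def vqslod_decreases(tranches: List[Tranche]) -> bool:
    """True if minVQSLod never increases as target truth sensitivity rises."""
    lods = [float(t["minVQSLod"]) for t in sort_by_target(tranches)]
    return all(a >= b for a, b in zip(lods, lods[1:]))
```

Usage: `with open("output.tranche") as fh: rows = read_tranches(fh)`, then print `sort_by_target(rows)` and `vqslod_decreases(rows)`. If the cutoffs are not monotonic, the model itself is probably unstable, rather than the plot being drawn out of order.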

Answers

  • Sheila (Broad Institute, Member, Broadie, admin)

    @cr517
    Hi,

    VQSR is probably not doing a good job here, as it needs many resource files. You can try the new CNN workflow, which is better suited to small datasets and non-model organisms. Read more about it here.

    -Sheila
