We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!
Strange VQSR plots after version 3.5 release
I ran GATK's cohort genotyping pipeline on 5000 human samples with Illumina WGS ~1.3x data, up through GenotypeGVCFs (and CatVariants to combine chunks) using v3.4-46. Next I ran VariantRecalibrator (initially just chr1) using recommended settings with both v3.4-46 and v3.5. Here is my command for both versions:
java -Xmx40g \
-jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R hs37m.fa \
-input gatk.hc.combined.genotyped.chr1-22.vcf.gz \
-recalFile snps.recal \
-tranchesFile snps.tranches \
-rscriptFile recalibrate_SNP_plots.R \
--target_titv 2.15 \
-nt 24 \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz \
-resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf.gz \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf.gz \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
-mode SNP \
-L 1 \
-tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 98.5 -tranche 90.0 \
--maxGaussians 6 \
The attached tranches plots (snps.tranches.v3.5.pdf) generated w/ v3.5 look strange because:
1) The tranches are out of order on the bar plot (e.g., 99.5 is before 99)
2) The fill coloring doesn't make sense for tranches 99 and 98.5 - there are orange stripes over the blue bar
3) The scatter plot's connecting lines go in both directions
The plots for v3.4-46 look more normal (snps.tranches.v3.4-46.pdf), though I'm still trying to figure out how to get closer to the expected 2.15 Ti/Tv ratio. Oddly, the Ti/Tv ratios differ slightly between v3.4-46 and v3.5 even though the same data and settings were used.
I suspected the behavior w/ v3.5 may be a possible bug in VariantRecalibrator, which is why I'm posting here. Please let me know if you need any more information.