VariantRecalibrator fails on my custom targeted re-sequencing data

zhoujj2013, Hong Kong, Member, Posts: 10

I am using GATK to process a dataset produced by custom targeted re-sequencing, but the VariantRecalibrator step fails.

The run log:
INFO 17:17:55,949 ArgumentTypeDescriptor - Dynamically determined type of ./ to be BED
INFO 17:17:56,001 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:17:56,001 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15
INFO 17:17:56,001 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 17:17:56,001 HelpFormatter - For support and documentation go to
INFO 17:17:56,007 HelpFormatter - Program Args: -T VariantRecalibrator -R ./hg19.fa -input ./output.raw.snps.indels.vcf -L./ -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_138.hg19.vcf -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -mode SNP -recalFile output.snp.recal -tranchesFile output.snp.tranches -rscriptFile output.snp.plots.R
INFO 17:17:56,007 HelpFormatter - Date/Time: 2014/02/18 17:17:56
INFO 17:17:56,007 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:17:56,008 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:17:56,024 ArgumentTypeDescriptor - Dynamically determined type of ./output.raw.snps.indels.vcf to be VCF
INFO 17:17:56,030 ArgumentTypeDescriptor - Dynamically determined type of hapmap_3.3.hg19.vcf to be VCF
INFO 17:17:56,033 ArgumentTypeDescriptor - Dynamically determined type of dbsnp_138.hg19.vcf to be VCF
INFO 17:17:56,619 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:17:56,684 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 17:17:56,703 RMDTrackBuilder - Loading Tribble index from disk for file ./output.raw.snps.indels.vcf
INFO 17:17:56,728 RMDTrackBuilder - Loading Tribble index from disk for file hapmap_3.3.hg19.vcf
INFO 17:17:56,764 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_138.hg19.vcf
INFO 17:17:56,935 IntervalUtils - Processing 801109 bp from intervals
INFO 17:17:57,012 GenomeAnalysisEngine - Preparing for traversal
INFO 17:17:57,017 GenomeAnalysisEngine - Done preparing for traversal
INFO 17:17:57,017 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 17:17:57,024 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
INFO 17:17:57,024 TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q6.0
INFO 17:17:59,530 VariantDataManager - QD: mean = 21.47 standard deviation = 9.62
INFO 17:17:59,530 VariantDataManager - MQRankSum: mean = -0.02 standard deviation = 1.36
INFO 17:17:59,531 VariantDataManager - ReadPosRankSum: mean = 0.50 standard deviation = 1.05
INFO 17:17:59,531 VariantDataManager - FS: mean = 2.59 standard deviation = 7.85
INFO 17:17:59,532 VariantDataManager - MQ: mean = 41.74 standard deviation = 1.10
INFO 17:17:59,538 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, FS, ReadPosRankSum, MQRankSum]
INFO 17:17:59,539 VariantDataManager - Training with 402 variants after standard deviation thresholding.
WARN 17:17:59,539 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
INFO 17:17:59,542 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 17:17:59,658 VariantRecalibratorEngine - Finished iteration 0.
INFO 17:17:59,715 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.20178
INFO 17:17:59,726 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.05225
INFO 17:17:59,736 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.05773
INFO 17:17:59,746 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.03079
INFO 17:17:59,755 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.05754
INFO 17:17:59,765 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.09399
INFO 17:17:59,774 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.19735
INFO 17:17:59,784 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.00727
INFO 17:17:59,793 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.00804
INFO 17:17:59,803 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00220
INFO 17:17:59,805 VariantRecalibratorEngine - Convergence after 51 iterations!
INFO 17:17:59,815 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(
at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.gatk.CommandLineGATK.main(

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Is GATK suitable for analyzing targeted re-sequencing data (SNP calling and indel calling)?

Could you please help me out?


  • zhoujj2013, Hong Kong, Member, Posts: 10

    Hi Geraldine,

    So you mean that GATK can be used to analyze targeted re-sequencing data, just without VQSR?

    Is there anything else I should watch out for when using the GATK package on targeted re-sequencing data?

    For SNPs and indels, I simply need to replace the filtering step with VariantFiltration in GATK.

    Am I right?
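
As a rough illustration of what swapping VQSR for hard filtering could look like, the sketch below assembles a VariantFiltration command line for GATK 2.x/3.x. The file names, the filter name, and the helper function are hypothetical, and the thresholds are only the commonly cited starting points from GATK's hard-filtering recommendations for SNPs, not values validated for this dataset. It also assumes the call set has already been split so that it contains SNPs only (for example with SelectVariants), since indels are usually filtered with different thresholds.

```python
import shlex

# Assumed starting-point thresholds for SNP hard filtering; tune for your data.
# The annotations are the same ones passed to VariantRecalibrator with -an.
SNP_FILTER_EXPRESSION = (
    "QD < 2.0 || FS > 60.0 || MQ < 40.0 || "
    "MQRankSum < -12.5 || ReadPosRankSum < -8.0"
)


def build_variant_filtration_args(reference, input_vcf, output_vcf,
                                  expression=SNP_FILTER_EXPRESSION,
                                  filter_name="snp_hard_filter",
                                  gatk_jar="GenomeAnalysisTK.jar"):
    """Return the argument list for a VariantFiltration run (hypothetical helper)."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "VariantFiltration",
        "-R", reference,
        "-V", input_vcf,
        "--filterExpression", expression,  # sites matching this are marked as filtered
        "--filterName", filter_name,       # label written to the FILTER column
        "-o", output_vcf,
    ]


if __name__ == "__main__":
    # File names below are placeholders, not files from the original post.
    args = build_variant_filtration_args(
        "hg19.fa", "raw.snps.vcf", "filtered.snps.vcf"
    )
    # Print a shell-safe command line instead of running it.
    print(" ".join(shlex.quote(a) for a in args))
```

The script only prints the command so it can be inspected before running; sites that match the expression are kept in the output VCF but flagged in the FILTER column rather than removed.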


