
ERROR MESSAGE: Code exception while running VariantRecalibrator

My command line:
java -Xmx200g -jar ../GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -R re.fasta -T VariantRecalibrator -mode SNP -input Lib1-.raw.vcf -resource:dbsnp,known=true,training=true,truth=true,prior=6.0 Lib1-B2-realn-confidence.raw.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ --maxGaussians 4 -percentBad 0.05 -recalFile Lib1-dedup-realn-confidence.raw.vcf-SNP-recal -tranchesFile Lib1--realn-confidence.raw.vcf-SNP.tranches -rscriptFile Lib1-realn-confidence.raw.vcf-SNP.plots.R --TStranche 90.0

Error:
INFO 10:25:19,779 VariantDataManager - Found 0 variants overlapping bad sites training tracks.
WARN 10:25:19,791 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
INFO 10:25:48,419 ProgressMeter - scaffold5:1428626 2.52e+03 30.0 s 3.3 h 97.3% 30.0 s 0.0 s

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.selectWorstVariants(
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(
at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.gatk.CommandLineGATK.main(

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ---------------------------------------------------------------------------------------

Because my data is not human, I used my own small training set for the run. Could the training data be causing the error?
Are there any other errors?
Many thanks.
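
The log above warns about "Training with very few variant sites", so one quick check is how many records the training VCF actually contains. The following is a minimal sketch in Python, not part of GATK; the function name `count_variant_records` is hypothetical, and it assumes a plain-text or gzip-compressed VCF in which header lines start with `#`.

```python
import gzip
from pathlib import Path


def count_variant_records(vcf_path):
    """Count data lines (non-empty, non-header) in a plain or gzipped VCF."""
    path = Path(vcf_path)
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Header and meta lines start with "#"; everything else is a record.
            if line.strip() and not line.startswith("#"):
                count += 1
    return count
```

Calling `count_variant_records("Lib1-B2-realn-confidence.raw.vcf")` on the resource file from the command line above would return the number of training sites. This only counts records; it does not tell you how many of them overlap the input call set or whether the number is enough for the model.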


  • wangl0807 Member

    Thank you. I will try VariantFiltration. But when I run BaseRecalibrator, -knownSites must be supplied, and I do not have a set of known sites. What should I do?
    Thanks again.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    You can probably use the training data that you were trying to use with VQSR.

  • wangl0807 Member

    Thank you.
    Yes, I ran BaseRecalibrator using the small dataset that I used in VQSR. But there is another problem: the output plot shows ORIGINAL and RECALIBRATED, but the training data is not a public resource. I built it myself by calling variants with GATK and samtools and selecting the calls common to both. The ORIGINAL in the plot means known sites, but they are not known. How do I handle that?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    I'm sorry, I don't understand your question. Can you please clarify this part:

    the ORIGINAL in plot means known site,but it is not known, How I do that?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi @wangl0807,

    Sorry to get back to this so late, but would it be possible for you to send us some test files to replicate the error locally? Normally when this happens the GATK should output an informative error message, so we'd like to debug why it's not behaving well here.
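
The training set discussed in this thread was described as the calls shared by a GATK call set and a samtools call set. A rough sketch of that kind of intersection in Python is shown here; it is an illustration under stated assumptions, not the poster's actual procedure. The function names `site_key` and `intersect_calls` are hypothetical, and two records are treated as the same site only when CHROM, POS, REF and ALT match exactly, which ignores differences in how the two callers represent indels or multi-allelic sites.

```python
def site_key(record_line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[3], fields[4]


def intersect_calls(lines_a, lines_b):
    """Keep the header of call set A plus the A records also present in B."""
    # Collect the sites seen in call set B, skipping header and blank lines.
    keys_b = {
        site_key(line)
        for line in lines_b
        if line.strip() and not line.startswith("#")
    }
    kept = []
    for line in lines_a:
        if not line.strip():
            continue
        # Header lines pass through; records must match a site in B exactly.
        if line.startswith("#") or site_key(line) in keys_b:
            kept.append(line)
    return kept
```

For example, passing the lines of the GATK VCF as `lines_a` and the lines of the samtools VCF as `lines_b` returns VCF lines that can be written to a new file. Annotations are taken from call set A only.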
