
VariantRecalibrator - "Unable to retrieve result"

nikmal, Member
edited August 2013 in Ask the GATK team

Hello GATK team!

I've encountered a strange problem when running VariantRecalibrator on my raw call set from HaplotypeCaller: the "Unable to retrieve result" error in the title. The input BAM files to HaplotypeCaller are three whole exomes (~60x) that have gone through the Best Practices pipeline with duplicate marking, indel realignment and base recalibration.

This is the command I used to create the call set, and I'm using GATK 2.6-5:

java -Xmx18g -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R GRCh37-lite.fa \
-I Input_BAMs.list \
-L Exon_Regions_b37_130423.interval_list \
-o Variants.HC.raw.vcf \
--dbsnp dbsnp_137.b37.vcf \
-stand_call_conf 30.0 \
-stand_emit_conf 10.0 \
-dcov 200 \
--validation_strictness LENIENT \
-l INFO \
-nct 6

The above command worked like a charm without any problems. However, I run into problems in the next step when trying to recalibrate the variants. Command:

java -Xmx24g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R GRCh37-lite.fa \
-L Exon_Regions_b37_130423.interval_list \
-input Variants.HC.raw.vcf \
-recalFile Variants_EX.hc.snp.recal \
-tranchesFile Variants_.hc.snp.tranches \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_highconfsnps_b37.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_137.b37.vcf \
-an DP \
-an QD \
-an FS \
-an MQRankSum \
-an ReadPosRankSum \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-percentBad 0.01 \
-minNumBad 1000 \
--validation_strictness LENIENT \
-l INFO \
-nt 8

I get the following error message:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Unable to retrieve result
        at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
Caused by: java.lang.IllegalArgumentException: log10p: Values must be non-infinite and non-NAN
        at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:236)
        at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:224)
        at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:249)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.evaluateDatum(GaussianMixtureModel.java:238)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateDatum(VariantRecalibratorEngine.java:167)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateData(VariantRecalibratorEngine.java:100)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:343)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:132)
        at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
        at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
        ... 5 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.6-5-gba531bd):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to retrieve result
##### ERROR ------------------------------------------------------------------------------------------

Is there any setting I can try to adjust to solve this problem, or what can I do?

Let me know if you need any more info or logs.

EDIT: Found some info at http://gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project and I'm now going to try: -maxGaussians 4 and -percentBad 0.05. I'll post an update when I've got some results!

EDIT 2: Everything worked like a charm with the new parameters! Hopefully this can help someone else with the same problem!
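
For readers following along, the rerun described in the two edits above would presumably look something like the sketch below. The exact command was not posted, so this assumes every other argument stayed as in the failing run and that only -maxGaussians 4 was added and -percentBad was raised from 0.01 to 0.05. GATK_JAR and vqsr_command are hypothetical names used for illustration; the script prints the command so it can be checked before anything is run.

```shell
#!/bin/sh
# Sketch only: reconstructs the rerun from the EDIT above.
# Assumption: all arguments other than -maxGaussians and -percentBad
# are unchanged from the failing command. File names are copied from it.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # assumed jar location

vqsr_command() {
  # The leading "echo" prints the command instead of executing it.
  # Changed relative to the failing run: -maxGaussians 4, -percentBad 0.05.
  echo java -Xmx24g -jar "$GATK_JAR" \
    -T VariantRecalibrator \
    -R GRCh37-lite.fa \
    -L Exon_Regions_b37_130423.interval_list \
    -input Variants.HC.raw.vcf \
    -recalFile Variants_EX.hc.snp.recal \
    -tranchesFile Variants_.hc.snp.tranches \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_highconfsnps_b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_137.b37.vcf \
    -an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -maxGaussians 4 \
    -percentBad 0.05 \
    -minNumBad 1000 \
    --validation_strictness LENIENT \
    -l INFO \
    -nt 8
}

vqsr_command
```

Since none of the arguments contain spaces, the printed line can be pasted into a terminal, or piped to sh, once it looks right.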


Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Thanks for reporting your solution!

    This looks like it was a "small callset" problem. I'm not sure why the program is crashing instead of giving you the small callset error message; I'll look into it. In any case, I'm glad your problem is solved.

  • tommycarstensen, United Kingdom (Member)

    I have the same problem with version 3.3 (and 3.2) in SNP mode, and changing --maxGaussians from the default 8 to 4 did not solve it. What solved it for me was changing -nt 8 to -nt 1. I found this suggestion in this thread:

    http://gatkforums.broadinstitute.org/discussion/4796/error-message-unable-to-retrieve-result
    

    I did, however, run into another problem after that:

    http://gatkforums.broadinstitute.org/discussion/4248/variantrecalibrator-removing-all-snps-from-the-training-set
    
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    I am coming to the conclusion that multithreading, not money, is the source of (almost) all evil.

  • pdexheimer (Member)

    You've obviously never taken a parallel computation class if you're just now arriving at that conclusion :)

    To sum up a very intense semester: every operation is a potential race condition, which will lead to untraceable bugs.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Ah, indeed! Well, I once sat through one 1-hr talk on that topic and walked away with pretty much that conclusion, but thought I should keep an open mind. So much for that. If it was up to me, we would kill all multithreading in GATK ;)
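
The -nt 1 workaround reported by tommycarstensen above could be wrapped in a small retry helper, sketched below. This is one possible arrangement and not something GATK provides: run_with_nt_fallback is a hypothetical function name, and the helper assumes the command it is given accepts a trailing -nt argument, as the VariantRecalibrator commands in this thread do. It tries the requested thread count first and falls back to a single data thread only if that run exits with an error.

```shell
#!/bin/sh
# Sketch only: retry a command single-threaded if the multithreaded run fails.
# run_with_nt_fallback is a hypothetical helper, not part of GATK.
#
# Usage: run_with_nt_fallback <threads> <command and arguments without -nt>
run_with_nt_fallback() {
  nt="$1"
  shift
  # First attempt with the requested number of data threads.
  if "$@" -nt "$nt"; then
    return 0
  fi
  echo "run with -nt $nt failed; retrying with -nt 1" >&2
  # Second attempt, single-threaded; its exit status is the helper's status.
  "$@" -nt 1
}

# Example invocation (commented out; argument list abbreviated by hand here):
# run_with_nt_fallback 8 java -Xmx24g -jar GenomeAnalysisTK.jar \
#   -T VariantRecalibrator -R GRCh37-lite.fa -input Variants.HC.raw.vcf -mode SNP
```

The fallback costs a full second run when the first one fails, so for a step that is known to be fragile it may be simpler to pass -nt 1 from the start.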
