VQSR - GATK runtime error

TechnicalVault, Cambridge, UK (Member)
edited May 2015 in Ask the GATK team

I'm currently comparing 100 Greek samples downsampled to 30x and 15x to explore the effects this has on our various tools. For now I'm only evaluating chromosome 6, as I need the initial comparison results soon, and something went boom. Curiously enough, it only affects the 15x version of the data and not the 30x. I suspect it might be threading related, so I'm going to retry with fewer threads and with no threading. Update: confirmed the same error with 31 threads; now testing in single-threaded mode.

INFO  17:29:10,799 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
INFO  17:29:10,800 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  17:29:10,800 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  17:29:10,806 HelpFormatter - Program Args: -T VariantRecalibrator -nt 32 -R /lustre/scratch113/resources/ref/Homo_sapiens/1000Genomes_hs37d5/hs37d5.fa -input greek_bams/15x/15x_annot.vcf.gz --recal_file greek_bams/15x_vqsr_snp_recal.vcf.gz --tranches_file greek_bams/15x_vqsr_snp_recal.tranches -mode SNP -rscriptFile greek_bams/15x.snp.plot -L 6 -l INFO -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/1000G_omni2.5.b37.vcf -resource:1000g,known=false,training=true,truth=false,prior=10.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/1000G_phase1.snps.high_confidence.b37.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.8/b37//dbsnp_138.b37.vcf --target_titv 2.15 -an QD -an MQRankSum -an ReadPosRankSum -an FS -an InbreedingCoeff -an DP -an MQ -an SOR
INFO  17:29:10,811 HelpFormatter - Executing as mercury@hgs4b on Linux 3.8.0-44-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO  17:29:10,811 HelpFormatter - Date/Time: 2015/05/17 17:29:10
INFO  17:29:10,812 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:29:10,812 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:29:11,493 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:29:12,058 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  17:29:13,568 IntervalUtils - Processing 171115067 bp from intervals
WARN  17:29:13,570 IndexDictionaryUtils - Track input doesn't have a sequence dictionary built in, skipping dictionary validation
INFO  17:29:13,620 MicroScheduler - Running the GATK in parallel mode with 32 total threads, 1 CPU thread(s) for each of 32 data thread(s), of 32 processors available on this machine
INFO  17:29:13,758 GenomeAnalysisEngine - Preparing for traversal
INFO  17:29:13,765 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:29:13,766 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  17:29:13,767 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  17:29:13,768 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  17:29:13,878 TrainingSet - Found hapmap track:    Known = false   Training = true         Truth = true    Prior = Q15.0
INFO  17:29:13,880 TrainingSet - Found omni track:      Known = false   Training = true         Truth = true    Prior = Q12.0
INFO  17:29:13,882 TrainingSet - Found 1000g track:     Known = false   Training = true         Truth = false   Prior = Q10.0
INFO  17:29:13,884 TrainingSet - Found dbsnp track:     Known = true    Training = false        Truth = false   Prior = Q2.0
INFO  17:30:00,834 ProgressMeter -      6:45767321    437766.0    47.0 s     107.0 s       26.7%     2.9 m       2.1 m
INFO  17:30:17,026 VariantDataManager - QD:      mean = 19.92    standard deviation = 5.86
INFO  17:30:17,226 VariantDataManager - MQRankSum:       mean = 0.06     standard deviation = 0.52
INFO  17:30:17,410 VariantDataManager - ReadPosRankSum:          mean = 0.25     standard deviation = 0.52
INFO  17:30:17,601 VariantDataManager - FS:      mean = 2.65     standard deviation = 3.93
INFO  17:30:17,790 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
INFO  17:30:17,985 VariantDataManager - DP:      mean = 1432.94  standard deviation = 185.41
INFO  17:30:18,179 VariantDataManager - MQ:      mean = 59.94    standard deviation = 0.72
INFO  17:30:18,374 VariantDataManager - SOR:     mean = 0.78     standard deviation = 0.40
INFO  17:30:19,569 VariantDataManager - Annotations are now ordered by their information content: [DP, MQ, QD, FS, ReadPosRankSum, MQRankSum, SOR, InbreedingCoeff]
INFO  17:30:19,642 VariantDataManager - Training with 611167 variants after standard deviation thresholding.
INFO  17:30:19,648 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO  17:30:30,839 ProgressMeter -     6:171052865   4569513.0    77.0 s      16.0 s      100.0%    77.0 s       0.0 s
INFO  17:31:00,843 ProgressMeter -     6:171052865   4569513.0   107.0 s      23.0 s      100.0%   107.0 s       0.0 s
INFO  17:31:30,847 ProgressMeter -     6:171052865   4569513.0     2.3 m      29.0 s      100.0%     2.3 m       0.0 s
INFO  17:31:35,265 VariantRecalibratorEngine - Finished iteration 0.
INFO  17:32:00,850 ProgressMeter -     6:171052865   4569513.0     2.8 m      36.0 s      100.0%     2.8 m       0.0 s
INFO  17:32:25,369 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.82124

...

INFO  17:45:45,833 VariantRecalibratorEngine - Finished iteration 95.   Current change in mixture coefficients = 0.00236
INFO  17:46:00,990 ProgressMeter -     6:171052865   4569513.0    16.8 m       3.7 m      100.0%    16.8 m       0.0 s
INFO  17:46:12,074 VariantRecalibratorEngine - Convergence after 98 iterations!
INFO  17:46:17,393 VariantRecalibratorEngine - Evaluating full set of 985716 variants...
INFO  17:46:17,455 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
INFO  17:46:27,147 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
Caused by: java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:408)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:156)
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
        ... 5 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to retrieve result
##### ERROR ------------------------------------------------------------------------------------------

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Yep, that sounds threading related alright. Let us know what happens with fewer/no threads.

    As a general comment, running VQSR on a single chromosome is a bit risky. Even if it runs to completion, the model will be less powerful than if run on the full genome or exome.

  • TechnicalVault, Cambridge, UK (Member)

    Replicated in single-threaded mode. Now testing for a regression by trying GATK 3.3. I was hoping chromosome 6 would be big enough, given that it's much larger than an exome, though now that I come to think about it, I suspect the HLA region may mess with things a bit.

    INFO  21:47:24,522 VariantRecalibratorEngine - Convergence after 98 iterations! 
    INFO  21:47:30,282 VariantRecalibratorEngine - Evaluating full set of 985716 variants... 
    INFO  21:47:30,350 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000. 
    INFO  21:47:34,263 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.IllegalArgumentException: No data found.
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:408)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:156)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: No data found.
    
  • TechnicalVault, Cambridge, UK (Member)

    Replicated the bug in 3.3-0 but not in 3.2-2, suggesting a regression between the two (both runs were multi-threaded with 32 cores).
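
Both the multi-threaded and the single-threaded logs in this thread share one line just before the failure: `Training with worst 0 scoring variants --> variants with LOD <= -5.0000.`, followed by `No data found.` thrown from `VariantRecalibratorEngine.generateModel`. One reading (an inference from the logs, not something confirmed in the thread) is that the set of low-scoring variants selected for the next training step was empty. A minimal Python sketch for spotting that line in a saved log is below; the function names are hypothetical helpers written for this example and are not part of GATK.

```python
import re

# Matches the VariantDataManager line that reports how many low-scoring
# variants were selected for training, e.g.
# "Training with worst 0 scoring variants --> variants with LOD <= -5.0000."
WORST_RE = re.compile(r"Training with worst (\d+) scoring variants")


def worst_variant_counts(log_text):
    """Return every 'worst N scoring variants' count found in a GATK log."""
    return [int(m.group(1)) for m in WORST_RE.finditer(log_text)]


def has_empty_worst_set(log_text):
    """True if the log reports at least one selection of zero variants."""
    return any(count == 0 for count in worst_variant_counts(log_text))


if __name__ == "__main__":
    # Line copied from the log in the original post.
    sample = (
        "INFO  17:46:17,455 VariantDataManager - Training with worst 0 "
        "scoring variants --> variants with LOD <= -5.0000."
    )
    print(worst_variant_counts(sample))
    print(has_empty_worst_set(sample))
```

Running this check on the logs from the 30x run and from the 3.2-2 run would show whether a non-zero count separates the successful runs from the failing ones.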
