VQSR - GATK runtime error

TechnicalVault (Cambridge, UK)
edited May 2015 in Ask the GATK team

I'm currently comparing 100 Greek samples downsampled to 30x and 15x to explore how coverage affects our various tools. For now I'm only evaluating chromosome 6, because I need the initial comparison results soon, and something went boom. Curiously, it only affects the 15x version of the data, not the 30x. I suspect it might be threading-related, so I'm going to retry with fewer threads and then with none. Update: confirmed the same error with 31 threads; now testing in single-threaded mode.

INFO  17:29:10,799 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
INFO  17:29:10,800 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  17:29:10,800 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  17:29:10,806 HelpFormatter - Program Args: -T VariantRecalibrator -nt 32 -R /lustre/scratch113/resources/ref/Homo_sapiens/1000Genomes_hs37d5/hs37d5.fa -input greek_bams/15x/15x_annot.vcf.gz --recal_file greek_bams/15x_vqsr_snp_recal.vcf.gz --tranches_file greek_bams/15x_vqsr_snp_recal.tranches -mode SNP -rscriptFile greek_bams/15x.snp.plot -L 6 -l INFO -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/1000G_omni2.5.b37.vcf -resource:1000g,known=false,training=true,truth=false,prior=10.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.5/1000G_phase1.snps.high_confidence.b37.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /lustre/scratch111/resources/variation/Homo_sapiens/grch37/gatk-bundle/2.8/b37//dbsnp_138.b37.vcf --target_titv 2.15 -an QD -an MQRankSum -an ReadPosRankSum -an FS -an InbreedingCoeff -an DP -an MQ -an SOR
INFO  17:29:10,811 HelpFormatter - Executing as [email protected] on Linux 3.8.0-44-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO  17:29:10,811 HelpFormatter - Date/Time: 2015/05/17 17:29:10
INFO  17:29:10,812 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:29:10,812 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:29:11,493 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:29:12,058 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  17:29:13,568 IntervalUtils - Processing 171115067 bp from intervals
WARN  17:29:13,570 IndexDictionaryUtils - Track input doesn't have a sequence dictionary built in, skipping dictionary validation
INFO  17:29:13,620 MicroScheduler - Running the GATK in parallel mode with 32 total threads, 1 CPU thread(s) for each of 32 data thread(s), of 32 processors available on this machine
INFO  17:29:13,758 GenomeAnalysisEngine - Preparing for traversal
INFO  17:29:13,765 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:29:13,766 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  17:29:13,767 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  17:29:13,768 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  17:29:13,878 TrainingSet - Found hapmap track:    Known = false   Training = true         Truth = true    Prior = Q15.0
INFO  17:29:13,880 TrainingSet - Found omni track:      Known = false   Training = true         Truth = true    Prior = Q12.0
INFO  17:29:13,882 TrainingSet - Found 1000g track:     Known = false   Training = true         Truth = false   Prior = Q10.0
INFO  17:29:13,884 TrainingSet - Found dbsnp track:     Known = true    Training = false        Truth = false   Prior = Q2.0
INFO  17:30:00,834 ProgressMeter -      6:45767321    437766.0    47.0 s     107.0 s       26.7%     2.9 m       2.1 m
INFO  17:30:17,026 VariantDataManager - QD:      mean = 19.92    standard deviation = 5.86
INFO  17:30:17,226 VariantDataManager - MQRankSum:       mean = 0.06     standard deviation = 0.52
INFO  17:30:17,410 VariantDataManager - ReadPosRankSum:          mean = 0.25     standard deviation = 0.52
INFO  17:30:17,601 VariantDataManager - FS:      mean = 2.65     standard deviation = 3.93
INFO  17:30:17,790 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
INFO  17:30:17,985 VariantDataManager - DP:      mean = 1432.94  standard deviation = 185.41
INFO  17:30:18,179 VariantDataManager - MQ:      mean = 59.94    standard deviation = 0.72
INFO  17:30:18,374 VariantDataManager - SOR:     mean = 0.78     standard deviation = 0.40
INFO  17:30:19,569 VariantDataManager - Annotations are now ordered by their information content: [DP, MQ, QD, FS, ReadPosRankSum, MQRankSum, SOR, InbreedingCoeff]
INFO  17:30:19,642 VariantDataManager - Training with 611167 variants after standard deviation thresholding.
INFO  17:30:19,648 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO  17:30:30,839 ProgressMeter -     6:171052865   4569513.0    77.0 s      16.0 s      100.0%    77.0 s       0.0 s
INFO  17:31:00,843 ProgressMeter -     6:171052865   4569513.0   107.0 s      23.0 s      100.0%   107.0 s       0.0 s
INFO  17:31:30,847 ProgressMeter -     6:171052865   4569513.0     2.3 m      29.0 s      100.0%     2.3 m       0.0 s
INFO  17:31:35,265 VariantRecalibratorEngine - Finished iteration 0.
INFO  17:32:00,850 ProgressMeter -     6:171052865   4569513.0     2.8 m      36.0 s      100.0%     2.8 m       0.0 s
INFO  17:32:25,369 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.82124

...

INFO  17:45:45,833 VariantRecalibratorEngine - Finished iteration 95.   Current change in mixture coefficients = 0.00236
INFO  17:46:00,990 ProgressMeter -     6:171052865   4569513.0    16.8 m       3.7 m      100.0%    16.8 m       0.0 s
INFO  17:46:12,074 VariantRecalibratorEngine - Convergence after 98 iterations!
INFO  17:46:17,393 VariantRecalibratorEngine - Evaluating full set of 985716 variants...
INFO  17:46:17,455 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
INFO  17:46:27,147 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
Caused by: java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:408)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:156)
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
        at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
        ... 5 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to retrieve result
##### ERROR ------------------------------------------------------------------------------------------
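Reading the log, the failure comes right after the line "Training with worst 0 scoring variants --> variants with LOD <= -5.0000.", and the underlying exception is raised in VariantRecalibratorEngine.generateModel. That pattern suggests the model built from the worst-scoring variants was handed an empty training set. The following is a simplified, hypothetical Python model of that step, not GATK's code: the function names are made up, and only the -5.0 cutoff and the two messages are taken from the log.

```python
# Hypothetical, simplified model of the step logged as
# "Training with worst N scoring variants --> variants with LOD <= -5.0000."
# This is NOT GATK's implementation; the function names are illustrative only.
from typing import Dict, List, Sequence

BAD_LOD_CUTOFF = -5.0  # cutoff value as printed in the log


def select_worst_variants(lods: Sequence[float], cutoff: float = BAD_LOD_CUTOFF) -> List[float]:
    """Keep the LOD scores at or below the cutoff; these would train the 'bad' model."""
    worst = [lod for lod in lods if lod <= cutoff]
    print(f"Training with worst {len(worst)} scoring variants --> "
          f"variants with LOD <= {cutoff:.4f}.")
    return worst


def generate_model(training: Sequence[float]) -> Dict[str, float]:
    """Stand-in for fitting a model: refuses an empty training set."""
    if not training:
        # Mirrors the IllegalArgumentException message in the stack trace.
        raise ValueError("No data found.")
    # Toy "model": just the size and mean of the training set.
    return {"n": len(training), "mean_lod": sum(training) / len(training)}


if __name__ == "__main__":
    # Made-up LODs where nothing scores badly enough: reproduces the failure mode.
    lods = [4.1, 2.7, 0.3, -1.8, -4.9]
    try:
        generate_model(select_worst_variants(lods))
    except ValueError as err:
        print(f"generate_model failed: {err}")
```

Run as a script, this prints "Training with worst 0 scoring variants --> variants with LOD <= -5.0000." followed by "generate_model failed: No data found.", the same sequence as the end of the log above.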

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Administrator, Broadie admin)

    Yep, that sounds threading-related all right. Let us know what happens with fewer threads or none.

    As a general comment, running VQSR on a single chromosome is a bit risky. Even if it runs to completion, the model will be trained on fewer variants and so will be less powerful than one built from the full genome or exome.

  • TechnicalVault (Cambridge, UK)

    Replicated in single-threaded mode. Now testing for a regression by trying GATK 3.3. I was hoping chromosome 6 would be big enough, given that it's much larger than an exome, but now that I think about it, I suspect the HLA region may mess with things a bit.

    INFO  21:47:24,522 VariantRecalibratorEngine - Convergence after 98 iterations! 
    INFO  21:47:30,282 VariantRecalibratorEngine - Evaluating full set of 985716 variants... 
    INFO  21:47:30,350 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000. 
    INFO  21:47:34,263 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.IllegalArgumentException: No data found.
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:408)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:156)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: No data found.
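Comparing the two stack traces: the multi-threaded run (HierarchicalMicroScheduler) reports "ReviewedGATKException: Unable to retrieve result", but its "Caused by:" line is the same "java.lang.IllegalArgumentException: No data found." that the single-threaded run (LinearMicroScheduler) throws directly. A small hypothetical Python helper for pulling the innermost exception out of a pasted Java trace makes that comparison mechanical; it is a sketch that assumes the standard "Caused by:" layout seen in these traces.

```python
# Hypothetical helper: report the innermost exception in a pasted Java stack trace.
import re
from typing import Optional

# An exception header line, optionally prefixed with "Caused by:", e.g.
#   Caused by: java.lang.IllegalArgumentException: No data found.
EXC_LINE = re.compile(r"^(?:Caused by:\s*)?([\w.$]+(?:Exception|Error))(?::\s*(.*))?$")


def root_cause(trace: str) -> Optional[str]:
    """Return 'ExceptionClass: message' for the last (innermost) exception header."""
    last = None
    for line in trace.splitlines():
        match = EXC_LINE.match(line.strip())
        if match:
            cls, msg = match.group(1), match.group(2)
            last = f"{cls}: {msg}" if msg else cls
    return last
```

Fed either trace from this thread, root_cause returns "java.lang.IllegalArgumentException: No data found.", so the threaded scheduler only wraps the same underlying failure.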
    
  • TechnicalVault (Cambridge, UK)

    Replicated the bug in 3.3-0 but not in 3.2-2, suggesting a regression between the two versions (both runs were multi-threaded with 32 cores).
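The version comparison in this thread is a manual bisection. With only three releases in play, checking each one was enough; with more candidate builds, a git-bisect-style binary search keeps the number of VQSR reruns logarithmic. A hypothetical Python sketch, assuming the bug appears once and then persists in every later version; is_bad stands in for rerunning the same VariantRecalibrator command under a given GATK version, and here it is just a lookup of the outcomes reported in this thread.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest list for the first version where is_bad() is True.

    Assumes the oldest version is good, the newest is bad, and that once a version
    is bad every later one is too (the same assumption git bisect makes).
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    if is_bad(versions[lo]) or not is_bad(versions[hi]):
        raise ValueError("need a good oldest version and a bad newest version")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Outcomes reported in this thread for the 15x chromosome 6 VQSR run.
observed = {"3.2-2": False, "3.3-0": True, "3.4-0": True}
print(first_bad_version(["3.2-2", "3.3-0", "3.4-0"], observed.__getitem__))  # 3.3-0
```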
