ERROR stack trace ; Unable to retrieve result ; A GATK RUNTIME ERROR has occurred

Hi,
Thanks very much for your answers to my previous questions. I have run into another difficulty when running the VQSR step: some ERROR output appeared on the screen. The error output is as follows:
INFO 18:10:01,046 GaussianMixtureModel - Initializing model with 30 k-means iterations...
INFO 18:10:01,165 VariantRecalibratorEngine - Finished iteration 0.
INFO 18:10:01,186 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.15059
INFO 18:10:01,196 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.06115
INFO 18:10:01,206 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.34881
INFO 18:10:01,208 VariantRecalibratorEngine - Convergence after 16 iterations!
INFO 18:10:01,211 VariantDataManager - Found 0 variants overlapping bad sites training tracks.
INFO 18:10:27,971 ProgressMeter - chr1:249230318 4.34e+06 90.0 s 20.0 s 100.0% 90.0 s 0.0 s
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Unable to retrieve result
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
Caused by: java.lang.NullPointerException
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.selectWorstVariants(VariantDataManager.java:278)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:333)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:132)
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
... 5 more
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Unable to retrieve result
ERROR ------------------------------------------------------------------------------------------
I think the parameters I set are all correct:
java -jar /ifs1/ST_POP/USER/lantianming/HUM/bin/GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar
-R /ifs1/ST_POP/USER/lantianming/HUM/reference_human/chr1.fa
--maxGaussians 4
-numBad 4000
-T VariantRecalibrator
-mode SNP
-input /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/split_1_22_X_Y_M/chr1/chr1.recal_10.vcf
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/dbsnp_137.hg19.vcf
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/hapmap_3.3.hg19.vcf
-resource:omni,known=false,training=true,truth=false,prior=12.0 /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/1000G_omni2.5.hg19.vcf
-an DP -an FS -an HaplotypeScore -an MQ0 -an MQ -an QD
-recalFile /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/split_1_22_X_Y_M/chr1/chr1.vcf.snp_11.recal
-tranchesFile /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/split_1_22_X_Y_M/chr1/chr1.vcf.snp_11.tranches
-rscriptFile /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/split_1_22_X_Y_M/chr1/chr1.vcf.snp_11.plot.R -nt 4
--TStranche 90.0 --TStranche 93.0 --TStranche 95.0 --TStranche 97.0
My input file covers chr1 only, the sequencing depth is about 1×, and 4000 SNP sites were called with UnifiedGenotyper.
What I am not sure about is whether this number of SNP sites is enough for VQSR.
Could you please give me some suggestions? Thanks very much!
Best Answer
Geraldine_VdAuwera Cambridge, MA admin
Hi there,
The problem here is that you have a very small dataset. Specifically, setting -numBad 4000 if you only have 4000 SNPs pretty much guarantees that it will fail, because it's like saying all the variants are bad. You could try reducing -numBad, but I'm afraid that ultimately your dataset is just too small for VQSR. Either run on more variants (do you have an entire genome or just chr1? VQSR does not work well on isolated chromosomes) or use the hard-filtering recommendations instead of trying to run VQSR.
We're going to try to improve the documentation on VQSR to make it clearer what the requirements are.
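As a rough sanity check before picking -numBad, it can help to count how many variant records the input VCF actually contains. The sketch below is only an illustration: it assumes an uncompressed VCF, the function names count_vcf_records and check_num_bad are made up for this example, and the 10% ceiling is an arbitrary assumption, not a GATK rule.

```shell
#!/bin/sh
# Count variant records in an uncompressed VCF:
# every non-empty line that does not start with '#'.
count_vcf_records() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when the count is 0, so '|| true' keeps 'set -e' scripts running.
    grep -c '^[^#]' "$1" || true
}

# Warn when the requested -numBad is large relative to the callset.
# ASSUMPTION: the 10% ceiling is illustrative only, not a GATK requirement.
check_num_bad() {
    vcf=$1
    num_bad=$2
    total=$(count_vcf_records "$vcf")
    if [ "$((num_bad * 10))" -gt "$total" ]; then
        echo "WARN: -numBad $num_bad vs $total records"
        return 1
    fi
    echo "OK: -numBad $num_bad vs $total records"
}
```

For the case in this thread, calling check_num_bad on the chr1 VCF with 4000 would print the warning, since -numBad would equal the number of records.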
Answers
I am running VQSR with GATK 3.4 on a whole-genome VCF file and get the same error.
INFO 11:02:35,585 VariantRecalibratorEngine - Evaluating full set of 4837109 variants...
INFO 11:02:35,740 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
INFO 11:02:41,872 GATKRunReport - Uploaded run statistics report to AWS S3
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
Caused by: java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:408)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:156)
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
... 5 more
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
Any clue why a whole genome would also have "zero" bad variants?
Thanks,
Shuoguo
@55816815
Hi Shuoguo,
Can you post the exact command you ran? Are you running on indels or SNPs? It is possible to just not have enough overlap between your callset and the known variant datasets, even if you have a whole genome. This is especially true for indels.
-Sheila
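One way to get a feel for the overlap mentioned above is to count how many sites a callset shares with a known-variants file. This is a minimal sketch under stated assumptions: both files are uncompressed VCFs, sites are matched on chromosome and position only (alleles are ignored), and the function names vcf_site_keys and count_shared_sites are hypothetical helpers, not part of GATK.

```shell
#!/bin/sh
# Print one "CHROM:POS" key per record of an uncompressed VCF,
# sorted and de-duplicated so the output can be fed to comm.
vcf_site_keys() {
    awk -F'\t' '!/^#/ && NF >= 2 { print $1 ":" $2 }' "$1" | LC_ALL=C sort -u
}

# Count the sites shared by two VCFs.
# ASSUMPTION: matching is by chromosome and position only, not by allele.
count_shared_sites() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    vcf_site_keys "$1" > "$keys_a"
    vcf_site_keys "$2" > "$keys_b"
    # comm -12 keeps only the lines present in both sorted inputs;
    # tr strips the padding some wc implementations add.
    LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' '
    rm -f "$keys_a" "$keys_b"
}
```

A very small count from count_shared_sites for the callset against a training resource would be consistent with the "not enough overlap" explanation, although it does not prove it.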
@Sheila I have 3 samples mapped with bwa-mem; two succeeded and this is the only one that failed (I tried three times, so it is not a random failure).
Strangely, the same 3 samples went through variant calling in exactly the same way, except with bwa-aln for mapping, and VQSR succeeded for all of them.
Exact command I used:
thanks!
@55816815
Hi,
Okay. Can you try running without -nt 4? Users have reported random issues with multi-threading. Also, why are you running on each sample by itself? If you are trying to analyze the three samples together, it is best to perform joint variant calling and genotyping. https://www.broadinstitute.org/gatk/documentation/article?id=4150
-Sheila
@Sheila Thanks. Will remove -nt. Yes I can try joint call as well.
@Sheila Removing "-nt 4" resolved the issue! Great!