VariantRecalibrator INDEL mode error

yusmile0618 (Guangzhou, China), Member

Hi, I'm using GATK 3.2 to run VariantRecalibrator in INDEL mode.

Here is my command:

java -jar /usr/local/bin/GenomeAnalysisTK.jar -R /mnt/md1200/1/public_database/gatk_hg19/ucsc.hg19.fasta -T VariantRecalibrator -input 8_v1.gatk.recal.SNP.vcf -resource:1000G,known=true,training=true,truth=true,prior=12.0 /mnt/md1200/1/public_database/gatk_hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf -an MQ -an FS -mode INDEL --maxGaussians 4 -recalFile 8_v1.gatk.SNP.INDEL.recal -tranchesFile 8_v1.gatk.SNP.INDEL.tranches -rscriptFile 8_v1.gatk.SNP.INDEL.plots.R

When I run it, I get the error below:

INFO 16:20:29,213 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:20:29,214 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 16:20:29,215 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 16:20:29,225 TrainingSet - Found 1000G track: Known = true Training = true Truth = true Prior = Q12.0
INFO 16:20:59,225 ProgressMeter - chr18:33989607 3080007.0 30.0 s 9.0 s 83.4% 35.0 s 5.0 s
INFO 16:21:03,085 VariantDataManager - MQ: mean = 38.89 standard deviation = 6.63
INFO 16:21:03,086 VariantDataManager - FS: mean = 0.00 standard deviation = 0.01
INFO 16:21:03,095 VariantDataManager - Annotations are now ordered by their information content: [MQ, FS]
INFO 16:21:03,095 VariantDataManager - Training with 123 variants after standard deviation thresholding.
WARN 16:21:03,096 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
INFO 16:21:03,101 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 16:21:03,272 VariantRecalibratorEngine - Finished iteration 0.
INFO 16:21:03,304 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 2.03385
INFO 16:21:03,322 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.01026
INFO 16:21:03,337 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.00960
INFO 16:21:03,350 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.00914
INFO 16:21:03,363 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.00889
INFO 16:21:03,376 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.00887
INFO 16:21:03,389 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.00916
INFO 16:21:03,401 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.00991
INFO 16:21:03,407 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.01158
INFO 16:21:03,411 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.01606
INFO 16:21:03,416 VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.04015
INFO 16:21:03,420 VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.08508
INFO 16:21:03,424 VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.05816
INFO 16:21:03,427 VariantRecalibratorEngine - Convergence after 69 iterations!
INFO 16:21:03,429 VariantRecalibratorEngine - Evaluating full set of 291 variants...
INFO 16:21:03,430 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
INFO 16:21:08,746 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:83)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:392)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:138)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No data found.

##### ERROR ------------------------------------------------------------------------------------------

I don't know what's wrong with my data or my command.
Has anyone run into the same error? Any help would be appreciated. Thanks!

Answers

  • yusmile0618 (Guangzhou, China), Member

    @pdexheimer said:
    These lines in your output point toward the problem:

    INFO 16:21:03,095 VariantDataManager - Training with 123 variants after standard deviation thresholding. 
    WARN 16:21:03,096 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
    

    Variant Recalibration needs many more variants - several thousand at least, preferably closer to 20,000

    Thanks, @pdexheimer. I rechecked the .vcf file and there are only 1343 variants, because I only used GATK to call one sample and we used targeted sequencing of a small region of nearly 2M :(

    So you mean the error is caused by my small variant data set, and not by my parameters or some other part of the command?
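A quick way to see how many records a call set holds before attempting VQSR is to count them by type. The following is only a minimal sketch, assuming an uncompressed VCF: the function name `count_vcf_variants` is made up for illustration, and the REF/ALT length comparison is a rough SNP/indel split (symbolic alleles and MNPs are not handled specially).

```shell
# Count SNP-like and indel-like records in an uncompressed VCF.
# Hypothetical helper, not part of GATK. Usage: count_vcf_variants calls.vcf
count_vcf_variants() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            total++
            n = split($5, alts, ",")         # ALT may list several alleles
            is_indel = 0
            for (i = 1; i <= n; i++) {
                # rough test: an ALT allele whose length differs from REF
                if (length(alts[i]) != length($4)) is_indel = 1
            }
            if (is_indel) indel++; else snp++
        }
        END { printf "total=%d snp_like=%d indel_like=%d\n", total, snp, indel }
    ' "$1"
}
```

If the indel count comes out in the hundreds, as the "Evaluating full set of 291 variants" line in the log suggests here, it is far below the several thousand mentioned in the quoted reply.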

  • Right. With this data there's no way to run VQSR; you'll need to do some sort of hard filtering instead.
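For reference, hard filtering of indels in GATK 3 might look roughly like the sketch below. The jar and reference paths are the ones from the original command; the function names, the input/output file names in the usage comment, and the filter name `indel_hard_filter` are placeholders. The thresholds are the generic starting points I recall from the GATK hard-filtering guidance, so treat them as assumptions to be tuned for your data rather than fixed values.

```shell
# Paths taken from the command in the question; override via the environment.
GATK_JAR="${GATK_JAR:-/usr/local/bin/GenomeAnalysisTK.jar}"
REF="${REF:-/mnt/md1200/1/public_database/gatk_hg19/ucsc.hg19.fasta}"

# Step 1: extract the indel records from a call set.
# Usage: select_indels in.vcf out.vcf
select_indels() {
    java -jar "$GATK_JAR" -T SelectVariants -R "$REF" \
        -V "$1" -selectType INDEL -o "$2"
}

# Step 2: mark indels that fail fixed thresholds in the FILTER column.
# The thresholds are assumed generic defaults, not values from this thread.
# Usage: hard_filter_indels in.vcf out.vcf
hard_filter_indels() {
    java -jar "$GATK_JAR" -T VariantFiltration -R "$REF" \
        -V "$1" \
        --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" \
        --filterName "indel_hard_filter" \
        -o "$2"
}

# Example (placeholder file names):
#   select_indels raw_calls.vcf raw_indels.vcf
#   hard_filter_indels raw_indels.vcf filtered_indels.vcf
```

Records that fail the expression stay in the output with `indel_hard_filter` in the FILTER column, so they can be inspected before being discarded.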