Error java.lang.IllegalArgumentException: No data found (when using VQSR)

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:409)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:157)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No data found.
ERROR ------------------------------------------------------------------------------------------


Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin
    Accepted Answer

    @guoweihua
    Hi,

    What type of data are you working with? How many samples do you have, and are they whole genome or whole exome? Can you please post the exact command you ran?

    Thanks,
    Sheila

  • helene · LA · Member

    I am encountering the same error. I am working with human whole-genome data, one sample at a time. The same command worked for 3 of my 5 samples, but 2 failed with this error.

    ##### ERROR --
    ##### ERROR stack trace 
    java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:489)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:185)
    at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: No data found.
    ##### ERROR ------------------------------------------------------------------------------------------
    

    I'm using GATK version 3.6, and my command is:

    java -Xmx45g -jar $GATK -T VariantRecalibrator -R /ref/hg19/ucsc.hg19.fasta \
    -input /sample/vc_gatk/sample.raw.snps.indels.combined.vcf \
    -recalFile /sample/vc_gatk/sample.snp.recal \
    -tranchesFile /sample/vc_gatk/sample.snp.tranches \
    -rscriptFile /sample/vc_gatk/sample.snp.recalibrate_SNP_plots.R \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /Genome/GATK/hg19/hapmap_3.3.hg19.sites.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 /Genome/GATK/hg19/1000G_omni2.5.hg19.sites.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 /Genome/GATK/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /Genome/GATK/hg19/dbsnp_138.hg19.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -mode SNP 
    

    Thanks a lot for the help.

  • guoweihua · Japan · Member

    @Sheila Thank you, I have resolved my problem. Previously I had cut out part of one sample's whole-genome sequence, and this reduced data was probably the main cause of the error. This time I combined all the parts after HaplotypeCaller (HC), and the problem was solved.

  • guoweihua · Japan · Member
    edited November 2016

    @helene You can try reducing the value of --maxGaussians or -minNumBad, or removing some contigs (i.e. dropping the variants on small contigs). I hope this helps. Best wishes.

  • helene · LA · Member

    @guoweihua Thanks for your help. I do think the contigs could be an issue, but I am a little hesitant to change -maxGaussians. @Sheila Could you please offer some advice? Thanks so much.

  • Sheila · Broad Institute · Member, Broadie, Moderator admin
    edited November 2016

    @helene
    Hi,

    So, you are running on one whole genome sample at a time? We have noticed VariantRecalibrator performs much better when you feed in more than one whole genome. Is there any reason you prefer to run on one whole genome at a time instead of running on all samples together?

    Thanks,
    Sheila

    P.S. Can you post the entire stack trace for the run that fails?

  • helene · LA · Member

    @Sheila

    Thanks so much for the reply! Yes. Our study is not a cohort study, and each sample may have its own variants of interest. I understand that the GATK Best Practices (starting at the genotyping step) process all samples together for better sensitivity, and I think that is a brilliant idea. However, after careful evaluation, we decided not to apply that to our samples for now.

    Please see the full log and stack trace below:

    INFO 22:51:31,064 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 22:51:31,066 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
    INFO 22:51:31,066 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 22:51:31,066 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
    INFO 22:51:31,066 HelpFormatter - [Mon Oct 31 22:51:31 PDT 2016] Executing on Linux 3.10.0-327.22.2.el7.x86_64 amd64
    INFO 22:51:31,066 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 JdkDeflater
    INFO 22:51:31,069 HelpFormatter - Program Args: -T VariantRecalibrator -R /ref/hg19/ucsc.hg19.fasta -input /sample/vc_gatk/sample.raw.snps.indels.combined.vcf -recalFile /sample/vc_gatk/sample.snp.recal -tranchesFile /sample/vc_gatk/sample.snp.tranches -rscriptFile /sample/vc_gatk/sample.snp.recalibrate_SNP_plots.R -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /Genome/GATK/hg19/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /Genome/GATK/hg19/1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /Genome/GATK/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /Genome/GATK/hg19/dbsnp_138.hg19.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -mode SNP
    INFO 22:51:31,073 HelpFormatter - Executing as [email protected] on Linux 3.10.0-327.22.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
    INFO 22:51:31,073 HelpFormatter - Date/Time: 2016/10/31 22:51:31
    INFO 22:51:31,073 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 22:51:31,073 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 22:51:31,114 GenomeAnalysisEngine - Strictness is SILENT
    INFO 22:51:31,198 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 22:51:31,688 GenomeAnalysisEngine - Preparing for traversal
    INFO 22:51:31,693 GenomeAnalysisEngine - Done preparing for traversal
    INFO 22:51:31,694 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 22:51:31,694 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 22:51:31,694 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 22:51:31,782 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
    INFO 22:51:31,782 TrainingSet - Found omni track: Known = false Training = true Truth = true Prior = Q12.0
    INFO 22:51:31,782 TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
    INFO 22:51:31,782 TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
    INFO 22:52:01,697 ProgressMeter - chr1:172169182 3449609.0 30.0 s 8.0 s 5.5% 9.1 m 8.6 m
    INFO 22:52:31,698 ProgressMeter - chr2:83282509 7301822.0 60.0 s 8.0 s 10.6% 9.4 m 8.4 m
    INFO 22:53:01,699 ProgressMeter - chr3:13531845 1.1221207E7 90.0 s 8.0 s 16.1% 9.3 m 7.8 m
    INFO 22:53:31,704 ProgressMeter - chr3:177353080 1.4954672E7 120.0 s 8.0 s 21.4% 9.4 m 7.4 m
    INFO 22:54:01,705 ProgressMeter - chr4:143097385 1.8846092E7 2.5 m 7.0 s 26.6% 9.4 m 6.9 m
    INFO 22:54:31,706 ProgressMeter - chr5:116999412 2.2649856E7 3.0 m 7.0 s 31.8% 9.4 m 6.4 m
    INFO 22:55:01,707 ProgressMeter - chr6:93581191 2.6462247E7 3.5 m 7.0 s 36.9% 9.5 m 6.0 m
    INFO 22:55:31,709 ProgressMeter - chr7:82274505 3.0312153E7 4.0 m 7.0 s 41.9% 9.5 m 5.5 m
    INFO 22:56:01,710 ProgressMeter - chr8:82416364 3.4190404E7 4.5 m 7.0 s 47.0% 9.6 m 5.1 m
    INFO 22:56:31,711 ProgressMeter - chr9:120745565 3.8011373E7 5.0 m 7.0 s 52.9% 9.4 m 4.4 m
    INFO 22:57:01,713 ProgressMeter - chr11:5768461 4.1928339E7 5.5 m 7.0 s 58.1% 9.5 m 4.0 m
    INFO 22:57:32,184 ProgressMeter - chr12:25681183 4.5630806E7 6.0 m 7.0 s 63.0% 9.5 m 3.5 m
    INFO 22:58:02,185 ProgressMeter - chr13:77264622 4.9523787E7 6.5 m 7.0 s 68.9% 9.4 m 2.9 m
    INFO 22:58:32,186 ProgressMeter - chr15:57224343 5.3406177E7 7.0 m 7.0 s 75.4% 9.3 m 2.3 m
    INFO 22:59:02,187 ProgressMeter - chr17:31686084 5.7416441E7 7.5 m 7.0 s 80.7% 9.3 m 107.0 s
    INFO 22:59:32,189 ProgressMeter - chr19:39816238 6.1458134E7 8.0 m 7.0 s 86.0% 9.3 m 77.0 s
    INFO 23:00:02,200 ProgressMeter - chrX:2722512 6.5535565E7 8.5 m 7.0 s 91.9% 9.2 m 44.0 s
    INFO 23:00:21,777 VariantDataManager - QD: mean = 17.93 standard deviation = 8.99
    INFO 23:00:22,216 VariantDataManager - MQ: mean = 59.90 standard deviation = 0.94
    INFO 23:00:22,652 VariantDataManager - MQRankSum: mean = -0.02 standard deviation = 0.28
    INFO 23:00:23,159 VariantDataManager - ReadPosRankSum: mean = 0.04 standard deviation = 1.03
    INFO 23:00:23,668 VariantDataManager - FS: mean = 1.39 standard deviation = 2.49
    INFO 23:00:24,091 VariantDataManager - SOR: mean = 0.92 standard deviation = 0.45
    INFO 23:00:24,513 VariantDataManager - DP: mean = 27.90 standard deviation = 9.33
    INFO 23:00:26,841 VariantDataManager - Annotations are now ordered by their information content: [MQ, DP, QD, MQRankSum, FS, SOR, ReadPosRankSum]
    INFO 23:00:27,038 VariantDataManager - Training with 3146426 variants after standard deviation thresholding.
    WARN 23:00:27,038 VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
    INFO 23:00:27,186 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO 23:00:32,201 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 9.0 m 7.0 s 100.0% 9.0 m 0.0 s
    INFO 23:01:02,202 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 9.5 m 8.0 s 100.0% 9.5 m 0.0 s
    INFO 23:01:32,203 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 10.0 m 8.0 s 100.0% 10.0 m 0.0 s
    INFO 23:02:02,208 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 10.5 m 9.0 s 100.0% 10.5 m 0.0 s
    INFO 23:02:32,209 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 11.0 m 9.0 s 100.0% 11.0 m 0.0 s
    INFO 23:03:02,211 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 11.5 m 10.0 s 100.0% 11.5 m 0.0 s
    INFO 23:03:32,212 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 12.0 m 10.0 s 100.0% 12.0 m 0.0 s
    INFO 23:04:02,213 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 12.5 m 10.0 s 100.0% 12.5 m 0.0 s
    INFO 23:04:08,766 VariantRecalibratorEngine - Finished iteration 0.
    INFO 23:04:32,214 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 13.0 m 11.0 s 100.0% 13.0 m 0.0 s
    INFO 23:05:02,215 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 13.5 m 11.0 s 100.0% 13.5 m 0.0 s
    INFO 23:05:38,144 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 14.1 m 12.0 s 100.0% 14.1 m 0.0 s
    INFO 23:05:42,338 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 1.98304
    INFO 23:06:08,146 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 14.6 m 12.0 s 100.0% 14.6 m 0.0 s
    INFO 23:06:38,147 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 15.1 m 13.0 s 100.0% 15.1 m 0.0 s
    INFO 23:07:08,148 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 15.6 m 13.0 s 100.0% 15.6 m 0.0 s
    INFO 23:07:19,067 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.17505
    INFO 23:07:38,151 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 16.1 m 14.0 s 100.0% 16.1 m 0.0 s
    INFO 23:08:08,153 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 16.6 m 14.0 s 100.0% 16.6 m 0.0 s
    INFO 23:08:38,154 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 17.1 m 15.0 s 100.0% 17.1 m 0.0 s
    INFO 23:08:57,933 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.02892
    INFO 23:09:08,157 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 17.6 m 15.0 s 100.0% 17.6 m 0.0 s
    INFO 23:09:38,159 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 18.1 m 15.0 s 100.0% 18.1 m 0.0 s
    INFO 23:10:08,160 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 18.6 m 16.0 s 100.0% 18.6 m 0.0 s
    INFO 23:10:38,161 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 19.1 m 16.0 s 100.0% 19.1 m 0.0 s
    INFO 23:10:38,172 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.01976
    INFO 23:11:08,162 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 19.6 m 17.0 s 100.0% 19.6 m 0.0 s
    INFO 23:11:38,164 ProgressMeter - chrUn_gl000249:16303 6.8298526E7 20.1 m 17.0 s 100.0% 20.1 m 0.0 s
    INFO 23:11:54,930 VariantRecalibratorEngine - Convergence after 24 iterations!
    INFO 23:12:07,078 VariantRecalibratorEngine - Evaluating full set of 4005806 variants...
    INFO 23:12:07,182 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.

    ERROR --
    ERROR stack trace

    java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:489)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:185)
    at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No data found.
    ERROR ------------------------------------------------------------------------------------------

    Issue · GitHub
    by Sheila
    Issue number: 1416 · State: closed · Closed by: chandrans
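Every failing log posted in this thread shares one pattern: just before the crash, VariantRecalibrator reports how many of the worst-scoring variants it picked for the negative training model (GATK 3.6 prints "Training with worst N scoring variants", GATK 3.8 and 4.0 print "Selected worst N scoring variants"), and N is 0 each time. The function below is a minimal sketch for pulling that number out of a saved log. The function name `vqsr_negative_count` and the log file name `vqsr.log` are hypothetical, and the pattern only matches the two phrasings seen in these logs.

```shell
#!/usr/bin/env bash
# Print the size of the negative ("worst scoring") training set that
# VariantRecalibrator reported in a saved log file.
# Matches the two phrasings seen in this thread:
#   GATK 3.6:      "Training with worst 0 scoring variants --> ..."
#   GATK 3.8/4.0:  "Selected worst 0 scoring variants --> ..."
# Prints nothing if the log has no such line (e.g. the run died earlier).
vqsr_negative_count() {
  local log="$1"
  grep -oE '(Training with|Selected) worst [0-9]+ scoring variants' "$log" \
    | grep -oE '[0-9]+' \
    | tail -n 1
}

# Hypothetical usage: warn when the negative set came out empty.
# n=$(vqsr_negative_count vqsr.log)
# [ "$n" = "0" ] && echo "negative training set is empty; expect 'No data found'"
```

A count of 0 here means the negative model has nothing to train on, which matches the point in the thread that the input simply does not contain enough (bad-looking) variants.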
  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi @helene, are you intentionally making calls off of the unplaced contigs (chrUn*)? I wonder if those might be responsible for this failure. We always call variants against a list of intervals (even for whole genomes) that exclude decoys and other non-canonical contigs, so running against absolutely all contigs is untested. It would be helpful to see if the run fails or not when you run VariantRecalibrator specifying only the canonical contigs to -L.
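To check whether unplaced or other non-canonical contigs contribute records to the VariantRecalibrator input (helene's log, for example, finishes traversal on chrUn_gl000249), one can count VCF records by contig class. This is a sketch that assumes UCSC hg19 naming as in the commands in this thread: chr1 to chr22, chrX, chrY and chrM are treated as canonical, and everything else (chrUn_*, *_random, *_hap*) as non-canonical. The function name is hypothetical; it reads plain or bgzip-compressed VCFs.

```shell
#!/usr/bin/env bash
# Count VCF data records on canonical vs. non-canonical contigs.
# Assumes UCSC hg19 contig names: chr1..chr22, chrX, chrY, chrM are canonical;
# anything else (chrUn_*, *_random, *_hap*) is counted as non-canonical.
# gzip -cdf decompresses gzip/bgzip input and passes plain text through.
count_by_contig_class() {
  local vcf="$1"
  gzip -cdf "$vcf" | awk -F '\t' '
    /^#/ { next }                                    # skip header lines
    $1 ~ /^chr([0-9]+|X|Y|M)$/ { canonical++; next }
    { noncanonical++ }
    END { printf "canonical\t%d\nnoncanonical\t%d\n", canonical, noncanonical }
  '
}
```

For example, `count_by_contig_class sample.raw.snps.indels.combined.vcf` prints two tab-separated lines; a non-zero second count means the recalibration input includes variants that a canonical-contig `-L` restriction would exclude.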

  • helene · LA · Member

    Hi @Geraldine,

    Thanks so much for the reply. I'm not intentionally calling variants on the unplaced contigs. I'm running on the whole genome, and the same command works fine for many of my other samples. I didn't quite understand the second half of your comment, though. What exactly are canonical contigs? Do you mean the contigs that are not unplaced? Thank you.

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @helene
    Hi,

    The canonical contigs are indeed the properly placed chromosomes (1-22, X/Y and MT).

    -Sheila

  • helene · LA · Member

    Hi @Sheila ,

    Thanks so much for the clarification. So if I want to test all placed chromosomes at once, should I add "-L chr1" "-L chr2"... "-L chrMT" to my command? Or is there a simpler way to do it please? Thank you.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @helene You can do it that way or provide an intervals file listing your intervals of interest. See https://software.broadinstitute.org/gatk/documentation/article?id=1319 for acceptable formats.
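A sketch of the intervals-file route: take the canonical contig names from the reference's FASTA index (the .fai file GATK expects next to the reference) and write one contig per line to a `.list` file, a GATK-style intervals format in which a bare contig name stands for the whole contig. The pattern assumes UCSC hg19 naming, where the mitochondrial contig is chrM rather than chrMT; it would need adjusting for b37-style names (1 to 22, X, Y, MT). The function and output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Write the canonical contigs of a UCSC-style reference (chr1..chr22, chrX,
# chrY, chrM) to a GATK-style intervals list, one whole contig per line,
# preserving the order in which they appear in the reference index.
# Usage: write_canonical_intervals ucsc.hg19.fasta.fai canonical.intervals.list
write_canonical_intervals() {
  local fai="$1" out="$2"
  # Column 1 of a .fai file is the contig name.
  awk -F '\t' '$1 ~ /^chr([0-9]+|X|Y|M)$/ { print $1 }' "$fai" > "$out"
}
```

The resulting file can then be passed once, as `-L canonical.intervals.list`, instead of repeating `-L` for every chromosome.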

  • thu · Member

    Hi @Geraldine_VdAuwera, I'm using VQSR to process single-cell whole-genome sequencing data, and I get the "No data found" error. My problem seems different from the previous one, because my log doesn't have this part:

    INFO 22:52:01,697 ProgressMeter - chr1:172169182 3449609.0 30.0 s 8.0 s 5.5% 9.1 m 8.6 m
    INFO 22:52:31,698 ProgressMeter - chr2:83282509 7301822.0 60.0 s 8.0 s 10.6% 9.4 m 8.4 m
    INFO 22:53:01,699 ProgressMeter - chr3:13531845 1.1221207E7 90.0 s 8.0 s 16.1% 9.3 m 7.8 m
    INFO 22:53:31,704 ProgressMeter - chr3:177353080 1.4954672E7 120.0 s 8.0 s 21.4% 9.4 m 7.4 m
    INFO 22:54:01,705 ProgressMeter - chr4:143097385 1.8846092E7 2.5 m 7.0 s 26.6% 9.4 m 6.9 m
    ......

    My command line is:

    java -jar /home/mtx/software/gatk/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar -R $ref -T VariantRecalibrator \
    -input $output_Uni \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \
    -an QD -an FS -an MQRankSum -an ReadPosRankSum -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -recalFile $output_Var_recal -tranchesFile $output_Var_tranches \
    -rf BadCigar -an DP

    My log is:

    INFO 16:44:45,590 HelpFormatter - ------------------------------------------------------------------------------------
    INFO 16:44:45,592 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-1-0-gf15c1c3ef, Compiled 2018/02/19 05:43:50
    INFO 16:44:45,592 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 16:44:45,593 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO 16:44:45,593 HelpFormatter - [Sun Mar 25 16:44:45 CST 2018] Executing on Linux 3.10.0-693.el7.x86_64 amd64
    INFO 16:44:45,593 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-b12
    INFO 16:44:45,596 HelpFormatter - Program Args: -R /data8t_0/mtx/singlecell/gtseq/human96cells/ref/ucsc.hg19.copyed.fasta -T VariantRecalibrator -input /data8t_0/mtx/singlecell/gtseq/human96cells/experiment/for_test/140603_DNA_A1_sc_1980000.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data8t_0/mtx/singlecell/gtseq/human96cells/ref/hapmap_3.3.hg19.sites.copyed.vcf -an QD -an FS -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile /data8t_0/mtx/singlecell/gtseq/human96cells/experiment/17_VariantRecalibrator/140603_DNA_A1_sc_1980000.recal -tranchesFile /data8t_0/mtx/singlecell/gtseq/human96cells/experiment/17_VariantRecalibrator/140603_DNA_A1_sc_1980000.tranches -rf BadCigar -an DP
    INFO 16:44:45,602 HelpFormatter - Executing as [email protected] on Linux 3.10.0-693.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-b12.
    INFO 16:44:45,602 HelpFormatter - Date/Time: 2018/03/25 16:44:45
    INFO 16:44:45,602 HelpFormatter - ------------------------------------------------------------------------------------
    INFO 16:44:45,603 HelpFormatter - ------------------------------------------------------------------------------------
    INFO 16:44:45,631 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mtx/software/gatk/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so
    INFO 16:44:45,657 GenomeAnalysisEngine - Deflater: IntelDeflater
    INFO 16:44:45,657 GenomeAnalysisEngine - Inflater: IntelInflater
    INFO 16:44:45,658 GenomeAnalysisEngine - Strictness is SILENT
    INFO 16:44:45,863 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 16:44:46,058 GenomeAnalysisEngine - Preparing for traversal
    INFO 16:44:46,064 GenomeAnalysisEngine - Done preparing for traversal
    INFO 16:44:46,064 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 16:44:46,065 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 16:44:46,065 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 16:44:46,068 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
    INFO 16:45:07,812 VariantDataManager - QD: mean = 20.31 standard deviation = 2.95
    INFO 16:45:07,820 VariantDataManager - FS: mean = 0.01 standard deviation = 0.24
    INFO 16:45:07,826 VariantDataManager - MQRankSum: mean = -0.02 standard deviation = 0.13
    INFO 16:45:07,833 VariantDataManager - ReadPosRankSum: mean = 0.07 standard deviation = 0.83
    INFO 16:45:07,839 VariantDataManager - DP: mean = 2.33 standard deviation = 0.59
    INFO 16:45:07,881 VariantDataManager - Annotation order is: [QD, DP, MQRankSum, FS, ReadPosRankSum]
    INFO 16:45:07,883 VariantDataManager - Training with 8448 variants after standard deviation thresholding.
    INFO 16:45:07,887 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO 16:45:08,301 VariantRecalibratorEngine - Finished iteration 0.
    INFO 16:45:08,532 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 8.86154
    INFO 16:45:08,698 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.17131
    INFO 16:45:08,872 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.06551
    INFO 16:45:09,050 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.04882
    INFO 16:45:09,228 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.05464
    INFO 16:45:09,408 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.07291
    INFO 16:45:09,588 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.10607
    INFO 16:45:09,770 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.16608
    INFO 16:45:09,960 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.06648
    INFO 16:45:10,146 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.08308
    INFO 16:45:10,224 VariantRecalibratorEngine - Convergence after 52 iterations!
    INFO 16:45:10,270 VariantRecalibratorEngine - Evaluating full set of 20152 variants...
    WARN 16:45:10,270 VariantRecalibratorEngine - Evaluate datum returned a NaN.
    INFO 16:45:10,281 VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.

    ERROR --
    ERROR stack trace

    java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:563)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:192)
    at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No data found.
    ERROR ------------------------------------------------------------------------------------------

    The input is a VCF from single-cell whole-genome sequencing data.

    Could you kindly tell me what's wrong with my experiment? Thank you!

  • SkyWarrior · Turkey · Member ✭✭✭

    There is nothing wrong with your experiment. You just don't have enough variants to build a recalibration model. Try reducing --maxGaussians to 4 to bypass this.
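To gauge how much data VariantRecalibrator has to work with before running it, one can count the records in the input VCF. For comparison, thu's log reports 20152 variants evaluated and 8448 training variants after standard deviation thresholding, whereas helene's whole-genome run evaluates about 4 million. This is a minimal sketch with a hypothetical function name; its "SNP-like" test (REF and every ALT allele a single base, excluding `*`) is a rough approximation for sizing the input and does not reproduce VQSR's own annotation or training-set filtering.

```shell
#!/usr/bin/env bash
# Rough count of the records VariantRecalibrator -mode SNP could train on:
# total data records, and "SNP-like" records whose REF and every ALT allele
# are single bases (spanning-deletion "*" and missing "." ALTs excluded).
# Approximation for sizing the input only; no VQSR filtering is applied.
count_snp_like() {
  local vcf="$1"
  gzip -cdf "$vcf" | awk -F '\t' '
    /^#/ { next }                                   # skip header lines
    {
      total++
      n = split($5, alts, ",")                      # ALT column, comma-separated
      snp = (length($4) == 1 && $5 != ".")
      for (i = 1; i <= n; i++)
        if (length(alts[i]) != 1 || alts[i] == "*") snp = 0
      if (snp) snps++
    }
    END { printf "records\t%d\nsnp_like\t%d\n", total, snps }
  '
}
```

A single low-coverage sample with only tens of thousands of SNP-like records gives the Gaussian mixture model far less to fit than a whole-genome callset, which is consistent with the advice above to lower the number of Gaussians.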

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @thu
    Hi,

    Also have a look at this thread.

    -Sheila

  • Lavanya · Member
    edited June 2018

    Hi Sheila,
    Has this issue been solved? I am running VQSR on a single 15X whole-genome sample (our project requires us to analyze each sample individually because of cohort issues).
    GATK version: 4.0.2.1
    I tried --max-gaussians 4 and --minimum-bad-variants 1000, and removed the decoy contigs as suggested earlier.

    But I still see the following error.
    My command line is below:

    gatk VariantRecalibrator \
    -V ${sample_key}.sites_only_vcf_filename.vcf.gz \
    -R $ref \
    --trust-all-polymorphic \
    -resource hapmap,known=false,training=true,truth=true,prior=15.0:./$hapmap \
    -resource omni,known=false,training=true,truth=true,prior=12.0:./$omni \
    -resource 1000G,known=false,training=true,truth=false,prior=10.0:./$phase1_snps \
    -resource dbsnp,known=true,training=false,truth=false,prior=7.0:./$dbsnp \
    -an DP \
    -an QD \
    -an FS \
    -an SOR \
    -an MQ \
    -an MQRankSum \
    -an ReadPosRankSum \
    -mode SNP \
    --minimum-bad-variants 1000 \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --max-gaussians 4 \
    --output-model ${sample_key}.snps.model.report \
    -O ${sample_key}.recalibrate_SNP.recal \
    --tranches-file ${sample_key}.recalibrate_SNP.tranches

    16:59:49.615 INFO ProgressMeter - 17:52112943 25.5 3877000 151788.7
    16:59:59.670 INFO ProgressMeter - 18:13534734 25.7 3946000 153483.1
    17:00:09.861 INFO ProgressMeter - 18:61160827 25.9 4011000 154987.3
    17:00:19.938 INFO ProgressMeter - 19:22993844 26.0 4085000 156829.0
    17:00:30.026 INFO ProgressMeter - 20:6962312 26.2 4154000 158455.2
    17:00:40.212 INFO ProgressMeter - 20:51798358 26.4 4216000 159785.5
    17:00:50.230 INFO ProgressMeter - 21:15815351 26.6 4250000 160061.2
    17:01:00.330 INFO ProgressMeter - 22:19940841 26.7 4316000 161522.8
    17:01:16.245 INFO ProgressMeter - 22:39721745 27.0 4348000 161121.0
    17:01:26.390 INFO ProgressMeter - X:35282643 27.2 4408000 162327.4
    17:01:36.579 INFO ProgressMeter - X:106588623 27.3 4469000 163551.0
    17:01:44.237 INFO ProgressMeter - Y:59029028 27.5 4519132 164616.7
    17:01:44.237 INFO ProgressMeter - Traversal complete. Processed 4519132 total variants in 27.5 minutes.
    17:01:44.538 INFO VariantDataManager - DP: mean = 15.62 standard deviation = 4.77
    17:01:45.018 INFO VariantDataManager - QD: mean = 20.39 standard deviation = 10.12
    17:01:45.507 INFO VariantDataManager - FS: mean = 1.34 standard deviation = 2.71
    17:01:45.962 INFO VariantDataManager - SOR: mean = 1.06 standard deviation = 0.59
    17:01:46.416 INFO VariantDataManager - MQ: mean = 59.91 standard deviation = 0.95
    17:01:46.901 INFO VariantDataManager - MQRankSum: mean = -0.01 standard deviation = 0.20
    17:01:47.483 INFO VariantDataManager - ReadPosRankSum: mean = 0.03 standard deviation = 0.95
    17:01:50.116 INFO VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, DP, MQRankSum, FS, SOR, ReadPosRankSum]
    17:01:50.307 INFO VariantDataManager - Training with 3108088 variants after standard deviation thresholding.
    17:01:50.308 WARN VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
    17:01:50.501 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
    17:06:33.399 INFO VariantRecalibratorEngine - Finished iteration 0.
    17:07:44.870 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.62375
    17:08:59.337 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.00000
    17:08:59.337 INFO VariantRecalibratorEngine - Convergence after 10 iterations!
    17:09:08.841 INFO VariantRecalibratorEngine - Evaluating full set of 3752883 variants...
    17:09:08.841 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
    17:09:08.965 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
    17:09:09.012 INFO VariantRecalibrator - Shutting down engine
    [June 24, 2018 5:09:09 PM SGT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 35.08 minutes.
    Runtime.totalMemory()=13227786240
    java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:895)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
    at org.broadinstitute.hellbender.Main.main(Main.java:288)


  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Lavanya
    Hi,

    No, I don't think the team has dedicated time to this. There are a couple of things you can try.

    1) To test, can you try merging another VCF with the one you are using? It can be a publicly available VCF just for testing purposes. It looks like your VCF does not have enough "bad variants" to train the model.

    2) You can try out the new CNN filtering workflow. Read more about it here and in the tool docs under variant filtering.
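The "bad variants" diagnosis can be made concrete with the log line just before the crash, `Selected worst 0 scoring variants --> variants with LOD <= -5.0000`. Below is a minimal Python sketch of that selection rule under stated assumptions: it is not GATK's implementation, `select_worst_variants` and `generate_model` are hypothetical stand-ins, and the -5.0 cutoff is simply the value printed in the log. It shows two things: only sites scoring at or below the cutoff can seed the negative (bad-variant) model, and a NaN score (compare the preceding `Evaluate datum returned a NaN` warning) can never pass the cutoff, because every comparison with NaN is false. An empty selection then leaves nothing to fit, which is consistent with the stack trace ending in `generateModel`.

```python
import math

# Cutoff taken from the log message "variants with LOD <= -5.0000".
BAD_LOD_CUTOFF = -5.0


def select_worst_variants(lods, cutoff=BAD_LOD_CUTOFF):
    """Return indices of sites eligible for the negative training set.

    Hypothetical helper that mirrors only the rule stated in the log.
    NaN scores are never selected, since `nan <= cutoff` is False.
    """
    return [i for i, lod in enumerate(lods) if lod <= cutoff]


def generate_model(training_data):
    """Hypothetical stand-in for model fitting: refuses an empty training set."""
    if not training_data:
        raise ValueError("No data found.")
    return len(training_data)  # placeholder for a fitted model


# Healthy run: some sites score below the cutoff and can seed the negative model.
healthy = [2.1, -6.3, 0.4, -8.0, 1.7]
print(select_worst_variants(healthy))  # [1, 3]

# Failed run: scoring produced NaN, so nothing qualifies and fitting fails.
broken = [math.nan, math.nan, math.nan]
worst = select_worst_variants(broken)
print(len(worst))  # 0
try:
    generate_model(worst)
except ValueError as err:
    print(err)  # No data found.
```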

    -Sheila

  • @Geraldine_VdAuwera
    Hi,
    I get the same error message: "java.lang.IllegalArgumentException: No data found".

    I have only one whole exome sequencing sample. Running with "--max-gaussians 4 -mode INDEL" returns this error, but when I change the setting to "--max-gaussians 3", it works well.

    Why does it work with "--max-gaussians 3" but not with "--max-gaussians 4"? How does the --max-gaussians parameter influence the result?

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @runninging

    For VQSR you need at least one WGS sample or 30 exome samples. Please refer to https://gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr/p1 for more information.
    The document states:

    This tool is expecting thousands of variant sites in order to achieve decent modeling with the Gaussian mixture model. Whole exome call sets work well, but anything smaller than that scale might run into difficulties.
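To illustrate why --max-gaussians matters for a small call set, the sketch below counts the free parameters of a Gaussian mixture, assuming each component has its own full covariance matrix over the annotations passed with -an (seven in the SNP command earlier in this thread). The function name is hypothetical, and the sketch does not reproduce GATK's fitting procedure; it only shows how quickly the number of quantities to estimate grows with each added component.

```python
def gmm_parameter_count(n_components: int, n_annotations: int) -> int:
    """Count free parameters in a full-covariance Gaussian mixture model.

    Each component contributes a mean vector (d values) and a symmetric
    covariance matrix (d * (d + 1) / 2 distinct values). The mixing weights
    add n_components - 1 values, since they are constrained to sum to 1.
    """
    d = n_annotations
    per_component = d + d * (d + 1) // 2
    return n_components * per_component + (n_components - 1)


# Seven annotations, matching the -an list in the SNP command earlier in the
# thread (DP, QD, FS, SOR, MQ, MQRankSum, ReadPosRankSum).
N_ANNOTATIONS = 7

for k in (1, 2, 3, 4, 8):
    print(f"k={k}: {gmm_parameter_count(k, N_ANNOTATIONS)} parameters")
# k=1: 35, k=2: 71, k=3: 107, k=4: 143, k=8: 287
```

In seven dimensions, going from three to four components adds 36 parameters (a 7-value mean, a 28-value covariance matrix, and one mixing weight), all estimated from the same pool of sites. With a single exome, particularly in INDEL mode, that pool is small, so an extra component is more likely to end up with too few sites to be estimated reliably.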

    Please let me know if this helps.

    Regards
    Bhanu
