The current GATK version is 3.6-0

"ERROR MESSAGE: No data found." from VariantRecalibrator

claybreshears (Hillsboro, OR) Posts: 1, Member

This crops up when running with "-mode INDEL". Not sure why there is no data. (See attached log file with stack trace.)

All input files are non-empty (except the .R file). A similar execution using "-mode SNP" completes with no problems. Since I'm simply looking to get the scripting and flags correct, I've used a public data set. Could it be that I'm unlucky and chose something that has no indels from the reference, which is causing the error? Could there be a more graceful method of termination?

Attachment: NIST7035_TAAGGCGA_L001_R1_001.recalibrate.indel.log (8K)

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 10,382Administrator, Dev admin

    It looks like you're using a pretty small dataset, so there might be no variants in your data that overlap with the model training resources. This happens often for indels if you're running on a small dataset. The solution is to use a bigger dataset -- unfortunately it's not possible to test VQSR on small datasets.

    We're looking at ways to improve how the program handles the issues stemming from having too few variants to work with, so hopefully future versions will be more graceful.

    Geraldine Van der Auwera, PhD
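
One rough way to check the explanation above (too few callset variants overlapping the training resource) is to count the overlap before running VQSR. The sketch below is not part of GATK: the function names `read_sites`, `is_indel`, and `count_overlap` are hypothetical, sites are matched by chromosome and position only (an assumption made for illustration), and the whole resource is held in memory, which may be impractical for very large resource files.

```python
import gzip
from typing import Iterator, Set, Tuple

Site = Tuple[str, int]


def is_indel(ref: str, alts: str) -> bool:
    """Simple indel test: some ALT allele differs in length from REF.

    Symbolic alleles (<...>), spanning deletions (*) and missing ALT (.)
    are ignored.
    """
    return any(
        len(alt) != len(ref)
        for alt in alts.split(",")
        if not alt.startswith(("<", "*", "."))
    )


def read_sites(path: str, indels_only: bool = False) -> Iterator[Site]:
    """Yield (chrom, pos) for each record of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed records
            chrom, pos, _id, ref, alt = fields[:5]
            if indels_only and not is_indel(ref, alt):
                continue
            yield chrom, int(pos)


def count_overlap(callset_vcf: str, resource_vcf: str,
                  indels_only: bool = False) -> int:
    """Count callset records whose (chrom, pos) also occurs in the resource."""
    resource_sites: Set[Site] = set(read_sites(resource_vcf, indels_only))
    return sum(
        1 for site in read_sites(callset_vcf, indels_only)
        if site in resource_sites
    )
```

Calling `count_overlap("calls.vcf", "resource.vcf", indels_only=True)` (both file names are placeholders) gives a quick idea of how many indel sites are shared. A count of zero or only a handful would be consistent with the small-dataset explanation given above.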

  • Irantzu Posts: 14, Member
    edited July 2014

    Hi @Geraldine,
    one little question. I'm running VariantRecalibrator, and it seems to be running OK, but at the end I get the "##### ERROR MESSAGE: No data found." error. I think the command is OK, but is it possible to run VariantRecalibrator with 4000 variants and only ONE sample? I'm asking because I've read several comments about this issue and I'm not sure whether it is possible to run the analysis with only one sample...

    Thanks in advance

  • Geraldine_VdAuwera Posts: 10,382, Administrator, Dev admin

    Hi @Irantzu,

    VQSR does not perform well (if at all) on a single sample. It can work with whole genome sequence, but if you're working with exome, there are just too few variants. Our recommendation for dealing with this is to get additional sample BAMs from the 1000 Genomes project and add them to your callset (see this presentation for details).

    Geraldine Van der Auwera, PhD

  • quang (Oxford UK) Posts: 10, Member

    Hi Geraldine,

    I ran VQSR on 39 million SNPs but still got the message "##### ERROR MESSAGE: No data found.".

    Could it be that VQSR cannot match the SNPs in the input to the training files because we do not include rsID information in the input file?

    Many thanks,
    Best regards,
    Quang.

  • Sheila (Broad Institute) Posts: 3,739, Member, Broadie, Moderator, Dev admin

    @quang

    Hi Quang,

    Can you please post your command line and full log output?

    Thanks,
    Sheila

  • seru (Bergen) Posts: 39, Member ✭✭

    Hi and Happy New Year,

    I will post my logs as I am getting the same error. A brief background first; it can hopefully shed more light on this mysterious issue. We continuously generate exome data in batches of 8 samples (1 NextSeq run), and genotype every new batch (using HC; GATK 3.3.0) together with all exomes previously sequenced on the same platform/capture kit, so the set of joint-called samples grows gradually. In the beginning there were no problems with VQSR (we started from >30 exomes). When we crossed 80 exomes I got the 'No data found' exception for the first time (for SNPs). I removed the MQ annotation from SNP VQSR (we use BWA MEM) and disabled multithreading, as suggested in multiple posts on this forum. This helped. Now, approaching 170 exomes, I got the same problem for INDELs. VQSR for the prior run (8 exomes fewer) worked fine. Replacing the problematic run with 8 exomes sequenced on a different platform also didn't result in this exception. What could be the problem here? It can't be too little input data. When I removed the MQRankSum annotation (the least informative one) from the INDEL VQSR step, it went fine. Is the data becoming too homogeneous for the model as the samples accumulate?

    Any input is appreciated. Best regards,
    Pawel

    Here is the error, with the args used:

        INFO  19:13:34,375 HelpFormatter - --------------------------------------------------------------------------------
        INFO  19:13:34,377 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
        INFO  19:13:34,377 HelpFormatter - Copyright (c) 2010 The Broad Institute
        INFO  19:13:34,377 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
        INFO  19:13:34,380 HelpFormatter - Program Args: -T VariantRecalibrator -R /persistent/diagnostic/reference/g1k_v37/human_g1k_v37.fasta -input /scratch/diagnostics/160107_NS500635_0061_AHLYH5BGXX/160107_NS500635_0061_AHLYH5BGXX.multisample.vcf --maxGaussians 4 -resource:mills,known=false,training=true,truth=true,prior=12.0 /persistent/diagnostic/reference/g1k_v37/Mills_and_1000G_gold_standard.indels.b37.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /persistent/diagnostic/reference/g1k_v37/dbsnp_138.b37.vcf -an QD -an DP -an FS -an ReadPosRankSum -an MQRankSum -mode INDEL -recalFile /scratch/diagnostics/160107_NS500635_0061_AHLYH5BGXX/160107_NS500635_0061_AHLYH5BGXX.multisample.indel.model -tranchesFile /scratch/diagnostics/160107_NS500635_0061_AHLYH5BGXX/160107_NS500635_0061_AHLYH5BGXX.multisample.indel.model.tranches -rscriptFile /scratch/diagnostics/160107_NS500635_0061_AHLYH5BGXX/160107_NS500635_0061_AHLYH5BGXX.multisample.indel.model.plots.R
        INFO  19:13:34,385 HelpFormatter - Executing as ?@d369c9a76972 on Linux 3.16.0-56-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b02.
        INFO  19:13:34,386 HelpFormatter - Date/Time: 2016/01/09 19:13:34
        INFO  19:13:34,386 HelpFormatter - --------------------------------------------------------------------------------
        INFO  19:13:34,386 HelpFormatter - --------------------------------------------------------------------------------
        INFO  19:13:36,014 GenomeAnalysisEngine - Strictness is SILENT
        INFO  19:13:36,269 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
        INFO  19:13:39,358 GenomeAnalysisEngine - Preparing for traversal
        INFO  19:13:39,377 GenomeAnalysisEngine - Done preparing for traversal
        INFO  19:13:39,378 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
        INFO  19:13:39,378 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
        INFO  19:13:39,379 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
        INFO  19:13:39,389 TrainingSet - Found mills track:         Known = false   Training = true         Truth = true    Prior = Q12.0
        INFO  19:13:39,390 TrainingSet - Found dbsnp track:         Known = true    Training = false        Truth = false   Prior = Q2.0
        INFO  19:14:09,423 ProgressMeter -      1:37206397    919122.0    30.0 s      32.0 s        1.2%    41.7 m      41.2 m
        INFO  19:14:39,451 ProgressMeter -      1:83817776   1954620.0    60.0 s      30.0 s        2.7%    37.0 m      36.0 m
        INFO  19:15:09,453 ProgressMeter -     1:153472711   3014298.0    90.0 s      29.0 s        4.9%    30.3 m      28.8 m
        INFO  19:15:39,455 ProgressMeter -     1:196374572   4005249.0   120.0 s      29.0 s        6.3%    31.6 m      29.6 m
        INFO  19:16:09,457 ProgressMeter -     1:248906839   5271675.0     2.5 m      28.0 s        8.0%    31.2 m      28.7 m
        INFO  19:16:39,459 ProgressMeter -      2:61044625   6836862.0     3.0 m      26.0 s       10.0%    30.0 m      27.0 m
        INFO  19:17:09,461 ProgressMeter -     2:137506314   8502246.0     3.5 m      24.0 s       12.5%    28.1 m      24.6 m
        INFO  19:17:39,462 ProgressMeter -     2:212393706   1.0140715E7     4.0 m      23.0 s       14.9%    26.9 m      22.9 m
        INFO  19:18:09,464 ProgressMeter -      3:36534129   1.1838863E7     4.5 m      22.0 s       17.1%    26.4 m      21.9 m
        INFO  19:18:39,466 ProgressMeter -     3:108996601   1.3467792E7     5.0 m      22.0 s       19.4%    25.8 m      20.8 m
        INFO  19:19:09,467 ProgressMeter -     3:178223504   1.5068286E7     5.5 m      21.0 s       21.6%    25.4 m      19.9 m
        INFO  19:19:39,469 ProgressMeter -      4:45498849   1.6747321E7     6.0 m      21.0 s       23.7%    25.3 m      19.3 m
        INFO  19:20:09,503 ProgressMeter -     4:120858180   1.8435624E7     6.5 m      21.0 s       26.2%    24.9 m      18.4 m
        INFO  19:20:39,504 ProgressMeter -     4:191042026   2.0137218E7     7.0 m      20.0 s       28.4%    24.6 m      17.6 m
        INFO  19:21:09,506 ProgressMeter -      5:74997042   2.1830903E7     7.5 m      20.0 s       30.8%    24.3 m      16.8 m
        INFO  19:21:39,507 ProgressMeter -     5:147999622   2.3497481E7     8.0 m      20.0 s       33.2%    24.1 m      16.1 m
        INFO  19:22:09,509 ProgressMeter -      6:33400096   2.5247443E7     8.5 m      20.0 s       35.3%    24.1 m      15.6 m
        INFO  19:22:39,510 ProgressMeter -     6:108996084   2.6958351E7     9.0 m      20.0 s       37.8%    23.8 m      14.8 m
        INFO  19:23:09,512 ProgressMeter -       7:8471762   2.8733513E7     9.5 m      19.0 s       40.0%    23.7 m      14.2 m
        INFO  19:23:39,513 ProgressMeter -      7:80996573   3.0443304E7    10.0 m      19.0 s       42.4%    23.6 m      13.6 m
        INFO  19:24:09,514 ProgressMeter -     7:155865502   3.2161014E7    10.5 m      19.0 s       44.8%    23.4 m      12.9 m
        INFO  19:24:39,516 ProgressMeter -      8:60995618   3.3900585E7    11.0 m      19.0 s       46.9%    23.5 m      12.5 m
        INFO  19:25:09,517 ProgressMeter -     8:137497616   3.5630237E7    11.5 m      19.0 s       49.3%    23.3 m      11.8 m
        INFO  19:25:39,519 ProgressMeter -      9:82999224   3.7336294E7    12.0 m      19.0 s       52.3%    22.9 m      10.9 m
        INFO  19:26:09,520 ProgressMeter -     10:11798385   3.9083434E7    12.5 m      19.0 s       54.6%    22.9 m      10.4 m
        INFO  19:26:39,522 ProgressMeter -     10:86998079   4.0813704E7    13.0 m      19.0 s       57.0%    22.8 m       9.8 m
        INFO  19:27:09,523 ProgressMeter -     11:22502054   4.2574642E7    13.5 m      19.0 s       59.3%    22.8 m       9.3 m
        INFO  19:27:39,524 ProgressMeter -     11:94501691   4.4270577E7    14.0 m      18.0 s       61.6%    22.7 m       8.7 m
        INFO  19:28:09,526 ProgressMeter -     12:30468189   4.6004444E7    14.5 m      18.0 s       63.9%    22.7 m       8.2 m
        INFO  19:28:39,527 ProgressMeter -    12:106996274   4.772222E7    15.0 m      18.0 s       66.3%    22.6 m       7.6 m
        INFO  19:29:09,528 ProgressMeter -     13:63576415   4.9475787E7    15.5 m      18.0 s       69.3%    22.4 m       6.9 m
        INFO  19:29:39,530 ProgressMeter -     14:36593796   5.1148678E7    16.0 m      18.0 s       72.1%    22.2 m       6.2 m
        INFO  19:30:09,531 ProgressMeter -    14:106998390   5.2817001E7    16.5 m      18.0 s       74.4%    22.2 m       5.7 m
        INFO  19:30:39,533 ProgressMeter -     15:94586228   5.4591134E7    17.0 m      18.0 s       77.4%    22.0 m       5.0 m
        INFO  19:31:09,534 ProgressMeter -     16:69813419   5.6323579E7    17.5 m      18.0 s       79.9%    21.9 m       4.4 m
        INFO  19:31:39,536 ProgressMeter -     17:47063825   5.8105082E7    18.0 m      18.0 s       82.1%    21.9 m       3.9 m
        INFO  19:32:09,537 ProgressMeter -     18:40000044   5.9843335E7    18.5 m      18.0 s       84.5%    21.9 m       3.4 m
        INFO  19:32:39,538 ProgressMeter -     19:30653797   6.1586296E7    19.0 m      18.0 s       86.7%    21.9 m       2.9 m
        INFO  19:33:09,540 ProgressMeter -     20:39996130   6.3311329E7    19.5 m      18.0 s       88.9%    21.9 m       2.4 m
        INFO  19:33:39,541 ProgressMeter -     22:22114820   6.5077829E7    20.0 m      18.0 s       91.9%    21.8 m     105.0 s
        INFO  19:34:09,543 ProgressMeter -      X:64999443   6.6840196E7    20.5 m      18.0 s       95.0%    21.6 m      65.0 s
        INFO  19:34:39,437 VariantDataManager - QD:          mean = 16.75    standard deviation = 7.85
        INFO  19:34:39,442 VariantDataManager - DP:          mean = 8812.68  standard deviation = 5706.10
        INFO  19:34:39,447 VariantDataManager - FS:          mean = 2.61     standard deviation = 7.96
        INFO  19:34:39,452 VariantDataManager - ReadPosRankSum:      mean = 0.12     standard deviation = 0.72
        INFO  19:34:39,458 VariantDataManager - MQRankSum:   mean = 0.24     standard deviation = 0.82
        INFO  19:34:39,544 ProgressMeter -        MT:16311   6.8555979E7    21.0 m      18.0 s       99.8%    21.0 m       2.0 s
        INFO  19:34:39,552 VariantDataManager - Annotations are now ordered by their information content: [DP, QD, FS, ReadPosRankSum, MQRankSum]
        INFO  19:34:39,553 VariantDataManager - Training with 2982 variants after standard deviation thresholding.
        INFO  19:34:39,557 GaussianMixtureModel - Initializing model with 100 k-means iterations...
        INFO  19:34:39,808 VariantRecalibratorEngine - Finished iteration 0.
        INFO  19:34:40,024 VariantRecalibratorEngine - Finished iteration 5.        Current change in mixture coefficients = 0.47109
        INFO  19:34:40,071 VariantRecalibratorEngine - Finished iteration 10.       Current change in mixture coefficients = 0.06015
        INFO  19:34:40,107 VariantRecalibratorEngine - Finished iteration 15.       Current change in mixture coefficients = 0.02086
        INFO  19:34:40,138 VariantRecalibratorEngine - Finished iteration 20.       Current change in mixture coefficients = 0.01137
        INFO  19:34:40,169 VariantRecalibratorEngine - Finished iteration 25.       Current change in mixture coefficients = 0.00532
        INFO  19:34:40,181 VariantRecalibratorEngine - Convergence after 27 iterations!
        INFO  19:34:40,212 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
        INFO  19:35:09,546 ProgressMeter -        MT:16311   6.8555979E7    21.5 m      18.0 s       99.8%    21.5 m       2.0 s
        ##### ERROR ------------------------------------------------------------------------------------------
        ##### ERROR stack trace
        java.lang.IllegalArgumentException: No data found.
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:399)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:143)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
        ##### ERROR ------------------------------------------------------------------------------------------
        ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
        ##### ERROR
        ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
        ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
        ##### ERROR Visit our website and forum for extensive documentation and answers to
        ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
        ##### ERROR
        ##### ERROR MESSAGE: No data found.
        ##### ERROR ------------------------------------------------------------------------------------------
    
  • Geraldine_VdAuwera Posts: 10,382, Administrator, Dev admin

    Interesting -- could you please run VQSR again with GATK 3.5? There were some changes made to how GATK handles some annotations including MQ. I think those changes should help with this type of problem so I'd be curious to know if you see an improvement.

    Geraldine Van der Auwera, PhD

  • seru (Bergen) Posts: 39, Member ✭✭

    Thank you for the prompt reply. We use GATK as part of a production system/pipeline, and for consistency and stability we would like to avoid changing software versions as much as possible. Migrating to 3.5 is not as easy as replacing the jar, and will require some more testing. I could test whether it runs just for the sake of checking, but unless we are really forced to do so, we will most likely not upgrade at the moment.

    Are there any other possibilities?

  • Geraldine_VdAuwera Posts: 10,382, Administrator, Dev admin

    I completely understand that upgrading your production pipeline may not be immediately possible -- but it would be helpful to know if the changes address your problem or not. Depending on the answer our recommendations for dealing with the problem at hand may be different.

    Geraldine Van der Auwera, PhD

  • seru (Bergen) Posts: 39, Member ✭✭

    OK, I will give it a try and get back to you when I have more information. Cheers, Paweł

  • tandrean (EU) Posts: 2, Member
    edited February 29

    Hello Geraldine,

    I am having the same problem as Seru during my recalibration. I am using the dbSNP database to train the model, but it seems that I do not have enough variants to test it. Have you and Seru found a solution to this problem? Or does it make sense to train the model and test it with few variants? Here is the error output:

    INFO 18:19:12,630 HelpFormatter - --------------------------------------------------------------------------------
    INFO 18:19:12,633 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
    INFO 18:19:12,633 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 18:19:12,634 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 18:19:12,639 HelpFormatter - Program Args: -T VariantRecalibrator -R genome.ordered.fa -input SRR1792859.raw_variants.discovery.reordered.new_sorted_genome.reorderd.vcf -resource:hapmap,VCF,known=false,training=true,prior=6.0 SC_MOUSE_GENOMES.genotype.correct_no_header_yes.sort.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile SRR1792859.gatk.output.recal -tranchesFile SRRR1792859.gatk.output.tranches
    INFO 18:19:12,658 HelpFormatter - Executing as tandrean@n0002.mogon on Linux 2.6.32-573.12.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_75-mockbuild_2015_01_20_17_09-b00.
    INFO 18:19:12,659 HelpFormatter - Date/Time: 2016/02/27 18:19:12
    INFO 18:19:12,660 HelpFormatter - --------------------------------------------------------------------------------
    INFO 18:19:12,661 HelpFormatter - --------------------------------------------------------------------------------
    INFO 18:19:12,894 GenomeAnalysisEngine - Strictness is SILENT
    INFO 18:19:13,016 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 18:19:13,472 GenomeAnalysisEngine - Preparing for traversal
    INFO 18:19:13,525 GenomeAnalysisEngine - Done preparing for traversal
    INFO 18:19:13,527 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 18:19:13,528 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 18:19:13,529 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 18:19:13,533 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q6.0
    INFO 18:19:43,534 ProgressMeter - chr1:122743674 2814713.0 30.0 s 10.0 s 4.6% 10.8 m 10.3 m
    INFO 18:20:13,540 ProgressMeter - chr2:75128096 6095769.0 60.0 s 9.0 s 10.3% 9.7 m 8.7 m
    INFO 18:20:43,544 ProgressMeter - chr3:57827798 9692443.0 90.0 s 9.0 s 16.5% 9.1 m 7.6 m
    INFO 18:21:13,547 ProgressMeter - chr4:58564126 1.3298667E7 120.0 s 9.0 s 22.5% 8.9 m 6.9 m
    INFO 18:21:43,551 ProgressMeter - chr5:64554860 1.688027E7 2.5 m 8.0 s 28.6% 8.7 m 6.2 m
    INFO 18:22:13,554 ProgressMeter - chr6:75094639 2.0493629E7 3.0 m 8.0 s 34.7% 8.6 m 5.6 m
    INFO 18:22:43,557 ProgressMeter - chr7:91057639 2.3996056E7 3.5 m 8.0 s 41.0% 8.5 m 5.0 m
    INFO 18:23:13,560 ProgressMeter - chr8:103385545 2.7585654E7 4.0 m 8.0 s 47.2% 8.5 m 4.5 m
    INFO 18:23:43,563 ProgressMeter - chr10:10096592 3.11579E7 4.5 m 8.0 s 53.3% 8.4 m 3.9 m
    INFO 18:24:13,565 ProgressMeter - chr11:36998251 3.4757291E7 5.0 m 8.0 s 59.2% 8.4 m 3.4 m
    INFO 18:24:43,568 ProgressMeter - chr12:79160743 3.8383947E7 5.5 m 8.0 s 65.4% 8.4 m 2.9 m
    INFO 18:25:13,570 ProgressMeter - chr13:119431216 4.1973972E7 6.0 m 8.0 s 71.5% 8.4 m 2.4 m
    INFO 18:25:43,573 ProgressMeter - chr15:38451910 4.5529606E7 6.5 m 8.0 s 77.7% 8.4 m 112.0 s
    INFO 18:26:13,576 ProgressMeter - chr16:90069253 4.9152735E7 7.0 m 8.0 s 83.5% 8.4 m 83.0 s
    INFO 18:26:43,579 ProgressMeter - chr18:53819538 5.2568337E7 7.5 m 8.0 s 89.4% 8.4 m 53.0 s
    INFO 18:27:13,581 ProgressMeter - chrX:100338657 5.6075792E7 8.0 m 8.0 s 96.9% 8.3 m 15.0 s
    INFO 18:27:22,306 VariantDataManager - QD: mean = 28.55 standard deviation = 5.38
    INFO 18:27:22,348 VariantDataManager - MQ: mean = 58.63 standard deviation = 5.40
    INFO 18:27:22,373 VariantDataManager - MQRankSum: mean = 0.09 standard deviation = 0.79
    INFO 18:27:22,454 VariantDataManager - ReadPosRankSum: mean = 0.12 standard deviation = 0.87
    INFO 18:27:22,516 VariantDataManager - FS: mean = 0.22 standard deviation = 1.18
    INFO 18:27:22,540 VariantDataManager - SOR: mean = 2.08 standard deviation = 0.78
    INFO 18:27:22,843 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, SOR, FS, MQRankSum, ReadPosRankSum]
    INFO 18:27:22,873 VariantDataManager - Training with 13545 variants after standard deviation thresholding.
    INFO 18:27:22,892 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO 18:27:23,902 VariantRecalibratorEngine - Finished iteration 0.
    INFO 18:27:24,377 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.56957
    INFO 18:27:24,759 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.13187
    INFO 18:27:25,141 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.02591
    INFO 18:27:25,524 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.02285
    INFO 18:27:25,905 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.02331
    INFO 18:27:26,290 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.02960
    INFO 18:27:26,678 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.02657
    INFO 18:27:27,067 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.00951
    INFO 18:27:27,459 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.02208
    INFO 18:27:27,852 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.02613
    INFO 18:27:28,241 VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.01265
    INFO 18:27:28,626 VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.01944
    INFO 18:27:29,015 VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.01271
    INFO 18:27:29,401 VariantRecalibratorEngine - Finished iteration 70. Current change in mixture coefficients = 0.00278
    INFO 18:27:29,792 VariantRecalibratorEngine - Finished iteration 75. Current change in mixture coefficients = 0.00214
    INFO 18:27:30,178 VariantRecalibratorEngine - Finished iteration 80. Current change in mixture coefficients = 0.00314.
    INFO 18:27:30,564 VariantRecalibratorEngine - Finished iteration 85. Current change in mixture coefficients = 0.00644
    INFO 18:27:30,642 VariantRecalibratorEngine - Convergence after 86 iterations!
    INFO 18:27:30,733 VariantRecalibratorEngine - Evaluating full set of 341838 variants...
    INFO 18:27:30,756 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
    INFO 18:27:34,203 GATKRunReport - Uploaded run statistics report to AWS S3

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.IllegalArgumentException: No data found.
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:409)
        at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:157)
        at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: No data found.
    ##### ERROR ------------------------------------------------------------------------------------------

    Any comment would be very appreciated,
    Thanks.
    Tommaso


    Issue · Github, by Sheila: Issue Number 685, State: closed, Closed By: vdauwera
  • Sheila (Broad Institute) Posts: 3,739, Member, Broadie, Moderator, Dev admin

    @tandrean
    Hi Tommaso,

    I do not see the input dbSNP VCF in your command. Are you working with human data? Is your data from exomes or genomes?

    Thanks,
    Sheila

  • tandrean (EU) Posts: 2, Member

    Hello,

    I am using ChIP-seq data from mouse. The VCF file is present in the command; it is this one: SC_MOUSE_GENOMES.genotype.correct_no_header_yes.sort.vcf, re-sorted according to the dictionary of my reference.

    I downloaded it from dbSNP here:

    ftp://ftp.ncbi.nih.gov/snp/organisms/mouse_10090/VCF/genotype/

    Any idea?

  • Geraldine_VdAuwera Posts: 10,382, Administrator, Dev admin

    It looks like the program isn't assigning any variants to build the negative model. There are a few arguments you could try to tweak to override the normal behavior; off the top of my head I can't remember the exact name but it's something like minNumBad. I think that should help -- if not let me know and I'll have another look when I have more time.

    Geraldine Van der Auwera, PhD
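
Both logs posted in this thread contain the line "Training with worst 0 scoring variants" shortly before the error, which fits the diagnosis above that no variants are being assigned to the negative model. As a minimal sketch (not part of GATK), a log can be checked for this pattern automatically. The function name `summarize_vqsr_log` is hypothetical, and the regular expressions assume the GATK 3.x log wording quoted in this thread.

```python
import re
from typing import Dict, Optional, Union

# Patterns assume the GATK 3.x log wording quoted in this thread.
TRAINING_RE = re.compile(
    r"Training with (\d+) variants after standard deviation thresholding"
)
WORST_RE = re.compile(r"Training with worst (\d+) scoring variants")


def summarize_vqsr_log(log_text: str) -> Dict[str, Union[Optional[int], bool]]:
    """Extract positive/negative training counts and the error flag from a log."""
    positive = TRAINING_RE.search(log_text)
    negative = WORST_RE.search(log_text)
    return {
        "positive_training_variants": int(positive.group(1)) if positive else None,
        "negative_training_variants": int(negative.group(1)) if negative else None,
        "no_data_error": "No data found." in log_text,
    }


# Three lines taken from the first log posted in this thread.
sample = """INFO  19:34:39,553 VariantDataManager - Training with 2982 variants after standard deviation thresholding.
INFO  19:34:40,212 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.
##### ERROR MESSAGE: No data found."""

summary = summarize_vqsr_log(sample)
if summary["no_data_error"] and summary["negative_training_variants"] == 0:
    print("Negative model received 0 variants; see the reply above about tuning.")
```

For the sample lines, the summary reports 2982 positive training variants, 0 negative training variants, and the error flag set, so the positive model trained but the negative model had nothing to train on.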
