v2.8-1 VQSR error for indels (ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a))

aminzia (Member)

Hi there,

I am running VQSR (GenomeAnalysisTK-2.8-1-g932cd3a) on the SNPs and indels of an exome dataset. The SNP case works fine, but the indel case gives the following error, which states that it might be due to a bug in the program. I'd appreciate any comments on how to resolve this issue. Thank you.

Amin

PS:

INFO 19:32:31,908 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,911 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15
INFO 19:32:31,911 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 19:32:31,911 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 19:32:31,916 HelpFormatter - Program Args: -T VariantRecalibrator -R ucsc.hg19.fasta -input SAMPLE.indel.vcf -resource:mills,known=true,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.vcf -an DP -an FS -mode INDEL -an ReadPosRankSum -an MQRankSum --maxGaussians 4 -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile SAMPLE.tmp.indel.vcf -tranchesFile SAMPLE.tranches.gatk.indel.recal.csv -rscriptFile SAMPLE.gatk.recal.indel.R
INFO 19:32:31,916 HelpFormatter - Date/Time: 2014/01/13 19:32:31
INFO 19:32:31,916 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,916 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,936 ArgumentTypeDescriptor - Dynamically determined type of SAMPLE.indel.vcf to be VCF
INFO 19:32:31,967 ArgumentTypeDescriptor - Dynamically determined type of Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF
INFO 19:32:32,816 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:32:32,963 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 19:32:33,000 RMDTrackBuilder - Loading Tribble index from disk for file SAMPLE.indel.vcf
INFO 19:32:33,059 RMDTrackBuilder - Loading Tribble index from disk for file Mills_and_1000G_gold_standard.indels.hg19.vcf
INFO 19:32:33,244 GenomeAnalysisEngine - Preparing for traversal
INFO 19:32:33,266 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:32:33,268 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:32:33,268 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
WARN 19:32:33,275 Utils - ********************************************************************************
WARN 19:32:33,276 Utils - * WARNING:
WARN 19:32:33,276 Utils - *
WARN 19:32:33,276 Utils - * Rscript not found in environment path.
WARN 19:32:33,276 Utils - * SAMPLE.gatk.recal.indel.R will be generated but PDF plots
WARN 19:32:33,277 Utils - * will not.
WARN 19:32:33,277 Utils - ********************************************************************************
INFO 19:32:33,281 TrainingSet - Found mills track: Known = true Training = true Truth = true Prior = Q12.0
INFO 19:32:53,178 VariantDataManager - DP: mean = 15.61 standard deviation = 17.91
INFO 19:32:53,185 VariantDataManager - FS: mean = 0.50 standard deviation = 1.74
INFO 19:32:53,189 VariantDataManager - ReadPosRankSum: mean = 0.01 standard deviation = 0.97
INFO 19:32:53,200 VariantDataManager - MQRankSum: mean = 0.04 standard deviation = 0.98
INFO 19:32:53,277 VariantDataManager - Annotations are now ordered by their information content: [DP, FS, MQRankSum, ReadPosRankSum]
INFO 19:32:53,279 VariantDataManager - Training with 10207 variants after standard deviation thresholding.
INFO 19:32:53,283 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 19:32:53,578 VariantRecalibratorEngine - Finished iteration 0.
INFO 19:32:53,739 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.20601
INFO 19:32:53,821 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.07243
INFO 19:32:53,903 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.11180
INFO 19:32:53,986 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.05371
INFO 19:32:54,068 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.00977
INFO 19:32:54,133 VariantRecalibratorEngine - Convergence after 29 iterations!
INFO 19:32:54,169 VariantRecalibratorEngine - Evaluating full set of 15801 variants...
INFO 19:32:54,726 VariantDataManager - Training with worst 184 scoring variants --> variants with LOD <= -5.0000.
INFO 19:32:54,727 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 19:32:54,728 VariantRecalibratorEngine - Finished iteration 0.
INFO 19:32:54,729 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.02527
INFO 19:32:54,730 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.02685
INFO 19:32:54,731 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.04499
INFO 19:32:54,732 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.10418
INFO 19:32:54,733 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.31466
INFO 19:32:54,734 VariantRecalibratorEngine - Convergence after 29 iterations!
INFO 19:32:54,734 VariantRecalibratorEngine - Evaluating full set of 15801 variants...
INFO 19:32:55,845 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: log10p: Values must be non-infinite and non-NAN
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:237)
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:225)
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:250)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.nanTolerantLog10SumLog10(GaussianMixtureModel.java:239)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.evaluateDatumMarginalized(GaussianMixtureModel.java:286)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.evaluateDatum(GaussianMixtureModel.java:244)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateDatum(VariantRecalibratorEngine.java:167)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateData(VariantRecalibratorEngine.java:100)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:360)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:139)
at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: log10p: Values must be non-infinite and non-NAN
ERROR ------------------------------------------------------------------------------------------
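
Editor's note: the exception above is raised from a log10-sum helper that refuses NaN or infinite inputs. The Python sketch below is not GATK's code; it is a minimal illustration, under my own assumptions, of what such a helper computes and why a single NaN log-likelihood (for example from a poorly fitted Gaussian) would stop the run. The function name and the choice to reject every non-finite input are assumptions made for this example.

```python
import math


def log10_sum_log10(log10_values):
    """Return log10(sum(10**v)) for values that are already in log10 space.

    Hypothetical helper for illustration only: it rejects every non-finite
    input, mirroring the wording of the error message in the log above.
    """
    values = list(log10_values)
    if not values:
        raise ValueError("need at least one value")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("log10p: Values must be non-infinite and non-NAN")
    # Subtract the maximum before exponentiating to avoid underflow.
    max_value = max(values)
    total = sum(10.0 ** (v - max_value) for v in values)
    return max_value + math.log10(total)
```

With two equal inputs of -1.0 (probability 0.1 each) the result is log10(0.2); passing `float("nan")` raises `ValueError`.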

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi there,

    Sorry for the delay, I needed to consult with the devs. We're not sure what might be going on here. One possible factor is DP: we don't recommend using it for exomes. I would recommend you try running this again without DP. Let us know if you still see the same issue.

  • aminzia (Member)

    Thank you for your help.
    In my case, I was able to resolve the issue for my exome data after removing both DP and FS.

    Amin
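
Editor's note: the workaround described in the replies above (drop DP first, then FS if the run still fails) can be scripted. The sketch below only builds the `-an` argument lists for each attempt; it does not run GATK. The function names and the default drop order are hypothetical choices for this example, and the annotation names are the ones used in the original command line.

```python
def annotation_fallbacks(annotations, drop_order=("DP", "FS")):
    """Yield annotation lists, removing one more annotation at each step."""
    current = list(annotations)
    yield list(current)
    for name in drop_order:
        if name in current:
            current.remove(name)
            yield list(current)


def annotation_args(annotations):
    """Turn annotation names into repeated '-an NAME' arguments."""
    args = []
    for name in annotations:
        args += ["-an", name]
    return args


if __name__ == "__main__":
    start = ["DP", "FS", "ReadPosRankSum", "MQRankSum"]
    for attempt in annotation_fallbacks(start):
        print(" ".join(annotation_args(attempt)))
```

Each printed line is the annotation part of one attempt; the rest of the VariantRecalibrator command stays unchanged between attempts.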

  • JorgeAmigo (Santiago de Compostela; Member)

    Almost the same issue happened to us. Removing the DP annotation fixed most of the samples failing at the VQSR step, but the very few that still failed needed the FS annotation removed too. We don't mind dropping the DP annotation from the Best Practices defaults, since it is not indicated "when working with hybrid capture datasets since there is extreme variation in the depth to which targets are captured", but we would still like to be able to pay special attention to strand bias. Is there any particular reason why removing the FS annotation gets us past the "code exception" message? Here's the full log of the indel VQSR step:

    INFO  11:33:46,732 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:33:46,734 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15 
    INFO  11:33:46,734 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  11:33:46,734 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  11:33:46,740 HelpFormatter - Program Args: -dt NONE -T VariantRecalibrator --intervals design.bed -R /ngs/reference/hg19/human_hg19.fa -input sample.processed.vcf -recalFile sample.processed.indel.recal -tranchesFile sample.processed.indel.tranches -rscriptFile sample.processed.indel.plots.R -resource:mills,known=false,training=true,truth=true,prior=12.0 /ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf -an FS -an MQRankSum -an ReadPosRankSum --maxGaussians 4 -mode INDEL 
    INFO  11:33:46,741 HelpFormatter - Date/Time: 2014/01/22 11:33:46 
    INFO  11:33:46,741 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:33:46,741 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  11:33:46,810 ArgumentTypeDescriptor - Dynamically determined type of sample.processed.vcf to be VCF 
    INFO  11:33:46,831 ArgumentTypeDescriptor - Dynamically determined type of /ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF 
    INFO  11:33:46,838 ArgumentTypeDescriptor - Dynamically determined type of /ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf to be VCF 
    INFO  11:33:47,893 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  11:33:48,141 GenomeAnalysisEngine - Downsampling Settings: No downsampling 
    INFO  11:33:48,275 RMDTrackBuilder - Loading Tribble index from disk for file sample.processed.vcf 
    INFO  11:33:48,354 RMDTrackBuilder - Loading Tribble index from disk for file /ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf 
    INFO  11:33:48,390 RMDTrackBuilder - Loading Tribble index from disk for file /ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf 
    INFO  11:33:49,385 IntervalUtils - Processing 89476292 bp from intervals 
    INFO  11:33:49,510 GenomeAnalysisEngine - Preparing for traversal 
    INFO  11:33:49,598 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  11:33:49,599 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  11:33:49,599 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
    INFO  11:33:49,610 TrainingSet - Found mills track:     Known = false   Training = true     Truth = true    Prior = Q12.0 
    INFO  11:33:49,611 TrainingSet - Found dbsnp track:     Known = true    Training = false    Truth = false   Prior = Q2.0 
    INFO  11:34:19,620 ProgressMeter -  chr1:214836892        2.13e+05   30.0 s        2.4 m      8.7%         5.7 m     5.2 m 
    INFO  11:34:49,624 ProgressMeter -   chr3:43344668        4.42e+05   60.0 s        2.3 m     18.7%         5.3 m     4.3 m 
    INFO  11:35:19,627 ProgressMeter -   chr5:55081476        6.68e+05   90.0 s        2.2 m     28.4%         5.3 m     3.8 m 
    INFO  11:35:49,631 ProgressMeter -   chr7:55005733        9.15e+05  120.0 s        2.2 m     38.6%         5.2 m     3.2 m 
    INFO  11:36:19,635 ProgressMeter -  chr9:135668042        1.16e+06    2.5 m        2.2 m     48.8%         5.1 m     2.6 m 
    INFO  11:36:49,640 ProgressMeter -   chr12:2080204        1.42e+06    3.0 m        2.1 m     59.3%         5.1 m     2.1 m 
    INFO  11:37:19,646 ProgressMeter -  chr14:90097553        1.65e+06    3.5 m        2.1 m     69.1%         5.1 m    93.0 s 
    INFO  11:37:49,651 ProgressMeter -  chr17:12044353        1.90e+06    4.0 m        2.1 m     78.8%         5.1 m    64.0 s 
    INFO  11:38:19,658 ProgressMeter -  chr19:41942115        2.14e+06    4.5 m        2.1 m     88.5%         5.1 m    35.0 s 
    INFO  11:38:49,665 ProgressMeter -  chrX:105937177        2.38e+06    5.0 m        2.1 m     98.5%         5.1 m     4.0 s 
    INFO  11:38:52,821 VariantDataManager - FS:      mean = 1.13     standard deviation = 3.93 
    INFO  11:38:52,826 VariantDataManager - MQRankSum:   mean = -0.53    standard deviation = 1.71 
    INFO  11:38:52,833 VariantDataManager - ReadPosRankSum:      mean = 1.27     standard deviation = 1.27 
    INFO  11:38:52,892 VariantDataManager - Annotations are now ordered by their information content: [FS, MQRankSum, ReadPosRankSum] 
    INFO  11:38:52,894 VariantDataManager - Training with 2809 variants after standard deviation thresholding. 
    INFO  11:38:52,900 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  11:38:53,196 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  11:38:53,315 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.26457 
    INFO  11:38:53,384 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.10764 
    INFO  11:38:53,427 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.13271 
    INFO  11:38:53,470 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.62601 
    INFO  11:38:53,512 VariantRecalibratorEngine - Convergence after 24 iterations! 
    INFO  11:38:53,551 VariantRecalibratorEngine - Evaluating full set of 7692 variants... 
    INFO  11:38:53,555 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000. 
    INFO  11:38:55,400 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:83)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:359)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:139)
        at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
    
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'm not sure what's going on here; I'll ask @rpoplin (who wrote VQSR) to weigh in on this issue.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @JorgeAmigo, it looks like this is a different issue, possibly a bug. Would you be willing to share your data with us so we can debug the issue locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • JorgeAmigo (Santiago de Compostela; Member)

    Sorry for the delay, but it wasn't easy for us to select the appropriate data files to reproduce the error. Here are the submission details included in the readme.txt file inside the GATK_bug_VQSR_JAmigo.zip file we've just uploaded to your ftp site:

    We are unable to recalibrate certain VCF files. This has happened very rarely, and we used to solve it by modifying the default parameters for the statistical model generation, because the error always suggested that there were too few variants. This one seems to be slightly different, which is why we are reporting it.
    We usually use HaplotypeCaller results, but we've just tried UnifiedGenotyper results and hit the same issue, so we thought they could help trace it. We first thought it had to do with the number of Gaussians used to build the model, so we tried values of 2 and 4 and changed other similar values such as minNumBad (as we used to do with percentBad), but without success. We understand that the most recent GATK changes should select all these values for us by default, and this is the main reason we are contacting you: it is not working as expected.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Thanks for the files, it is useful for us to have test data to debug why the program is not behaving as expected. I will let you know of any progress in this thread.

    That said I should caution you that it may be some time before we are able to process your bug report, because the lead VQSR developer is going to be traveling for several weeks starting very soon. So we may not have an answer for you before mid-March. I hope this is not too disruptive to your work, but if this is time-sensitive, I would recommend you work around the issue.

  • JorgeAmigo (Santiago de Compostela; Member)
    edited February 2014

    Trying to work around the issue, we forced the Gaussians behaviour described in the release notes by adding "--maxGaussians 2 --maxNegativeGaussians 2" to the command line. It did work, so there must be a problem with how these parameters are handled.

    By "it did work" I mean that the previous "code exception" error message was no longer generated. Instead, we got the already known "NaN LOD value assigned" error, which we aren't able to get rid of, usually on indels, even if we add the suggested "--minNumBadVariants 5000". We are aware that working with single exomes may not generate enough variants to build the models, so we usually run the hard filters in parallel and use them if the recalibration can't be performed due to the small number of variants, but it would be great to hear some more suggestions on this. Here is the full indel recal log plus the error message we now get:

    Picked up _JAVA_OPTIONS: -Xmx16g -Djava.io.tmpdir=/scratch/9827464.1.g1-mem_big/sample
    INFO  12:10:52,734 ArgumentTypeDescriptor - Dynamically determined type of SureSelect_XT_Human_All_Exon_V4.sorted.bed to be BED 
    INFO  12:10:52,868 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  12:10:52,870 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15 
    INFO  12:10:52,870 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  12:10:52,870 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  12:10:52,875 HelpFormatter - Program Args: -dt NONE -T VariantRecalibrator --intervals SureSelect_XT_Human_All_Exon_V4.sorted.bed -R ~/ngs/reference/hg19/human_hg19.fa -input sample.vcf -recalFile sample.indel.recal -tranchesFile sample.indel.tranches -rscriptFile sample.indel.plots.R -resource:mills,known=false,training=true,truth=true,prior=12.0 ~/ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ~/ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -an FS -an MQRankSum -an ReadPosRankSum --maxGaussians 2 --maxNegativeGaussians 2 --minNumBadVariants 5000 -mode INDEL 
    INFO  12:10:52,876 HelpFormatter - Date/Time: 2014/02/12 12:10:52 
    INFO  12:10:52,876 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  12:10:52,876 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  12:10:52,947 ArgumentTypeDescriptor - Dynamically determined type of sample.vcf to be VCF 
    INFO  12:10:52,962 ArgumentTypeDescriptor - Dynamically determined type of ~/ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF 
    INFO  12:10:52,967 ArgumentTypeDescriptor - Dynamically determined type of ~/ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf to be VCF 
    INFO  12:10:53,896 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  12:10:54,102 GenomeAnalysisEngine - Downsampling Settings: No downsampling 
    INFO  12:10:54,133 RMDTrackBuilder - Loading Tribble index from disk for file sample.vcf 
    INFO  12:10:54,204 RMDTrackBuilder - Loading Tribble index from disk for file ~/ngs/reference/gatk_bundle/sorted/Mills_and_1000G_gold_standard.indels.hg19.vcf 
    INFO  12:10:54,236 RMDTrackBuilder - Loading Tribble index from disk for file ~/ngs/reference/gatk_bundle/sorted/dbsnp_137.hg19.vcf 
    INFO  12:10:55,109 IntervalUtils - Processing 51189318 bp from intervals 
    INFO  12:10:55,215 GenomeAnalysisEngine - Preparing for traversal 
    INFO  12:10:55,286 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  12:10:55,287 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  12:10:55,287 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
    INFO  12:10:55,296 TrainingSet - Found mills track:     Known = false   Training = true     Truth = true    Prior = Q12.0 
    INFO  12:10:55,296 TrainingSet - Found dbsnp track:     Known = true    Training = false    Truth = false   Prior = Q2.0 
    INFO  12:11:25,298 ProgressMeter -  chr1:237433884        1.47e+05   30.0 s        3.4 m      9.7%         5.2 m     4.7 m 
    INFO  12:11:55,301 ProgressMeter -  chr3:113049044        3.10e+05   60.0 s        3.2 m     21.0%         4.8 m     3.8 m 
    INFO  12:12:25,304 ProgressMeter -    chr6:3723711        4.74e+05   90.0 s        3.2 m     32.3%         4.6 m     3.1 m 
    INFO  12:12:55,307 ProgressMeter -   chr8:74893388        6.51e+05  120.0 s        3.1 m     44.0%         4.5 m     2.5 m 
    INFO  12:13:25,312 ProgressMeter -  chr11:20529835        8.24e+05    2.5 m        3.0 m     55.2%         4.5 m     2.0 m 
    INFO  12:13:55,316 ProgressMeter - chr13:115004819        1.00e+06    3.0 m        3.0 m     66.9%         4.5 m    89.0 s 
    INFO  12:14:25,320 ProgressMeter -   chr17:6023636        1.17e+06    3.5 m        3.0 m     78.1%         4.5 m    59.0 s 
    INFO  12:14:55,325 ProgressMeter -  chr19:51871157        1.36e+06    4.0 m        2.9 m     89.8%         4.5 m    27.0 s 
    INFO  12:15:20,598 VariantDataManager - FS:      mean = 1.82     standard deviation = 4.74 
    INFO  12:15:20,599 VariantDataManager - MQRankSum:   mean = -1.47    standard deviation = 2.55 
    INFO  12:15:20,601 VariantDataManager - ReadPosRankSum:      mean = 1.67     standard deviation = 1.43 
    INFO  12:15:20,624 VariantDataManager - Annotations are now ordered by their information content: [ReadPosRankSum, MQRankSum, FS] 
    INFO  12:15:20,624 VariantDataManager - Training with 1214 variants after standard deviation thresholding. 
    WARN  12:15:20,624 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable. 
    INFO  12:15:20,629 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  12:15:20,839 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  12:15:20,901 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.02651 
    INFO  12:15:20,912 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.00076 
    INFO  12:15:20,912 VariantRecalibratorEngine - Convergence after 10 iterations! 
    INFO  12:15:20,919 VariantRecalibratorEngine - Evaluating full set of 1511 variants... 
    INFO  12:15:20,994 VariantDataManager - Training with worst 12 scoring variants --> variants with LOD <= -5.0000. 
    INFO  12:15:20,994 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  12:15:20,996 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  12:15:20,997 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.32610 
    INFO  12:15:20,998 VariantRecalibratorEngine - Convergence after 7 iterations! 
    INFO  12:15:22,589 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 2.8-1-g932cd3a): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).
    ##### ERROR ------------------------------------------------------------------------------------------
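
Editor's note: the post above describes running hard filters in parallel and falling back to them when recalibration fails. A minimal sketch of that decision, assuming the GATK log has been captured as text, could look like the following. The function names and the returned labels are hypothetical; the matched strings are copied from the logs in this thread.

```python
def classify_vqsr_log(log_text):
    """Classify a captured VariantRecalibrator log using messages seen in this thread."""
    if "NaN LOD value assigned" in log_text:
        return "too_few_variants"
    if "A GATK RUNTIME ERROR has occurred" in log_text:
        return "runtime_error"
    if "A USER ERROR has occurred" in log_text:
        return "user_error"
    return "ok"


def choose_filter_strategy(log_text):
    """Use the recalibrated calls only when the log shows no error."""
    return "vqsr" if classify_vqsr_log(log_text) == "ok" else "hard_filters"
```

The check for "NaN LOD value assigned" comes first because that message is reported inside a USER ERROR block, so testing for the generic banner first would hide the more specific cause.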
    
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    At this time we don't have any solution for the "too few variants" problem. Future developments may resolve this; if so we will let you know.

  • Hyunmin (Seoul, Korea; Member)

    I have the same issue.
    Is this problem solved in GATK 3.0?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @Hyunmin,

    No, we do not have a solution for this yet.

  • aminzia (Member)

    To resolve this issue, I now combine the per-chromosome genotype VCFs before running VQSR. We no longer see failures for the smaller chromosomes. But this raises the question of whether training on the whole genome's data changes the general performance of VQSR. Could you please let us know what you think? Thank you. Amin
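
Editor's note: the post above does not say how the per-chromosome VCFs were combined. As a rough illustration only, the Python sketch below concatenates VCF files that are already in memory as lists of lines, keeping the header from the first file. It assumes all inputs share the same header and samples and are supplied in reference contig order; the function name is hypothetical, and a dedicated VCF merging tool would be the safer choice for real data.

```python
def concat_vcf_lines(per_chrom_vcfs):
    """Concatenate per-chromosome VCFs given as lists of lines.

    Header lines (starting with '#') are kept from the first input only;
    record lines are appended in the order the inputs are supplied.
    """
    merged = []
    for index, lines in enumerate(per_chrom_vcfs):
        for line in lines:
            if line.startswith("#") and index > 0:
                continue  # skip repeated headers from later files
            merged.append(line)
    return merged
```

The merged list can then be written to a single file and used as the `-input` for one VQSR run over all chromosomes.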
