Error: Unable to retrieve result, with "VariantRecalibrator"

rcholic (Denver) Member
edited September 2014 in Ask the GATK team

My command line is as follows:

 java -Xmx8g -jar $CLASSPATH/GenomeAnalysisTK.jar \
 -T VariantRecalibrator \
 -R $GenomeReference \
 -input $InputVCF \
 -nt 6 \
 -resource:mills,known=true,training=true,truth=true,prior=12.0 $resource1 \
 -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 $resource2 \
 -an DP -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
 --maxGaussians 8 \
 -mode INDEL \
 -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
 -log $IndelsOutput/Indels.log \
 -recalFile $IndelsOutput/exome.indels.vcf.recal \
 -tranchesFile $IndelsOutput/exome.indels.tranches \
 -rscriptFile $IndelsOutput/exome.indels.recal.plots.R

But I got the following error when running this:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
    at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
Caused by: java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:83)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:392)
    at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:138)
    at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
    at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
    ... 5 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):

So I suspect this is a bug in GATK 3.2-2?

Answers

  • pdexheimer Member ✭✭✭✭

    This was caused by an empty set of "bad" variants - i.e., the negative training model. There should be hints of this in the output log.

    I'm not certain of the solution. It could be that you don't have enough indels in your data to recalibrate, or it could be that you can fix this by relaxing some of the "bad variant" parameters.
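
One way to look for those hints is to pull the training-set sizes out of the VariantRecalibrator log. The following is a minimal Python sketch, not part of GATK. The function name `training_set_sizes` is made up for illustration, and both patterns are copied from VariantDataManager lines quoted further down in this thread (GATK 3.7 and a 2017 nightly build), so a 3.2 log may word them differently.

```python
import re

# Patterns taken from log lines quoted in this thread; other GATK versions
# may phrase these messages differently (assumption).
POSITIVE = re.compile(r"Training with (\d+) variants after standard deviation thresholding")
NEGATIVE = re.compile(r"(?:Training with|Selected) worst (\d+) scoring variants")


def training_set_sizes(log_lines):
    """Return (positive_count, negative_count); None where no matching line was seen."""
    positive = negative = None
    for line in log_lines:
        match = POSITIVE.search(line)
        if match:
            positive = int(match.group(1))
            continue
        match = NEGATIVE.search(line)
        if match:
            negative = int(match.group(1))
    return positive, negative


# Two lines quoted from the logs in this thread, used here as sample input.
sample = [
    "INFO  18:05:27,137 VariantDataManager - Training with 171536 variants after standard deviation thresholding.",
    "INFO  18:07:33,136 VariantDataManager - Training with worst 25380 scoring variants --> variants with LOD <= -5.0000.",
]
print(training_set_sizes(sample))
```

A negative count of 0, or None because the log stops before that line, would be consistent with the empty "bad" set described above, though it does not by itself prove that this is the cause.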

  • tommycarstensen (United Kingdom) Member ✭✭✭

    Have you tried disabling multithreading? -nt 1 instead of -nt 6. Also the recommended best practice for INDELs is --maxGaussians 4.

    http://gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project
    
  • mmokrejs (Czech Republic) Member
    edited May 2017

    I hit this as well. Why isn't the crash fixed yet? @Sheila asked me for -resource:omni,known=false,training=true,truth=true in http://gatkforums.broadinstitute.org/gatk/discussion/comment/38380/#Comment_38380 . At least once I succeeded with this command, so the crash does not seem to be very reproducible.

    Probably related to the crash is:
    MQ: mean = Infinity standard deviation = NaN in the output below.

    Here is the full log:

    java -Djavaio.tmpdir=. -Xmx58g -jar GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 16 -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input mysample.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=true,prior=12.0 ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -MQCap 70 -an InbreedingCoeff --target_titv 3.2 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile mysample.recalibrate_SNP.recal -tranchesFile mysample.recalibrate_SNP.tranches -rscriptFile mysample.recalibrate_SNP_plots.R
    INFO  03:10:29,440 HelpFormatter - --------------------------------------------------------------------------------------------
    INFO  03:10:29,449 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
    INFO  03:10:29,449 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
    INFO  03:10:29,449 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
    INFO  03:10:29,450 HelpFormatter - [Tue May 02 03:10:29 CEST 2017] Executing on Linux 2.6.32-642.15.1.el6.Bull.110.x86_64 amd64
    INFO  03:10:29,450 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13 
    INFO  03:10:29,453 HelpFormatter - Program Args: -T VariantRecalibrator -nt 16 -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input mysample.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=true,prior=12.0 ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -MQCap 70 -an InbreedingCoeff --target_titv 3.2 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile mysample.recalibrate_SNP.recal -tranchesFile mysample.recalibrate_SNP.tranches -rscriptFile mysample.recalibrate_SNP_plots.R
    INFO  03:10:29,465 HelpFormatter - Executing as [email protected] on Linux 2.6.32-642.15.1.el6.Bull.110.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13.
    INFO  03:10:29,465 HelpFormatter - Date/Time: 2017/05/02 03:10:29
    INFO  03:10:29,465 HelpFormatter - --------------------------------------------------------------------------------------------
    INFO  03:10:29,465 HelpFormatter - --------------------------------------------------------------------------------------------
    INFO  03:10:30,984 GenomeAnalysisEngine - Strictness is SILENT
    INFO  03:10:32,465 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    WARN  03:10:33,896 IndexDictionaryUtils - Track hapmap doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN  03:10:33,896 IndexDictionaryUtils - Track omni doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN  03:10:33,896 IndexDictionaryUtils - Track 1000G doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN  03:10:33,897 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation
    INFO  03:10:33,905 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 1 CPU thread(s) for each of 16 data thread(s), of 16 processors available on this machine
    INFO  03:10:35,442 GenomeAnalysisEngine - Preparing for traversal
    INFO  03:10:35,449 GenomeAnalysisEngine - Done preparing for traversal
    INFO  03:10:35,449 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  03:10:35,449 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
    INFO  03:10:35,450 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
    INFO  03:10:36,979 TrainingSet - Found hapmap track:    Known = false   Training = true         Truth = true    Prior = Q15.0
    INFO  03:10:36,980 TrainingSet - Found omni track:      Known = false   Training = true         Truth = true    Prior = Q12.0
    INFO  03:10:36,980 TrainingSet - Found 1000G track:     Known = false   Training = true         Truth = false   Prior = Q10.0
    INFO  03:10:36,980 TrainingSet - Found dbsnp track:     Known = true    Training = false        Truth = false   Prior = Q2.0
    INFO  03:11:05,452 ProgressMeter -  chr1:103998498   6009788.0    30.0 s       4.0 s        3.2%    15.5 m      15.0 m
    INFO  03:11:35,454 ProgressMeter -   chr2:14999034   1.3861913E7    60.0 s       4.0 s        8.2%    12.2 m      11.2 m
    INFO  03:12:05,455 ProgressMeter -  chr2:194026932   2.3585847E7    90.0 s       3.0 s       13.8%    10.9 m       9.4 m
    INFO  03:12:35,456 ProgressMeter -  chr3:138411299   3.4171297E7   120.0 s       3.0 s       19.6%    10.2 m       8.2 m
    INFO  03:13:05,459 ProgressMeter -  chr4:107898781   4.4669434E7     2.5 m       3.0 s       24.8%    10.1 m       7.6 m
    INFO  03:13:35,461 ProgressMeter -   chr5:83999960   5.3947029E7     3.0 m       3.0 s       30.0%    10.0 m       7.0 m
    INFO  03:14:05,461 ProgressMeter -   chr6:87564646   6.431768E7     3.5 m       3.0 s       35.7%     9.8 m       6.3 m
    INFO  03:14:35,463 ProgressMeter -   chr7:96163115   7.437817E7     4.0 m       3.0 s       41.3%     9.7 m       5.7 m
    INFO  03:15:05,464 ProgressMeter -   chr8:84356816   8.4092746E7     4.5 m       3.0 s       45.9%     9.8 m       5.3 m
    INFO  03:15:35,468 ProgressMeter -  chr9:118999309   9.338393E7     5.0 m       3.0 s       51.5%     9.7 m       4.7 m
    INFO  03:16:05,469 ProgressMeter -  chr11:31027887   1.03618838E8     5.5 m       3.0 s       57.2%     9.6 m       4.1 m
    INFO  03:16:35,471 ProgressMeter -  chr12:61418776   1.13194864E8     6.0 m       3.0 s       62.3%     9.6 m       3.6 m
    INFO  03:17:05,473 ProgressMeter - chr13:110157935   1.22662577E8     6.5 m       3.0 s       68.0%     9.6 m       3.1 m
    INFO  03:17:35,474 ProgressMeter -  chr15:75999853   1.32309511E8     7.0 m       3.0 s       73.8%     9.5 m       2.5 m
    INFO  03:18:05,475 ProgressMeter -  chr17:53999791   1.42456476E8     7.5 m       3.0 s       79.1%     9.5 m     118.0 s
    INFO  03:18:35,476 ProgressMeter -  chr20:13997981   1.53600937E8     8.0 m       3.0 s       84.8%     9.4 m      86.0 s
    INFO  03:19:05,480 ProgressMeter -   chrX:42769484   1.62494595E8     8.5 m       3.0 s       90.7%     9.4 m      52.0 s
    INFO  03:19:35,481 ProgressMeter - chr14_GL000009v2_random:692   1.6749051E8     9.0 m       3.0 s       96.0%     9.4 m      22.0 s
    INFO  03:20:05,482 ProgressMeter - chr7_KI270803v1_alt:773517   1.67490584E8     9.5 m       3.0 s       97.0%     9.8 m      17.0 s
    INFO  03:20:35,483 ProgressMeter - chr22_KI270928v1_alt:60414   1.67493686E8    10.0 m       3.0 s       98.9%    10.1 m       6.0 s
    INFO  03:21:05,484 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    10.5 m       3.0 s       99.7%    10.5 m       1.0 s
    INFO  03:21:35,485 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    11.0 m       3.0 s       99.7%    11.0 m       1.0 s
    INFO  03:22:05,486 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    11.5 m       4.0 s       99.7%    11.5 m       1.0 s
    INFO  03:22:35,488 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    12.0 m       4.0 s       99.7%    12.0 m       2.0 s
    INFO  03:23:05,489 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    12.5 m       4.0 s       99.7%    12.5 m       2.0 s
    INFO  03:23:35,493 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    13.0 m       4.0 s       99.7%    13.0 m       2.0 s
    INFO  03:24:05,494 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    13.5 m       4.0 s       99.7%    13.5 m       2.0 s
    INFO  03:24:35,495 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    14.0 m       5.0 s       99.7%    14.0 m       2.0 s
    INFO  03:25:05,496 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    14.5 m       5.0 s       99.7%    14.5 m       2.0 s
    INFO  03:25:35,497 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    15.0 m       5.0 s       99.7%    15.0 m       2.0 s
    INFO  03:26:05,498 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    15.5 m       5.0 s       99.7%    15.5 m       2.0 s
    INFO  03:26:35,499 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    16.0 m       5.0 s       99.7%    16.0 m       2.0 s
    INFO  03:27:05,500 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493687E8    16.5 m       5.0 s       99.7%    16.5 m       2.0 s
    INFO  03:27:23,041 VariantDataManager - QD:      mean = 16.98    standard deviation = 7.37
    INFO  03:27:23,109 VariantDataManager - FS:      mean = 1.62     standard deviation = 3.57
    INFO  03:27:23,156 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  03:27:23,216 VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN
    INFO  03:27:23,267 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  03:27:23,319 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  03:27:23,368 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.20
    INFO  03:27:23,715 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum, MQ]
    INFO  03:27:23,742 VariantDataManager - Training with 174429 variants after standard deviation thresholding.
    INFO  03:27:23,746 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    ##### ERROR --
    ##### ERROR stack trace
    org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
            at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    Caused by: java.lang.NullPointerException
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeMeansUsingKMeans(GaussianMixtureModel.java:163)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeRandomModel(GaussianMixtureModel.java:125)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.variationalBayesExpectationMaximization(VariantRecalibratorEngine.java:147)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:92)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:484)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:185)
            at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
            at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
            ... 5 more
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Unable to retrieve result
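
The `MQ: mean = Infinity` line in this log can be spotted automatically before reading the stack trace. The following is a minimal Python sketch under stated assumptions: the helper name `non_finite_annotations` is hypothetical, and the pattern is modelled only on the VariantDataManager lines quoted in this post, so it may need adjusting for other GATK versions.

```python
import math
import re

# Matches lines such as
# "VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN"
STATS = re.compile(
    r"VariantDataManager - (\w+):\s+mean = (\S+)\s+standard deviation = (\S+)"
)


def non_finite_annotations(log_lines):
    """Return names of annotations whose logged mean or standard deviation is not finite."""
    bad = []
    for line in log_lines:
        match = STATS.search(line)
        if not match:
            continue
        name = match.group(1)
        # Python's float() accepts the spellings "Infinity", "-Infinity" and "NaN"
        # that appear in the log.
        mean = float(match.group(2))
        stdev = float(match.group(3))
        if not (math.isfinite(mean) and math.isfinite(stdev)):
            bad.append(name)
    return bad


# Sample input: two lines copied from the log above.
sample = [
    "INFO  03:27:23,156 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18",
    "INFO  03:27:23,216 VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN",
]
print(non_finite_annotations(sample))
```

This only flags the symptom in the log. It does not say whether the non-finite value comes from the input VCF or from a transformation applied inside VariantRecalibrator.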
    
  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin
    This is a type of instability related to the shape of the data. The team is working on other approaches that we think will be more stable.
  • mmokrejs (Czech Republic) Member
    edited May 2017

    Interestingly, for my exome data, if I change the previous per-sample step to use -AS, I avoid the crash, because then MQ: mean = 1.67 standard deviation = 0.34:

    I changed

    "$java_path" $java_opts -Xmx${ask_for_memory}g -jar "$gatk_binpath"GenomeAnalysisTK.jar -T HaplotypeCaller --genotyping_mode DISCOVERY --useNewAFCalculator --emitRefConfidence GVCF --pcr_indel_model "$pcr_indel_model" --sample_name "$sample" -R "$reference_flatfile" $ranges_arg -I "$sample_calmd_bam" --dbsnp "$dbsnp_file_Broad_style" -o "$sample"."$aligner"gatk.HaplotypeCaller.g.vcf
    

    to

    "$java_path" $java_opts -Xmx${ask_for_memory}g -jar "$gatk_binpath"GenomeAnalysisTK.jar -T HaplotypeCaller --genotyping_mode DISCOVERY --useNewAFCalculator --emitRefConfidence GVCF --pcr_indel_model "$pcr_indel_model" --sample_name "$sample" -R "$reference_flatfile" $ranges_arg -I "$sample_calmd_bam" --dbsnp "$dbsnp_file_Broad_style" -G Standard -G AS_Standard -o "$sample"."$aligner"gatk.HaplotypeCaller.AS.g.vcf
    

    and subsequently followed NOT with:

    $java_path $java_opts -Xmx${ask_for_memory}g -jar ${gatk_binpath}GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R "$reference_flatfile" --input mysample.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 "$hapmap_file" -resource:omni,known=false,training=true,truth=true,prior=12.0 "$omni_file" -resource:1000G,known=false,training=true,truth=false,prior=10.0 "$KG_file" -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 "$dbsnp_file_Broad_style" -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -MQCap 70 -an InbreedingCoeff --target_titv 3.2 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile "$prefix".recalibrate_SNP.recal -tranchesFile "$prefix".recalibrate_SNP.tranches -rscriptFile "$prefix".recalibrate_SNP_plots.R
    

    but with

    $java_path $java_opts -Xmx${ask_for_memory}g -jar ${gatk_binpath}GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -AS -R "$reference_flatfile" --input mysample.AS.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 "$hapmap_file" -resource:omni,known=false,training=true,truth=true,prior=12.0 "$omni_file" -resource:1000G,known=false,training=true,truth=false,prior=10.0 "$KG_file" -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 "$dbsnp_file_Broad_style" -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -MQCap 70 -an InbreedingCoeff --target_titv 3.2 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile "$prefix".AS.recalibrate_SNP.recal -tranchesFile "$prefix".AS.recalibrate_SNP.tranches -rscriptFile "$prefix".AS.recalibrate_SNP_plots.R
    

    And hey, I do get some VQSR output, although it is still bad IMHO.

    INFO  18:05:26,460 VariantDataManager - QD:      mean = 17.20    standard deviation = 7.21
    INFO  18:05:26,529 VariantDataManager - FS:      mean = 1.63     standard deviation = 3.58
    INFO  18:05:26,580 VariantDataManager - SOR:     mean = 1.15     standard deviation = 1.18
    INFO  18:05:26,624 VariantDataManager - MQ:      mean = 1.67     standard deviation = 0.34
    INFO  18:05:26,683 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  18:05:26,729 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  18:05:26,776 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.17
    INFO  18:05:27,107 VariantDataManager - Annotations are now ordered by their information content: [QD, MQ, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum]
    INFO  18:05:27,137 VariantDataManager - Training with 171536 variants after standard deviation thresholding.
    INFO  18:05:27,141 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    
    
    INFO  18:07:26,264 VariantRecalibratorEngine - Convergence after 102 iterations!
    INFO  18:07:26,920 VariantRecalibratorEngine - Evaluating full set of 520227 variants...
    INFO  18:07:33,136 VariantDataManager - Training with worst 25380 scoring variants --> variants with LOD <= -5.0000.
    INFO  18:07:33,136 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  18:07:33,419 VariantRecalibratorEngine - Finished iteration 0.
    INFO  18:07:33,555 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.06133
    INFO  18:07:33,681 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.04068
    INFO  18:07:33,805 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02330
    INFO  18:07:33,930 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.01724
    INFO  18:07:34,052 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.01210
    INFO  18:07:34,176 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.00883
    INFO  18:07:34,299 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.00604
    INFO  18:07:34,423 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.00368
    INFO  18:07:34,546 VariantRecalibratorEngine - Finished iteration 45.   Current change in mixture coefficients = 0.00228
    INFO  18:07:34,596 VariantRecalibratorEngine - Convergence after 47 iterations!
    INFO  18:07:34,626 VariantRecalibratorEngine - Evaluating full set of 520227 variants...
    INFO  18:07:37,587 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493666E8    29.5 m      10.0 s       99.7%    29.6 m       4.0 s
    INFO  18:07:41,911 TrancheManager - Finding 4 tranches for 520227 variants
    INFO  18:07:42,211 TrancheManager -   Tranche threshold 100.00 => selection metric threshold 0.000
    INFO  18:07:42,280 TrancheManager -   Found tranche for 100.000: 0.000 threshold starting with variant 0; running score is 0.000
    INFO  18:07:42,281 TrancheManager -   Tranche is Tranche ts=100.00 minVQSLod=-116.4188 known=(226296 @ 2.1403) novel=(293931 @ 0.0667) truthSites(104877 accessible, 104877 called), name=anonymous]
    INFO  18:07:42,281 TrancheManager -   Tranche threshold 99.90 => selection metric threshold 0.001
    INFO  18:07:42,347 TrancheManager -   Found tranche for 99.900: 0.001 threshold starting with variant 17012; running score is 0.001
    INFO  18:07:42,348 TrancheManager -   Tranche is Tranche ts=99.90 minVQSLod=-1.9720 known=(218619 @ 2.1716) novel=(284596 @ 0.0660) truthSites(104877 accessible, 104772 called), name=anonymous]
    INFO  18:07:42,348 TrancheManager -   Tranche threshold 99.00 => selection metric threshold 0.010
    INFO  18:07:42,395 TrancheManager -   Found tranche for 99.000: 0.010 threshold starting with variant 28689; running score is 0.010
    INFO  18:07:42,395 TrancheManager -   Tranche is Tranche ts=99.00 minVQSLod=0.3904 known=(210427 @ 2.1816) novel=(281111 @ 0.0653) truthSites(104877 accessible, 103828 called), name=anonymous]
    INFO  18:07:42,395 TrancheManager -   Tranche threshold 90.00 => selection metric threshold 0.100
    INFO  18:07:42,443 TrancheManager -   Found tranche for 90.000: 0.100 threshold starting with variant 97936; running score is 0.100
    INFO  18:07:42,443 TrancheManager -   Tranche is Tranche ts=90.00 minVQSLod=6.5017 known=(182319 @ 2.2411) novel=(239972 @ 0.0641) truthSites(104877 accessible, 94389 called), name=anonymous]
    

    The VQSR documentation is very silent about what the -AS option really does, what it needs, and how to interpret the result.
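
To compare runs without reading the TrancheManager lines by eye, the tranche summaries above can be parsed into records. The following is a minimal Python sketch, not a GATK utility: the function name `parse_tranches` and the field names are made up for illustration, and it assumes that the number after `@` is the Ti/Tv ratio of the known or novel variants in that tranche.

```python
import re

# Modelled on the "Tranche is Tranche ts=..." lines quoted in this post.
TRANCHE = re.compile(
    r"Tranche ts=(?P<ts>[\d.]+) minVQSLod=(?P<lod>-?[\d.]+) "
    r"known=\((?P<known>\d+) @ (?P<known_titv>[\d.]+)\) "
    r"novel=\((?P<novel>\d+) @ (?P<novel_titv>[\d.]+)\) "
    r"truthSites\((?P<accessible>\d+) accessible, (?P<called>\d+) called\)"
)


def parse_tranches(log_lines):
    """Return one dict per tranche summary line found in the log."""
    rows = []
    for line in log_lines:
        match = TRANCHE.search(line)
        if not match:
            continue
        accessible = int(match.group("accessible"))
        called = int(match.group("called"))
        rows.append({
            "ts": float(match.group("ts")),
            "min_vqslod": float(match.group("lod")),
            "known": int(match.group("known")),
            "known_titv": float(match.group("known_titv")),  # assumed to be Ti/Tv
            "novel": int(match.group("novel")),
            "novel_titv": float(match.group("novel_titv")),  # assumed to be Ti/Tv
            "truth_sensitivity": called / accessible if accessible else None,
        })
    return rows


# Sample input: one line copied from the log above.
sample = [
    "INFO  18:07:42,348 TrancheManager -   Tranche is Tranche ts=99.90 minVQSLod=-1.9720 "
    "known=(218619 @ 2.1716) novel=(284596 @ 0.0660) "
    "truthSites(104877 accessible, 104772 called), name=anonymous]",
]
for row in parse_tranches(sample):
    print(row["ts"], row["known_titv"], row["novel_titv"])
```

If that reading of the `@` values is right, the novel values around 0.07 are far from the --target_titv 3.2 given on the command line, which would fit the impression that the output is still bad.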

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @mmokrejs
    Hi,

    I did not ask in the other threads, but does the issue of bad plots go away if you do not use multi-threading?

    -Sheila

  • mmokrejs (Czech Republic) Member
    edited May 2017

    "I did not ask in the other threads, but does the issue of bad plots go away if you do not use multi-threading?"

    For the last several days I have been running all tests with -nt 1 to be sure I have a chance of getting a good result. Still, no good VQSR plots.

    If I take the default approach without -AS, I at least get VQSR plots, although bad ones.

    If I use -AS only in the HaplotypeCaller step and not during GenotypeGVCFs (the documentation does not make clear that the HaplotypeCaller and GenotypeGVCFs commands must both include -G Standard -G AS_Standard, in sync), then I again get bad VQSR PDF plots.

    If I fix my pipeline to also use -AS during GenotypeGVCFs, then I hit the Java crash: http://gatkforums.broadinstitute.org/gatk/discussion/9527/genotypegvcfs-usenewafcalculator-g-standard-g-as-standard-crashing-with-java-lang-nullpointerexc

    That is why I try to keep the "issues" in separate threads. I am really trying several approaches, hoping that at least one will work well. So far none has.

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @mmokrejs
    Hi,

    It is actually best to keep all the details in one thread so the team has one place to look. Someone will get back to you here.

    -Sheila

  • mmokrejs (Czech Republic) Member

    @Sheila

    Hi,
    I will keep the details about this crash here. I wanted to prepare a test case from a subset of the dataset, but as you will see, the crash depends on the number of input rows. Either I will have to give you access to the full dataset, or your engineers will manage to stitch together their own test case. Or could they provide me with a debug binary? In any case, tests for NaN or Infinity before any actual computation is attempted should be easy to implement.

    $ wc -l CR-MGUS.raw.vcf
    552463 CR-MGUS.raw.vcf
    $
    $ wc -l CR-MGUS_test2.raw.vcf
    40000 CR-MGUS_test2.raw.vcf
    $
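
A check for NaN or Infinity of the kind suggested above can at least be run on the input side. The following is a minimal Python sketch under stated assumptions: the function name `non_finite_info_values` is hypothetical, the VCF is assumed to be plain text with tab-separated columns and INFO in column 8, and the sample records are invented for illustration rather than taken from CR-MGUS.raw.vcf. It cannot see values that become non-finite only inside VariantRecalibrator.

```python
import math


def non_finite_info_values(vcf_lines, keys=("MQ",)):
    """Yield (chrom, pos, key, raw_value) for INFO values that are not finite numbers."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], fields[1], fields[7]
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            if not sep or key not in keys:
                continue  # flag-type entry or an annotation we are not checking
            for item in value.split(","):
                try:
                    finite = math.isfinite(float(item))
                except ValueError:
                    finite = False  # not parseable as a number at all
                if not finite:
                    yield chrom, pos, key, item


# Invented records, for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tMQ=60.00;QD=12.5",
    "chr1\t200\t.\tC\tT\t50\t.\tMQ=Infinity;QD=3.1",
    "chr1\t300\t.\tC\tT\t50\t.\tDB;MQ=NaN",
]
for hit in non_finite_info_values(sample):
    print(hit)
```

To check a real file, the same function can be given an open file handle, for example `with open("CR-MGUS.raw.vcf") as handle: hits = list(non_finite_info_values(handle))`.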
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test2.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R > CR-MGUS_test2.raw.log 2>&1
    
    This one actually worked.
    
    INFO  23:05:47,735 VariantDataManager - MQ:      mean = 1.03     standard deviation = 0.30
    INFO  23:05:47,749 VariantDataManager - QD:      mean = 16.94    standard deviation = 7.51
    INFO  23:05:47,757 VariantDataManager - FS:      mean = 1.65     standard deviation = 2.79
    INFO  23:05:47,762 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.20
    INFO  23:05:47,765 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  23:05:47,776 VariantDataManager - ReadPosRankSum:          mean = 0.13     standard deviation = 0.68
    INFO  23:05:47,779 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  23:05:47,817 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, MQ, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  23:05:47,823 VariantDataManager - Training with 10367 variants after standard deviation thresholding.
    INFO  23:05:47,827 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  23:05:48,190 VariantRecalibratorEngine - Finished iteration 0.
    INFO  23:05:48,497 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.47265
    INFO  23:05:48,724 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.19135
    ...
    INFO  23:05:53,367 VariantRecalibratorEngine - Convergence after 108 iterations! 
    INFO  23:05:53,427 VariantRecalibratorEngine - Evaluating full set of 34613 variants... 
    INFO  23:05:54,026 VariantDataManager - Selected worst 2453 scoring variants --> variants with LOD <= -5.0000. 
    INFO  23:05:54,027 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  23:05:54,054 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  23:05:54,067 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.09881 
    INFO  23:05:54,080 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.00825 
    INFO  23:05:54,095 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.01178 
    INFO  23:05:54,110 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00618 
    INFO  23:05:54,125 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.00115 
    INFO  23:05:54,125 VariantRecalibratorEngine - Convergence after 25 iterations! 
    INFO  23:05:54,132 VariantRecalibratorEngine - Evaluating full set of 34613 variants... 
    INFO  23:05:54,765 TrancheManager - Finding 4 tranches for 34613 variants 
    INFO  23:05:54,790 TrancheManager -   Tranche threshold 100.00 => selection metric threshold 0.000 
    INFO  23:05:54,801 TrancheManager -   Found tranche for 100.000: 0.000 threshold starting with variant 0; running score is 0.000  
    INFO  23:05:54,801 TrancheManager -   Tranche is Tranche ts=100.00 minVQSLod=-33222.4174 known=(18060 @ 1.1309) novel=(16553 @ 0.0749) truthSites(4944 accessible, 4944 called), name=anonymous] 
    INFO  23:05:54,802 TrancheManager -   Tranche threshold 99.90 => selection metric threshold 0.001 
    INFO  23:05:54,809 TrancheManager -   Found tranche for 99.900: 0.001 threshold starting with variant 1291; running score is 0.001  
    INFO  23:05:54,809 TrancheManager -   Tranche is Tranche ts=99.90 minVQSLod=-3.2810 known=(17459 @ 1.1442) novel=(15863 @ 0.0745) truthSites(4944 accessible, 4939 called), name=anonymous] 
    INFO  23:05:54,809 TrancheManager -   Tranche threshold 99.00 => selection metric threshold 0.010 
    INFO  23:05:54,813 TrancheManager -   Found tranche for 99.000: 0.010 threshold starting with variant 3624; running score is 0.010  
    INFO  23:05:54,813 TrancheManager -   Tranche is Tranche ts=99.00 minVQSLod=-0.0263 known=(16401 @ 1.1519) novel=(14588 @ 0.0742) truthSites(4944 accessible, 4894 called), name=anonymous] 
    INFO  23:05:54,813 TrancheManager -   Tranche threshold 90.00 => selection metric threshold 0.100 
    INFO  23:05:54,818 TrancheManager -   Found tranche for 90.000: 0.100 threshold starting with variant 9259; running score is 0.100  
    INFO  23:05:54,818 TrancheManager -   Tranche is Tranche ts=90.00 minVQSLod=5.0231 known=(13728 @ 1.1996) novel=(11626 @ 0.0757) truthSites(4944 accessible, 4449 called), name=anonymous] 
    INFO  23:05:54,820 VariantRecalibrator - Writing out recalibration table... 
    INFO  23:05:55,344 VariantRecalibrator - Writing out visualization Rscript file... 
    INFO  23:05:55,617 VariantRecalibrator - Building QD x SOR plot... 
    INFO  23:05:55,622 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:05:57,180 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:05:57,819 VariantRecalibrator - Building QD x FS plot... 
    INFO  23:05:57,823 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:05:59,319 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:05:59,941 VariantRecalibrator - Building QD x MQ plot... 
    INFO  23:05:59,943 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:01,475 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:02,069 VariantRecalibrator - Building QD x ReadPosRankSum plot... 
    INFO  23:06:02,070 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:03,517 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:04,112 VariantRecalibrator - Building QD x InbreedingCoeff plot... 
    INFO  23:06:04,112 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  23:06:05,590 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  23:06:06,178 VariantRecalibrator - Building QD x MQRankSum plot... 
    INFO  23:06:06,179 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:07,648 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  23:06:08,236 VariantRecalibrator - Building SOR x FS plot... 
    INFO  23:06:08,237 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  23:06:09,815 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  23:06:10,401 VariantRecalibrator - Building SOR x MQ plot... 
    INFO  23:06:10,402 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  23:06:11,967 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  23:06:12,549 VariantRecalibrator - Building SOR x ReadPosRankSum plot... 
    
    

    But the run below crashes on the full dataset:

    $ wc -l CR-MGUS.raw.vcf
    552463 CR-MGUS.raw.vcf
    $
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R > CR-MGUS.raw.log 2>&1
    
    
    ...
    INFO  23:44:47,594 VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN
    INFO  23:44:47,677 VariantDataManager - QD:      mean = 16.98    standard deviation = 7.37
    INFO  23:44:47,737 VariantDataManager - FS:      mean = 1.62     standard deviation = 3.57
    INFO  23:44:47,789 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  23:44:47,841 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  23:44:47,895 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  23:44:47,950 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.21
    INFO  23:44:48,339 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum, MQ]
    INFO  23:44:48,374 VariantDataManager - Training with 174414 variants after standard deviation thresholding.
    INFO  23:44:48,379 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    ##### ERROR --
    ##### ERROR stack trace
    java.lang.NullPointerException
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeMeansUsingKMeans(GaussianMixtureModel.java:173)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeRandomModel(GaussianMixtureModel.java:135)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.variationalBayesExpectationMaximization(VariantRecalibratorEngine.java:149)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:92)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:526)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:191)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-06-29-g793830a):
    

    See https://gatkforums.broadinstitute.org/gatk/discussion/comment/38398
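
The `MQ: mean = Infinity, standard deviation = NaN` line in the log above may point to one or more records whose MQ value is non-finite or unparseable, or that sit at or above the `-MQCap 80` value. This is only a guess at the cause. Below is a minimal sketch for listing such records, assuming an uncompressed VCF with tab-separated columns and MQ stored in the INFO column. The function name `find_suspect_info` and its `cap` parameter are hypothetical helpers, not part of GATK, and the sketch makes no assumption about how GATK transforms MQ internally.

```python
import math


def find_suspect_info(lines, key="MQ", cap=None):
    """Yield (chrom, pos, raw_value) for records whose INFO `key` is
    unparseable, non-finite, or (if `cap` is given) >= cap.

    `lines` is any iterable of VCF text lines, e.g. an open file handle.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is the 8th column: semicolon-separated key=value pairs and flags.
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        raw = info.get(key)
        if raw is None:
            continue  # annotation absent at this site
        try:
            value = float(raw)
        except ValueError:
            # Not a single number (e.g. "." or a comma-separated list).
            yield fields[0], fields[1], raw
            continue
        if not math.isfinite(value) or (cap is not None and value >= cap):
            yield fields[0], fields[1], raw
```

Passing an open handle for `CR-MGUS.raw.vcf` with `key="MQ"` and `cap=80` would list candidate sites to inspect. If the first suspect record falls beyond line 300000, that would be consistent with the smaller subsets reporting a finite MQ mean.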

    $ head -n 100000 CR-MGUS.raw.vcf > CR-MGUS_test3.raw.vcf
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test3.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R > CR-MGUS_test3.raw.log 2>&1
    
    ...
    INFO  00:35:52,523 VariantDataManager - MQ:      mean = 1.04     standard deviation = 0.27
    INFO  00:35:52,550 VariantDataManager - QD:      mean = 16.88    standard deviation = 7.33
    INFO  00:35:52,561 VariantDataManager - FS:      mean = 1.59     standard deviation = 3.11
    INFO  00:35:52,571 VariantDataManager - SOR:     mean = 1.15     standard deviation = 1.23
    INFO  00:35:52,581 VariantDataManager - MQRankSum:       mean = -0.03    standard deviation = 0.44
    INFO  00:35:52,592 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.68
    INFO  00:35:52,603 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  00:35:52,794 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, MQ, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  00:35:52,808 VariantDataManager - Training with 32476 variants after standard deviation thresholding.
    INFO  00:35:52,813 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  00:35:53,930 VariantRecalibratorEngine - Finished iteration 0.
    INFO  00:35:54,996 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 1.42499
    

    Let's make the test-case dataset larger:

    $ head -n 300000 CR-MGUS.raw.vcf > CR-MGUS_test4.raw.vcf
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test4.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R > CR-MGUS_test4.raw.log 2>&1
    
    ...
    INFO  01:17:13,700 VariantDataManager - MQ:      mean = 1.04     standard deviation = 0.28
    INFO  01:17:13,758 VariantDataManager - QD:      mean = 16.90    standard deviation = 7.28
    INFO  01:17:13,788 VariantDataManager - FS:      mean = 1.59     standard deviation = 3.55
    INFO  01:17:13,823 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  01:17:13,854 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.46
    INFO  01:17:13,886 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  01:17:13,919 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  01:17:14,143 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, SOR, FS, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  01:17:14,170 VariantDataManager - Training with 102099 variants after standard deviation thresholding.
    INFO  01:17:14,175 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  01:17:22,418 VariantRecalibratorEngine - Finished iteration 0.
    INFO  01:17:26,231 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.82402
    INFO  01:17:30,167 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.27186
    INFO  01:17:33,988 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.28126
    INFO  01:17:37,789 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.07223
    INFO  01:17:41,536 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.03354
    INFO  01:17:43,067 ProgressMeter - chr19_KI270938v1_alt:195720   1.67376186E8    31.5 m      11.0 s       99.7%    31.6 m       5.0 s
    INFO  01:17:45,287 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.03178
    INFO  01:17:49,059 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.01823
    INFO  01:17:52,877 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.00823
    INFO  01:17:56,649 VariantRecalibratorEngine - Finished iteration 45.   Current change in mixture coefficients = 0.00943
    INFO  01:18:00,403 VariantRecalibratorEngine - Finished iteration 50.   Current change in mixture coefficients = 0.00542
    INFO  01:18:03,976 VariantRecalibratorEngine - Finished iteration 55.   Current change in mixture coefficients = 0.00353
    INFO  01:18:07,648 VariantRecalibratorEngine - Finished iteration 60.   Current change in mixture coefficients = 0.00653
    INFO  01:18:11,149 VariantRecalibratorEngine - Finished iteration 65.   Current change in mixture coefficients = 0.00439
    INFO  01:18:13,069 ProgressMeter - chr19_KI270938v1_alt:195720   1.67376186E8    32.0 m      11.0 s       99.7%    32.1 m       5.0 s
    INFO  01:18:14,643 VariantRecalibratorEngine - Finished iteration 70.   Current change in mixture coefficients = 0.00413
    INFO  01:18:18,142 VariantRecalibratorEngine - Finished iteration 75.   Current change in mixture coefficients = 0.00348
    INFO  01:18:21,623 VariantRecalibratorEngine - Finished iteration 80.   Current change in mixture coefficients = 0.00270
    INFO  01:18:25,258 VariantRecalibratorEngine - Finished iteration 85.   Current change in mixture coefficients = 0.00199
    INFO  01:18:25,258 VariantRecalibratorEngine - Convergence after 85 iterations!
    INFO  01:18:25,728 VariantRecalibratorEngine - Evaluating full set of 277375 variants...
    WARN  01:18:27,254 VariantRecalibratorEngine - Evaluate datum returned a NaN.
    INFO  01:18:27,284 VariantDataManager - Selected worst 5071 scoring variants --> variants with LOD <= -5.0000.
    INFO  01:18:27,284 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  01:18:27,344 VariantRecalibratorEngine - Finished iteration 0.
    INFO  01:18:27,370 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.04736
    INFO  01:18:27,397 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.09036
    INFO  01:18:27,423 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.03218
    INFO  01:18:27,450 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.02570
    INFO  01:18:27,476 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.01935
    INFO  01:18:27,502 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.01173
    INFO  01:18:27,529 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.00614
    INFO  01:18:27,555 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.00317
    INFO  01:18:27,577 VariantRecalibratorEngine - Convergence after 44 iterations!
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2017-06-29-g793830a):
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --maxGaussians 4, for example).
    ##### ERROR ------------------------------------------------------------------------------------------
    

    I am not a native speaker: what does "datum" mean in the message "WARN 01:18:27,254 VariantRecalibratorEngine - Evaluate datum returned a NaN."? The runs below retry the same 300000-line subset with lower --maxGaussians values:

    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test4.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R --maxGaussians 4 > CR-MGUS_test4_maxGaussians4.raw.log 2>&1
    
    
    ...
    INFO  08:40:49,382 VariantDataManager - MQ:      mean = 1.04     standard deviation = 0.28
    INFO  08:40:49,440 VariantDataManager - QD:      mean = 16.90    standard deviation = 7.28
    INFO  08:40:49,470 VariantDataManager - FS:      mean = 1.59     standard deviation = 3.55
    INFO  08:40:49,503 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  08:40:49,534 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.46
    INFO  08:40:49,565 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  08:40:49,596 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  08:40:49,817 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, SOR, FS, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  08:40:49,844 VariantDataManager - Training with 102099 variants after standard deviation thresholding.
    INFO  08:40:49,848 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  08:40:54,876 VariantRecalibratorEngine - Finished iteration 0.
    INFO  08:40:56,782 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.20361
    INFO  08:40:58,532 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.07214
    INFO  08:41:00,296 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.07937
    INFO  08:41:02,108 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.10672
    INFO  08:41:03,975 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.09681
    INFO  08:41:04,464 ProgressMeter - chr19_KI270938v1_alt:195720   1.67376186E8    31.5 m      11.0 s       99.7%    31.6 m       5.0 s
    INFO  08:41:06,010 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.03377
    INFO  08:41:07,873 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.00447
    INFO  08:41:09,547 VariantRecalibratorEngine - Convergence after 39 iterations!
    INFO  08:41:09,820 VariantRecalibratorEngine - Evaluating full set of 277375 variants...
    WARN  08:41:10,569 VariantRecalibratorEngine - Evaluate datum returned a NaN.
    INFO  08:41:10,601 VariantDataManager - Selected worst 7879 scoring variants --> variants with LOD <= -5.0000.
    INFO  08:41:10,601 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  08:41:10,696 VariantRecalibratorEngine - Finished iteration 0.
    INFO  08:41:10,737 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.01520
    INFO  08:41:10,779 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.03688
    INFO  08:41:10,821 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02642
    INFO  08:41:10,867 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00627
    INFO  08:41:10,914 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.00226
    INFO  08:41:10,923 VariantRecalibratorEngine - Convergence after 26 iterations!
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2017-06-29-g793830a):
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --maxGaussians 4, for example).
    ##### ERROR ------------------------------------------------------------------------------------------
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test4.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R --maxGaussians 3 > CR-MGUS_test4_maxGaussians3.raw.log 2>&1
    
    ...
    INFO  11:42:46,844 VariantDataManager - MQ:      mean = 1.04     standard deviation = 0.28
    INFO  11:42:46,904 VariantDataManager - QD:      mean = 16.90    standard deviation = 7.28
    INFO  11:42:46,934 VariantDataManager - FS:      mean = 1.59     standard deviation = 3.55
    INFO  11:42:46,968 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  11:42:46,999 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.46
    INFO  11:42:47,033 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  11:42:47,066 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  11:42:47,299 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, SOR, FS, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  11:42:47,326 VariantDataManager - Training with 102099 variants after standard deviation thresholding.
    INFO  11:42:47,331 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  11:42:52,881 VariantRecalibratorEngine - Finished iteration 0.
    INFO  11:42:54,458 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.06627
    INFO  11:42:55,972 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.02819
    INFO  11:42:57,396 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.04622
    INFO  11:42:58,776 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00944
    INFO  11:42:59,888 VariantRecalibratorEngine - Convergence after 24 iterations!
    INFO  11:43:00,105 VariantRecalibratorEngine - Evaluating full set of 277375 variants...
    WARN  11:43:00,750 VariantRecalibratorEngine - Evaluate datum returned a NaN.
    INFO  11:43:00,780 VariantDataManager - Selected worst 12316 scoring variants --> variants with LOD <= -5.0000.
    INFO  11:43:00,781 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  11:43:00,940 VariantRecalibratorEngine - Finished iteration 0.
    INFO  11:43:01,007 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.02320
    INFO  11:43:01,086 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.01126
    INFO  11:43:01,169 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.00166
    INFO  11:43:01,170 VariantRecalibratorEngine - Convergence after 15 iterations!
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2017-06-29-g793830a):
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --maxGaussians 4, for example).
    ##### ERROR ------------------------------------------------------------------------------------------
    
    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS_test4.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R --maxGaussians 2 > CR-MGUS_test4_maxGaussians2.raw.log 2>&1
    
    ...
    INFO  12:26:04,746 VariantDataManager - MQ:      mean = 1.04     standard deviation = 0.28
    INFO  12:26:04,804 VariantDataManager - QD:      mean = 16.90    standard deviation = 7.28
    INFO  12:26:04,833 VariantDataManager - FS:      mean = 1.59     standard deviation = 3.55
    INFO  12:26:04,868 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  12:26:04,899 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.46
    INFO  12:26:04,930 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  12:26:04,960 VariantDataManager - InbreedingCoeff:         mean = -0.00    standard deviation = 0.19
    INFO  12:26:05,180 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, SOR, FS, ReadPosRankSum, InbreedingCoeff, MQRankSum]
    INFO  12:26:05,208 VariantDataManager - Training with 102099 variants after standard deviation thresholding.
    INFO  12:26:05,212 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  12:26:09,976 VariantRecalibratorEngine - Finished iteration 0.
    INFO  12:26:11,151 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.02112
    INFO  12:26:12,720 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.01284
    INFO  12:26:13,303 VariantRecalibratorEngine - Convergence after 13 iterations!
    INFO  12:26:13,466 VariantRecalibratorEngine - Evaluating full set of 277375 variants...
    WARN  12:26:13,885 VariantRecalibratorEngine - Evaluate datum returned a NaN.
    INFO  12:26:13,916 VariantDataManager - Selected worst 12278 scoring variants --> variants with LOD <= -5.0000.
    INFO  12:26:13,916 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    INFO  12:26:14,055 VariantRecalibratorEngine - Finished iteration 0.
    INFO  12:26:14,122 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.25876
    INFO  12:26:14,187 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.02964
    INFO  12:26:14,252 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.00814
    INFO  12:26:14,317 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00158
    INFO  12:26:14,317 VariantRecalibratorEngine - Convergence after 20 iterations!
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2017-06-29-g793830a):
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --maxGaussians 4, for example).
    ##### ERROR ------------------------------------------------------------------------------------------
    

    Please improve the INFO output of VariantRecalibrator so that it states the maxGaussians value currently in use.

    If I try the full dataset with a (hopefully low enough) maxGaussians value, I get:

    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R --maxGaussians 2 > CR-MGUS_maxGaussians2.raw.log 2>&1
    
    ...
    INFO  13:00:33,702 VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN
    INFO  13:00:33,780 VariantDataManager - QD:      mean = 16.98    standard deviation = 7.37
    INFO  13:00:33,834 VariantDataManager - FS:      mean = 1.62     standard deviation = 3.57
    INFO  13:00:33,880 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  13:00:33,925 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  13:00:33,973 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  13:00:34,021 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.21
    INFO  13:00:34,382 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum, MQ]
    INFO  13:00:34,416 VariantDataManager - Training with 174414 variants after standard deviation thresholding.
    INFO  13:00:34,420 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    ##### ERROR --
    ##### ERROR stack trace
    java.lang.NullPointerException
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeMeansUsingKMeans(GaussianMixtureModel.java:173)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeRandomModel(GaussianMixtureModel.java:135)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.variationalBayesExpectationMaximization(VariantRecalibratorEngine.java:149)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:92)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:526)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:191)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-06-29-g793830a):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
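
The `MQ: mean = Infinity    standard deviation = NaN` line in the log above is what ordinary floating-point arithmetic produces when at least one of the averaged values is +Infinity. A minimal Python sketch of that arithmetic follows; the helper `mean_and_std` is hypothetical, is not GATK's code, and assumes a plain sum-based mean and population standard deviation.

```python
import math


def mean_and_std(values):
    """Return (mean, population standard deviation) using plain float arithmetic."""
    n = len(values)
    mean = sum(values) / n
    # With mean == inf, (inf - inf) is NaN, and NaN propagates through the sum.
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


# Illustrative values only: two finite MQ-like numbers and one infinite one.
mean, std = mean_and_std([60.0, 59.5, float("inf")])
print(f"mean = {mean}, standard deviation = {std}")
```

If the same thing is happening here, the summary line points at individual non-finite MQ values in the full dataset rather than at the maxGaussians setting, although that is an inference from the log and not something confirmed in this thread.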
    

    Maybe I should actually have increased maxGaussians? Am I right to assume that the default is maxGaussians=6?

    $ java -Djavaio.tmpdir=. -Xmx64g -jar /scratch/work/project/bio/GATK/GenomeAnalysisTK-nightly-2017-06-29-g793830a/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 1 -R /scratch/work/project/bio/db/ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input CR-MGUS.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 /scratch/work/project/bio/db/ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /scratch/work/project/bio/db/ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an MQ -an QD -an FS -an SOR -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff --target_titv 3.2 --MQCapForLogitJitterTransform -MQCap 80 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile CR-MGUS.recalibrate_SNP.recal -tranchesFile CR-MGUS.recalibrate_SNP.tranches -rscriptFile CR-MGUS.recalibrate_SNP_plots.R --maxGaussians 8 > CR-MGUS_maxGaussians8.raw.log 2>&1
    
    ...
    INFO  13:38:34,642 VariantDataManager - MQ:      mean = Infinity         standard deviation = NaN
    INFO  13:38:34,724 VariantDataManager - QD:      mean = 16.98    standard deviation = 7.37
    INFO  13:38:34,784 VariantDataManager - FS:      mean = 1.62     standard deviation = 3.57
    INFO  13:38:34,837 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18
    INFO  13:38:34,888 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48
    INFO  13:38:34,944 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.67
    INFO  13:38:34,992 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.21
    INFO  13:38:35,351 VariantDataManager - Annotations are now ordered by their information content: [QD, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum, MQ]
    INFO  13:38:35,384 VariantDataManager - Training with 174414 variants after standard deviation thresholding.
    INFO  13:38:35,388 GaussianMixtureModel - Initializing model with 100 k-means iterations...
    ##### ERROR --
    ##### ERROR stack trace
    java.lang.NullPointerException
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeMeansUsingKMeans(GaussianMixtureModel.java:173)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.GaussianMixtureModel.initializeRandomModel(GaussianMixtureModel.java:135)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.variationalBayesExpectationMaximization(VariantRecalibratorEngine.java:149)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:92)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:526)
            at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:191)
            at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
            at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
            at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
            at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
            at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
            at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-06-29-g793830a):
    ...
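
Both full-dataset runs above report `MQ: mean = Infinity` regardless of the `--maxGaussians` value, while the smaller test VCF does not. One way to narrow this down is to look for records in the input VCF whose MQ is not a number, is non-finite, or sits at or above the cap given via `-MQCap`. The sketch below is a hypothetical helper and not part of GATK. Treating values at or above the cap as suspicious is an assumption about how the logit transform behaves, not something stated in the thread.

```python
import gzip
import math


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_info(info_field):
    """Turn 'DP=10;MQ=60.0;DB' into {'DP': '10', 'MQ': '60.0', 'DB': ''}."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def suspicious_mq_records(lines, mq_cap=80.0):
    """Yield (chrom, pos, raw_mq) for records whose MQ is unparsable, non-finite, or >= mq_cap.

    mq_cap defaults to 80 only because the commands above pass -MQCap 80.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        if "MQ" not in info:
            continue  # records without an MQ annotation are skipped in this sketch
        raw = info["MQ"]
        try:
            mq = float(raw)
        except ValueError:
            yield fields[0], int(fields[1]), raw
            continue
        if not math.isfinite(mq) or mq >= mq_cap:
            yield fields[0], int(fields[1]), raw
```

Called as `suspicious_mq_records(open_vcf("CR-MGUS.raw.vcf"))`, it yields one tuple per flagged record, which can then be inspected by hand.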
    

    Issue · GitHub, by shlee: Issue Number 2304, State: closed, Closed By: chandrans
  • shleeshlee CambridgeMember, Administrator, Broadie, Moderator admin

    Hi @mmokrejs,

    Sheila is away at a workshop and can follow up with you when she is back in Boston.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @mmokrejs
    Hi Martin,

    Datum is simply the singular of data :smile:

    I think there has been some follow-up in a personal message. Let's keep that thread going. Once everything has been resolved and you can upload your test data, it will be easier for us to understand what is going on.

    -Sheila

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @mmokrejs
    Hi Martin,

    Sorry for the delay. I just ran VariantRecalibrator on your test VCF, and I got no error with the latest GATK 3.8 nightly build. It looks like you were using 3.7, so somewhere along the line this got fixed :smile:

    -Sheila

  • oskarvoskarv BergenMember

    This issue seems to have crept back into GATK 4.0.0.0 through 4.0.2.1. It works fine if I use VariantRecalibrator from GATK 3.8, though. I've attached the stderr logs from runs with versions 4.0.2.0 and 3.8; as you can see, they calculate the same values for QD, MQ, etc., but the "Current change in mixture coefficients" values aren't the same. I don't know how significant that is, but it's at least a difference between the two tools. I can share the VCF input file with you if you'd like.

  • oskarvoskarv BergenMember
    edited March 14

    I tried the -L flag and it seems to work as it should; it has run through 7 of the 17 BED files just fine so far. So something works, at least. Running it like this isn't an option in the long run, but I hope it helps with troubleshooting.
    Edit: Chromosome X made VariantRecalibrator throw the "No data found" error! Does this make any more sense now?
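
Since the failure depends on which intervals are passed with -L, a per-contig tally may help show whether chromosome X simply has no input records at training-resource positions. The sketch below assumes that "No data found" means the training set came out empty for that interval, which this thread does not confirm. The function names are hypothetical, and matching is by contig name and position only, ignoring alleles.

```python
from collections import Counter


def sites(lines):
    """Yield (chrom, pos) for every data line of a VCF given as an iterable of text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def training_overlap_by_contig(input_lines, resource_lines):
    """Return {contig: (n_input_records, n_input_records_at_a_resource_position)}."""
    # Holding every resource site in memory is fine for a sketch;
    # a genome-wide resource would need an index or a streaming merge instead.
    resource_sites = set(sites(resource_lines))
    totals, overlaps = Counter(), Counter()
    for chrom, pos in sites(input_lines):
        totals[chrom] += 1
        if (chrom, pos) in resource_sites:
            overlaps[chrom] += 1
    return {chrom: (totals[chrom], overlaps[chrom]) for chrom in totals}
```

A contig with a non-zero record count but zero overlaps (for example because the input says `chrX` and a resource says `X`) would be a candidate explanation worth checking.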

  • oskarvoskarv BergenMember

    I reran the pipeline and the only thing I changed was that I piped bwa to SortSam instead of piping it to samtools view, and for some reason it worked this time... :neutral:

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @oskarv
    Hi,

    I think this thread will have some helpful tips. Also, some of the other threads linked to in that thread may help.

    -Sheila
