VariantRecalibrator: ERROR MESSAGE: NaN LOD value assigned.

shubhamsaini (UCSD, Member)
edited April 2017 in Ask the GATK team

I have some gVCF files and need to call variants from them. HaplotypeCaller ran successfully and GenotypeGVCFs completes, but VariantRecalibrator gives me an error. Both commands and their full output are below.

java -jar /storage/s1saini/GenomeAnalysisTK.jar -T GenotypeGVCFs -V SSC00003.g.vcf.gz -V SSC00004.g.vcf.gz -V SSC00005.g.vcf.gz -V SSC00006.g.vcf.gz -V SSC01958.g.vcf.gz -V SSC01964.g.vcf.gz -V SSC01965.g.vcf.gz -V SSC01966.g.vcf.gz -V SSC02852.g.vcf.gz -V SSC02854.g.vcf.gz -V SSC02857.g.vcf.gz -V SSC02858.g.vcf.gz -V SSC03070.g.vcf.gz -V SSC03078.g.vcf.gz -V SSC03092.g.vcf.gz -V SSC03093.g.vcf.gz -o jointcalls.vcf -R ref/human_g1k_b37_20.fasta -L 20 -nt 4
INFO  10:24:23,052 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:24:23,055 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  10:24:23,055 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  10:24:23,055 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  10:24:23,055 HelpFormatter - [Tue Apr 04 10:24:23 PDT 2017] Executing on Linux 3.10.0-514.2.2.el7.x86_64 amd64 
INFO  10:24:23,055 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_111-b15 
INFO  10:24:23,059 HelpFormatter - Program Args: -T GenotypeGVCFs -V SSC00003.g.vcf.gz -V SSC00004.g.vcf.gz -V SSC00005.g.vcf.gz -V SSC00006.g.vcf.gz -V SSC01958.g.vcf.gz -V SSC01964.g.vcf.gz -V SSC01965.g.vcf.gz -V SSC01966.g.vcf.gz -V SSC02852.g.vcf.gz -V SSC02854.g.vcf.gz -V SSC02857.g.vcf.gz -V SSC02858.g.vcf.gz -V SSC03070.g.vcf.gz -V SSC03078.g.vcf.gz -V SSC03092.g.vcf.gz -V SSC03093.g.vcf.gz -o jointcalls.vcf -R ref/human_g1k_b37_20.fasta -L 20 -nt 4 
INFO  10:24:23,063 HelpFormatter - Executing as [email protected] on Linux 3.10.0-514.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15. 
INFO  10:24:23,064 HelpFormatter - Date/Time: 2017/04/04 10:24:23 
INFO  10:24:23,064 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:24:23,064 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:24:23,114 GenomeAnalysisEngine - Strictness is SILENT 
INFO  10:24:23,303 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  10:24:25,741 IntervalUtils - Processing 63025520 bp from intervals 
WARN  10:24:25,741 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant3 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant4 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant5 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant6 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant7 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant8 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant9 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant10 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant11 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant12 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant13 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,745 IndexDictionaryUtils - Track variant14 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,745 IndexDictionaryUtils - Track variant15 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,745 IndexDictionaryUtils - Track variant16 doesn't have a sequence dictionary built in, skipping dictionary validation 
INFO  10:24:25,753 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 1 CPU thread(s) for each of 4 data thread(s), of 28 processors available on this machine 
INFO  10:24:25,809 GenomeAnalysisEngine - Preparing for traversal 
INFO  10:24:25,810 GenomeAnalysisEngine - Done preparing for traversal 
INFO  10:24:25,811 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  10:24:25,811 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  10:24:25,811 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  10:24:26,003 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  10:24:26,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
INFO  10:24:26,005 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
WARN  10:24:28,595 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs 
WARN  10:24:31,245 ExactAFCalculator - This tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 20: 83250 has 10 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument. Unless the DEBUG logging level is used, this warning message is output just once per run and further warnings are suppressed. 

Message from [email protected] at Apr  4 10:24:45 ...
 kernel:do_IRQ: 8.228 No irq handler for vector (irq -1)
INFO  10:24:56,003 ProgressMeter -      20:3126601         0.0    30.0 s      49.9 w        5.0%    10.1 m       9.6 m 
INFO  10:25:26,005 ProgressMeter -      20:3535701         0.0    60.0 s      99.5 w        5.6%    17.8 m      16.8 m 
INFO  10:25:56,006 ProgressMeter -      20:6041401   3000000.0    90.0 s      30.0 s        9.6%    15.6 m      14.1 m 
INFO  10:26:26,008 ProgressMeter -      20:7496301   4000000.0   120.0 s      30.0 s       11.9%    16.8 m      14.8 m 
INFO  10:26:56,010 ProgressMeter -     20:11018501   8000000.0     2.5 m      18.0 s       17.5%    14.3 m      11.8 m 
INFO  10:27:26,011 ProgressMeter -     20:11547201   8000000.0     3.0 m      22.0 s       18.3%    16.4 m      13.4 m 
INFO  10:27:56,012 ProgressMeter -     20:15076001       1.2E7     3.5 m      17.0 s       23.9%    14.6 m      11.1 m 

Message from [email protected] at Apr  4 10:28:14 ...
 kernel:do_IRQ: 3.86 No irq handler for vector (irq -1)
INFO  10:28:26,013 ProgressMeter -     20:15629601       1.2E7     4.0 m      20.0 s       24.8%    16.1 m      12.1 m 
INFO  10:28:56,014 ProgressMeter -     20:19188001       1.6E7     4.5 m      16.0 s       30.4%    14.8 m      10.3 m 
INFO  10:29:26,015 ProgressMeter -     20:19745601       1.6E7     5.0 m      18.0 s       31.3%    16.0 m      11.0 m 
INFO  10:29:56,017 ProgressMeter -     20:23238001       2.0E7     5.5 m      16.0 s       36.9%    14.9 m       9.4 m 
INFO  10:30:26,018 ProgressMeter -     20:23764301       2.0E7     6.0 m      18.0 s       37.7%    15.9 m       9.9 m 
INFO  10:30:56,019 ProgressMeter -     20:29293301       2.6E7     6.5 m      15.0 s       46.5%    14.0 m       7.5 m 
INFO  10:31:26,020 ProgressMeter -     20:31020501       2.8E7     7.0 m      15.0 s       49.2%    14.2 m       7.2 m 
INFO  10:31:56,021 ProgressMeter -     20:33371001       3.0E7     7.5 m      15.0 s       52.9%    14.2 m       6.7 m 
INFO  10:32:26,022 ProgressMeter -     20:34325401       3.2E7     8.0 m      15.0 s       54.5%    14.7 m       6.7 m 
INFO  10:32:56,024 ProgressMeter -     20:37383101       3.4E7     8.5 m      15.0 s       59.3%    14.3 m       5.8 m 
INFO  10:33:26,025 ProgressMeter -     20:39016401       3.6E7     9.0 m      15.0 s       61.9%    14.5 m       5.5 m 
INFO  10:33:56,026 ProgressMeter -     20:41453001       3.8E7     9.5 m      15.0 s       65.8%    14.4 m       4.9 m 
INFO  10:34:26,027 ProgressMeter -     20:45001701       4.2E7    10.0 m      14.0 s       71.4%    14.0 m       4.0 m 
INFO  10:34:56,029 ProgressMeter -     20:46006401       4.3E7    10.5 m      14.0 s       73.0%    14.4 m       3.9 m 
INFO  10:35:26,030 ProgressMeter -     20:49063101       4.6E7    11.0 m      14.0 s       77.8%    14.1 m       3.1 m 
INFO  10:35:56,031 ProgressMeter -     20:50020001       4.7E7    11.5 m      14.0 s       79.4%    14.5 m       3.0 m 
INFO  10:36:26,032 ProgressMeter -     20:53090001       5.0E7    12.0 m      14.0 s       84.2%    14.2 m       2.2 m 
INFO  10:36:56,033 ProgressMeter -     20:54019201       5.1E7    12.5 m      14.0 s       85.7%    14.6 m       2.1 m 
INFO  10:37:26,034 ProgressMeter -     20:57112001       5.4E7    13.0 m      14.0 s       90.6%    14.3 m      80.0 s 
INFO  10:37:56,036 ProgressMeter -     20:58083301       5.5E7    13.5 m      14.0 s       92.2%    14.6 m      68.0 s 
INFO  10:38:26,037 ProgressMeter -     20:61190101       5.8E7    14.0 m      14.0 s       97.1%    14.4 m      25.0 s 
INFO  10:38:56,038 ProgressMeter -     20:62134501       5.9E7    14.5 m      14.0 s       98.6%    14.7 m      12.0 s 
INFO  10:39:26,039 ProgressMeter -     20:63025501   6.202552E7    15.0 m      14.0 s      100.0%    15.0 m       0.0 s 
INFO  10:39:49,084 ProgressMeter -            done   6.302552E7    15.4 m      14.0 s      100.0%    15.4 m       0.0 s 
INFO  10:39:49,084 ProgressMeter - Total runtime 923.27 secs, 15.39 min, 0.26 hours 
------------------------------------------------------------------------------------------
Done. There were 20 WARN messages, the first 10 are repeated below.
WARN  10:24:25,741 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant3 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant4 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,742 IndexDictionaryUtils - Track variant5 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant6 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant7 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,743 IndexDictionaryUtils - Track variant8 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant9 doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:24:25,744 IndexDictionaryUtils - Track variant10 doesn't have a sequence dictionary built in, skipping dictionary validation 
java -jar /storage/s1saini/GenomeAnalysisTK.jar -T VariantRecalibrator -R ref/human_g1k_b37_20.fasta -input jointcalls.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz -an DP -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
INFO  10:40:30,682 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:40:30,684 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  10:40:30,685 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  10:40:30,685 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  10:40:30,685 HelpFormatter - [Tue Apr 04 10:40:30 PDT 2017] Executing on Linux 3.10.0-514.2.2.el7.x86_64 amd64 
INFO  10:40:30,685 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_111-b15 
INFO  10:40:30,689 HelpFormatter - Program Args: -T VariantRecalibrator -R ref/human_g1k_b37_20.fasta -input jointcalls.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz -an DP -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R 
INFO  10:40:30,693 HelpFormatter - Executing as [email protected] on Linux 3.10.0-514.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15. 
INFO  10:40:30,694 HelpFormatter - Date/Time: 2017/04/04 10:40:30 
INFO  10:40:30,694 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:40:30,694 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  10:40:30,718 GenomeAnalysisEngine - Strictness is SILENT 
INFO  10:40:30,808 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
WARN  10:40:31,044 IndexDictionaryUtils - Track hapmap doesn't have a sequence dictionary built in, skipping dictionary validation 
INFO  10:40:31,165 GenomeAnalysisEngine - Preparing for traversal 
INFO  10:40:31,166 GenomeAnalysisEngine - Done preparing for traversal 
INFO  10:40:31,167 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  10:40:31,167 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  10:40:31,167 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  10:40:31,172 TrainingSet - Found hapmap track:    Known = false   Training = true     Truth = true    Prior = Q15.0 
INFO  10:40:35,327 VariantDataManager - DP:      mean = 535.96   standard deviation = 65.37 
INFO  10:40:35,483 VariantDataManager - Annotations are now ordered by their information content: [DP] 
INFO  10:40:35,498 VariantDataManager - Training with 61633 variants after standard deviation thresholding. 
INFO  10:40:35,502 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
INFO  10:40:36,992 VariantRecalibratorEngine - Finished iteration 0. 
INFO  10:40:37,902 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.08556 
INFO  10:40:40,563 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.04317 
INFO  10:40:42,340 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02471 
INFO  10:40:43,232 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.01472 
INFO  10:40:43,805 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.01129 
INFO  10:40:44,384 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.01005 
INFO  10:40:44,965 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.00837 
INFO  10:40:45,538 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.00690 
INFO  10:40:46,119 VariantRecalibratorEngine - Finished iteration 45.   Current change in mixture coefficients = 0.00585 
INFO  10:40:46,703 VariantRecalibratorEngine - Finished iteration 50.   Current change in mixture coefficients = 0.00541 
INFO  10:40:47,286 VariantRecalibratorEngine - Finished iteration 55.   Current change in mixture coefficients = 0.00555 
INFO  10:40:47,866 VariantRecalibratorEngine - Finished iteration 60.   Current change in mixture coefficients = 0.00570 
INFO  10:40:48,460 VariantRecalibratorEngine - Finished iteration 65.   Current change in mixture coefficients = 0.00588 
INFO  10:40:49,048 VariantRecalibratorEngine - Finished iteration 70.   Current change in mixture coefficients = 0.00611 
INFO  10:40:49,640 VariantRecalibratorEngine - Finished iteration 75.   Current change in mixture coefficients = 0.00634 
INFO  10:40:50,456 VariantRecalibratorEngine - Finished iteration 80.   Current change in mixture coefficients = 0.00651 
INFO  10:40:51,053 VariantRecalibratorEngine - Finished iteration 85.   Current change in mixture coefficients = 0.00651 
INFO  10:40:51,651 VariantRecalibratorEngine - Finished iteration 90.   Current change in mixture coefficients = 0.00626 
INFO  10:40:52,249 VariantRecalibratorEngine - Finished iteration 95.   Current change in mixture coefficients = 0.00575 
INFO  10:40:52,841 VariantRecalibratorEngine - Finished iteration 100.  Current change in mixture coefficients = 0.00508 
INFO  10:40:53,434 VariantRecalibratorEngine - Finished iteration 105.  Current change in mixture coefficients = 0.00436 
INFO  10:40:54,050 VariantRecalibratorEngine - Finished iteration 110.  Current change in mixture coefficients = 0.00368 
INFO  10:40:54,668 VariantRecalibratorEngine - Finished iteration 115.  Current change in mixture coefficients = 0.00308 
INFO  10:40:55,282 VariantRecalibratorEngine - Finished iteration 120.  Current change in mixture coefficients = 0.00257 
INFO  10:40:55,905 VariantRecalibratorEngine - Finished iteration 125.  Current change in mixture coefficients = 0.00213 
INFO  10:40:56,161 VariantRecalibratorEngine - Convergence after 127 iterations! 
INFO  10:40:56,243 VariantRecalibratorEngine - Evaluating full set of 188987 variants... 
INFO  10:40:56,546 VariantDataManager - Training with worst 856 scoring variants --> variants with LOD <= -5.0000. 
INFO  10:40:56,546 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
INFO  10:40:56,551 VariantRecalibratorEngine - Finished iteration 0. 
INFO  10:40:56,554 VariantRecalibratorEngine - Convergence after 3 iterations! 
INFO  10:40:56,563 VariantRecalibratorEngine - Evaluating full set of 188987 variants... 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).
##### ERROR ------------------------------------------------------------------------------------------

I don't believe this is because of a small dataset. I am working with 16 samples, on chromosome 20.

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @shubhamsaini
    Hi,

    I think this is indeed due to a small dataset. We recommend using at least 30 whole exome samples or one whole genome in VQSR. Do you have data from the other chromosomes? If not, you should try hard filtering.

    Thanks,
    Sheila
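
For readers unfamiliar with the hard-filtering alternative mentioned above, here is a minimal Python sketch of the idea: each variant's INFO annotations are compared against fixed thresholds, and any site that fails one is flagged instead of being scored by a trained model. This is only an illustration of the logic, not GATK's implementation; in GATK 3 the tool normally used for this is VariantFiltration. The threshold values follow the generic SNP starting points commonly cited in the GATK documentation and should be treated as assumptions to tune for your own data. The names `SNP_HARD_FILTERS`, `parse_info`, `failed_filters` and `filter_column` are hypothetical.

```python
# Illustrative hard-filter logic for SNPs (a sketch, not GATK code).
# Thresholds are assumed starting points; adjust them for your data.
SNP_HARD_FILTERS = {
    "QD": lambda v: v < 2.0,               # low quality by depth
    "FS": lambda v: v > 60.0,              # strong strand bias (Fisher test)
    "MQ": lambda v: v < 40.0,              # low RMS mapping quality
    "MQRankSum": lambda v: v < -12.5,      # alt reads map worse than ref reads
    "ReadPosRankSum": lambda v: v < -8.0,  # alt alleles cluster near read ends
}


def parse_info(info):
    """Turn a VCF INFO string such as 'DP=10;QD=3.1' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def failed_filters(info):
    """Return the names of the annotations that fail their threshold."""
    fields = parse_info(info)
    failed = []
    for key, is_bad in SNP_HARD_FILTERS.items():
        if key not in fields:
            continue  # a missing annotation is skipped, not treated as a failure
        try:
            value = float(fields[key])
        except ValueError:
            continue  # non-numeric value: skip rather than guess
        if is_bad(value):
            failed.append(key)
    return failed


def filter_column(info):
    """Build a FILTER-style value: 'PASS' or the failing annotation names."""
    failed = failed_filters(info)
    return ";".join(failed) if failed else "PASS"


if __name__ == "__main__":
    print(filter_column("DP=535;QD=1.5;FS=3.2;MQ=60.0"))
    print(filter_column("DP=500;QD=20.1;FS=1.0;MQ=59.8"))
```

Unlike VQSR, this approach needs no training set, which is why it remains usable when the callset is too small to fit the Gaussian mixture models.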
