
VariantRecalibrator raises an error when training the negative model

henry_by (CHGC at Shanghai), Member

Hi, I'm having an issue with VariantRecalibrator.

The log output and error message:
INFO 10:36:26,909 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:36:26,911 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
INFO 10:36:26,911 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:36:26,911 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:36:26,914 HelpFormatter - Program Args: -T VariantRecalibrator --maxGaussians 4 -R /data/database/UCSC_hg19/UCSC.hg19.fa -input SCMC_BLD1_150320-A097.haplo.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/database/UCSC_hg19/hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 /data/database/UCSC_hg19/1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /data/database/UCSC_hg19/dbsnp_137.hg19.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile A097.snp.recal -tranchesFile A097.snp.tranches -rscriptFile A097.snp.plots.R --TStranche 99.0 --minNumBadVariants 500
INFO 10:36:26,919 HelpFormatter - Executing as [email protected] on Linux 3.10.0-229.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_75-mockbuild_2015_01_21_05_53-b00.
INFO 10:36:26,919 HelpFormatter - Date/Time: 2015/07/16 10:36:26
INFO 10:36:26,920 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:36:26,920 HelpFormatter - --------------------------------------------------------------------------------
INFO 10:36:27,018 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:36:27,144 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:36:27,552 GenomeAnalysisEngine - Preparing for traversal
INFO 10:36:27,569 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:36:27,570 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:36:27,570 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:36:27,570 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:36:27,575 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
INFO 10:36:27,575 TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
INFO 10:36:27,576 TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q6.0
INFO 10:36:57,573 ProgressMeter - chr2:7347679 4520793.0 30.0 s 6.0 s 8.3% 6.0 m 5.5 m
INFO 10:37:27,574 ProgressMeter - chr3:15705751 9328479.0 60.0 s 6.0 s 16.4% 6.1 m 5.1 m
INFO 10:37:57,575 ProgressMeter - chr4:63376726 1.4122115E7 90.0 s 6.0 s 24.4% 6.2 m 4.7 m
INFO 10:38:27,577 ProgressMeter - chr5:125502904 1.9015319E7 120.0 s 6.0 s 32.5% 6.1 m 4.1 m
INFO 10:38:57,578 ProgressMeter - chr7:10003791 2.3820372E7 2.5 m 6.0 s 40.2% 6.2 m 3.7 m
INFO 10:39:27,579 ProgressMeter - chr8:90318590 2.8612028E7 3.0 m 6.0 s 47.9% 6.3 m 3.3 m
INFO 10:39:57,580 ProgressMeter - chr10:66999784 3.3378546E7 3.5 m 6.0 s 56.4% 6.2 m 2.7 m
INFO 10:40:27,581 ProgressMeter - chr12:38343658 3.8167805E7 4.0 m 6.0 s 64.3% 6.2 m 2.2 m
INFO 10:40:57,582 ProgressMeter - chr14:67173067 4.29113E7 4.5 m 6.0 s 73.2% 6.1 m 98.0 s
INFO 10:41:27,583 ProgressMeter - chr17:30993982 4.7686233E7 5.0 m 6.0 s 81.8% 6.1 m 66.0 s
INFO 10:41:57,585 ProgressMeter - chr20:51998856 5.2508187E7 5.5 m 6.0 s 89.5% 6.1 m 38.0 s
INFO 10:42:22,551 VariantDataManager - QD: mean = 18.81 standard deviation = 9.07
INFO 10:42:22,552 VariantDataManager - MQ: mean = 58.60 standard deviation = 1.22
INFO 10:42:22,552 VariantDataManager - MQRankSum: mean = 0.63 standard deviation = 1.33
INFO 10:42:22,553 VariantDataManager - ReadPosRankSum: mean = 0.86 standard deviation = 1.42
INFO 10:42:22,554 VariantDataManager - FS: mean = 0.98 standard deviation = 1.82
INFO 10:42:22,555 VariantDataManager - SOR: mean = 0.79 standard deviation = 0.30
INFO 10:42:22,563 VariantDataManager - Annotations are now ordered by their information content: [MQ, QD, MQRankSum, ReadPosRankSum, FS, SOR]
INFO 10:42:22,563 VariantDataManager - Training with 560 variants after standard deviation thresholding.
INFO 10:42:22,568 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 10:42:22,686 VariantRecalibratorEngine - Finished iteration 0.
INFO 10:42:22,727 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.11780
INFO 10:42:22,740 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.05838
INFO 10:42:22,751 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.00880
INFO 10:42:22,763 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.00228
INFO 10:42:22,765 VariantRecalibratorEngine - Convergence after 21 iterations!
INFO 10:42:22,774 VariantRecalibratorEngine - Evaluating full set of 757 variants...
INFO 10:42:22,803 VariantDataManager - Training with worst 59 scoring variants --> variants with LOD <= -5.0000.
INFO 10:42:22,803 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 10:42:22,804 VariantRecalibratorEngine - Finished iteration 0.
INFO 10:42:22,805 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.05314
INFO 10:42:22,806 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.00021
INFO 10:42:22,807 VariantRecalibratorEngine - Convergence after 10 iterations!
INFO 10:42:22,807 VariantRecalibratorEngine - Evaluating full set of 757 variants...
INFO 10:42:27,586 ProgressMeter - chrY:59358202 5.6537569E7 6.0 m 6.0 s 100.0% 6.0 m 0.0 s

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).
ERROR -----------------------------------------------------------------------------------------

And my command:
java -Xmx4g -Djava.io.tmpdir=../tmp -jar /data/tools/GATK-3.4-0/GenomeAnalysisTK.jar -T VariantRecalibrator --maxGaussians 4 -R /data/database/UCSC_hg19/UCSC.hg19.fa -input SCMC_BLD1_150320-A097.haplo.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/database/UCSC_hg19/hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 /data/database/UCSC_hg19/1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /data/database/UCSC_hg19/dbsnp_137.hg19.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile A097.snp.recal -tranchesFile A097.snp.tranches -rscriptFile A097.snp.plots.R --TStranche 99.0
However, when I added --minNumBadVariants with different values (1000, 5000, 8000), the error still occurred.
Can you tell me how to deal with this error?
I used target gene sequencing (1M). Is this because there are too few variants in my haplo.vcf? The same command works with 12M target sequencing.
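
To make the "too few variants" question easier to check, the variant counts can be pulled straight out of the log above. This is only a minimal Python sketch: the function name `variant_counts` and the dictionary keys are made up for illustration, and the regular expressions assume the exact log wording shown in this post (GATK 3.4), which may differ in other versions.

```python
import re

# Patterns copied from the wording of the GATK 3.4 log lines in this post.
# The key names are hypothetical labels, not GATK terminology.
PATTERNS = {
    "positive_training": re.compile(
        r"Training with (\d+) variants after standard deviation thresholding"
    ),
    "negative_training": re.compile(r"Training with worst (\d+) scoring variants"),
    "evaluated": re.compile(r"Evaluating full set of (\d+) variants"),
}


def variant_counts(log_text):
    """Return the variant counts reported by VariantRecalibrator in a log."""
    counts = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Later occurrences of the same message overwrite earlier ones.
                counts[name] = int(match.group(1))
    return counts


if __name__ == "__main__":
    # Three lines taken verbatim from the log in this post.
    sample = (
        "INFO 10:42:22,563 VariantDataManager - Training with 560 variants "
        "after standard deviation thresholding.\n"
        "INFO 10:42:22,774 VariantRecalibratorEngine - Evaluating full set "
        "of 757 variants...\n"
        "INFO 10:42:22,803 VariantDataManager - Training with worst 59 "
        "scoring variants --> variants with LOD <= -5.0000.\n"
    )
    for name, count in variant_counts(sample).items():
        print(name, count)
```

For this run the log reports 560 variants for the positive model, 59 for the negative model, and 757 evaluated in total, so the counts from the 1M and 12M runs can be compared side by side.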
