


Indel recalibration error message

Denver (Member)

I am trying to recalibrate the indels in my VCF file using the command line below:

```
java -Xmx2G -jar ../GenomeAnalysisTK.jar -T VariantRecalibrator \
    -R ../GATK_ref/hg19.fasta \
    -input ./Variants/gcat_set_053_2.raw.snps.indels.vcf \
    -nt 4 \
    -resource:mills,known=false,training=true,truth=true,prior=12.0 ../GATK_ref/Mills_and_1000G_gold_standard.indels.hg19.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ../GATK_ref/dbsnp_137.hg19.vcf \
    -an DP -an FS -an ReadPosRankSum -an MQRankSum \
    --maxGaussians 4 \
    -mode INDEL \
    -recalFile ./Variants/VQSR/gcat_set_053_2.indels.vcf.recal \
    -tranchesFile ./Variants/VQSR/gcat_set_053_2.indels.tranches \
    -rscriptFile ./Variants/VQSR/gcat_set_053_2.indels.recal.plots.R > ./Variants/VQSR/IndelRecal2-noAnnot.log
```

I got the error message below, even after applying the recommendations it gives (--maxGaussians 4, --percentBad 0.05). What does this error mean? Do my files have too few variants? This is exome-seq data.

```
##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)
```
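Loosely speaking, the NaN LOD appears when the Gaussian mixture model degenerates: with too few training variants, an estimated variance can collapse to zero and the log-likelihood becomes undefined. A toy sketch of that failure mode in plain Python (illustrative only; this is not GATK's actual model, which fits multivariate mixtures over the annotations):

```python
import math

# Toy illustration of why fitting a Gaussian to too few points
# produces an undefined (NaN) log-likelihood: with one training
# point the estimated variance is zero and the density degenerates.

def gaussian_loglik(x, data):
    """Log-likelihood of x under a Gaussian fit to `data` by maximum likelihood."""
    mean = sum(data) / len(data)
    var = sum((d - mean) ** 2 for d in data) / len(data)
    if var == 0.0:
        return float("nan")  # degenerate fit: the kind of NaN the error reports
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

print(gaussian_loglik(1.0, [5.0]))       # nan: one point, zero variance
print(gaussian_loglik(1.0, [4.0, 6.0]))  # finite value: enough spread to fit
```

This is why the error suggests either enlarging the training set or reducing the number of Gaussians: both changes give each mixture component more points to fit.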


Your dataset may simply be too small to use VQSR. How many samples are you analyzing?

Member

I have more or less the same problem: 88 exomes, using VariantRecalibrator v3.1-1 in INDEL mode.

```
INFO ... VariantDataManager - Training with 5808 variants after standard deviation thresholding
WARN ... VariantDataManager - WARNING: Training with very few variant sites!
INFO ... VariantRecalibratorEngine - Evaluating full set of 18731 variants ...
INFO ... VariantDataManager - Training with worst 312 scoring variants --> variants with LOD <= -5.000
ERROR MESSAGE: NaN LOD value assigned ... consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example)
```

I inserted --minNumBadVariants 5000 into my command line, then tried 6000 and then 7000; the training numbers (the 5808 and 312 seen above) changed only slightly, and, not surprisingly, I kept getting that error message. If I have to resort to hard-filtering, where can I find the parameters to use? Thanks.
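For what it's worth, the hard filters the GATK documentation recommended for indels in the 3.x era were QD < 2.0, FS > 200.0, and ReadPosRankSum < -20.0, applied with VariantFiltration (check the current docs before relying on these exact thresholds). A minimal Python sketch of that filter logic over VCF INFO fields (illustrative only, not GATK code; the field names assume standard GATK annotations):

```python
# Sketch: GATK-style hard filtering for indels on VCF INFO annotations.
# Thresholds follow the GATK 3.x documentation's indel recommendations.

def parse_info(info):
    """Parse a VCF INFO column like 'QD=1.5;FS=250.0' into a dict of floats."""
    fields = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            try:
                fields[key] = float(value)
            except ValueError:
                pass  # non-numeric values are not needed for these filters
    return fields

def indel_hard_filter(info):
    """Return 'PASS' or the name of the first failed filter.
    A missing annotation does not fail a filter (matching VariantFiltration's
    default treatment of missing values)."""
    f = parse_info(info)
    if f.get("QD", float("inf")) < 2.0:
        return "QD_lt_2"
    if f.get("FS", float("-inf")) > 200.0:
        return "FS_gt_200"
    if f.get("ReadPosRankSum", float("inf")) < -20.0:
        return "ReadPosRankSum_lt_-20"
    return "PASS"

print(indel_hard_filter("QD=12.3;FS=10.0;ReadPosRankSum=-1.2"))  # PASS
print(indel_hard_filter("QD=1.1;FS=5.0"))                        # QD_lt_2
```

In practice you would run VariantFiltration itself with a --filterExpression built from these thresholds rather than reimplementing them.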

Member

Thanks, I should've found it on my own.

Anyway, because -minNumBadVariants wasn't doing anything, I dropped it from the command line, and tried -mNG 4 (btw I was already using --maxGaussians 4). I got no error messages! No error messages either with -mNG 3 (the default is 2). Are the results safe to use? If so, is mNG 3 "better" than 4 because it's closer to the default value of 2? Or maybe it doesn't matter when training with only 312 bad variants?