Indels Recalibration error message

DenverMember

I am trying to recalibrate my VCF file for indel calling using the command line below:

java -Xmx2G -jar ../GenomeAnalysisTK.jar -T VariantRecalibrator \
-R ../GATK_ref/hg19.fasta \
-input ./Variants/gcat_set_053_2.raw.snps.indels.vcf \
-nt 4 \
-resource:mills,known=false,training=true,truth=true,prior=12.0 ../GATK_ref/Mills_and_1000G_gold_standard.indels.hg19.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ../GATK_ref/dbsnp_137.hg19.vcf \
-an DP -an FS -an ReadPosRankSum -an MQRankSum \
--maxGaussians 4 \
-mode INDEL \
-recalFile ./Variants/VQSR/gcat_set_053_2.indels.vcf.recal \
-tranchesFile ./Variants/VQSR/gcat_set_053_2.indels.tranches \
-rscriptFile ./Variants/VQSR/gcat_set_053_2.indels.recal.plots.R > ./Variants/VQSR/IndelRecal2-noAnnot.log

I got this error message even after following the recommendations (e.g. --maxGaussians 4, --percentBad 0.05). What does this error message mean? Do my files have too few variants? This is exome-seq data.

##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)

Your dataset may simply be too small to use VQSR. How many samples are you analyzing?
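To answer the sample-count question and get a rough idea of how many indels are available for training, something like the following could be run on the input VCF. This is only a sketch: `count_indels` and `count_samples` are hypothetical helper names, not GATK tools, and "indel" is approximated here as any record where an ALT allele differs in length from REF (symbolic alleles such as `<DEL>` are skipped).

```shell
# count_indels FILE: print the number of VCF records in which at least one
# ALT allele has a different length than REF (a rough proxy for indels).
# Assumption: plain-text, tab-delimited VCF; symbolic ALTs are ignored.
count_indels() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($5, alts, ",")             # ALT may list several alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a ~ /^</ || a == "*" || a == ".") continue
        if (length(a) != length($4)) { count++; break }
      }
    }
    END { print count + 0 }
  ' "$1"
}

# count_samples FILE: print the number of sample columns, i.e. the columns
# after FORMAT on the #CHROM header line.
count_samples() {
  awk -F'\t' '
    /^#CHROM/ { n = (NF > 9) ? NF - 9 : 0; print n; found = 1; exit }
    END { if (!found) print 0 }
  ' "$1"
}
```

For example, `count_indels ./Variants/gcat_set_053_2.raw.snps.indels.vcf` and `count_samples` on the same file would give the two numbers to report back.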

I have more or less the same problem: 88 exomes, using VariantRecalibrator v3.1-1 in INDEL mode.

INFO ... VariantDataManager - Training with 5808 variants after standard deviation thresholding

WARN ... VariantDataManager - WARNING: Training with very few variant sites!

INFO ... VariantRecalibratorEngine - Evaluating full set of 18731 variants ...

INFO ... VariantDataManager - Training with worst 312 scoring variants --> variants with LOD <= -5.000

ERROR MESSAGE: NaN LOD value assigned ... consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example)

I inserted --minNumBadVariants 5000 into my command line, then tried 6000, then tried 7000; the training numbers (5808 and 312 seen above) changed only slightly, and (not surprisingly) I keep getting that error message. If I have to resort to hard-filtering, where can I find the parameters to use? Thanks.
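To compare runs with different settings, the two training counts can be pulled out of each log rather than read by eye. A minimal sketch, assuming the log lines are worded exactly as in the excerpt above and that the log file is whatever stdout was redirected to; `training_counts` is a hypothetical helper name:

```shell
# training_counts LOG: print the positive- and negative-model training set
# sizes reported in a VariantRecalibrator log.
# Assumption: the log wording matches the lines quoted in this thread.
training_counts() {
  # Size of the positive training set
  pos=$(sed -n 's/.*Training with \([0-9][0-9]*\) variants after standard deviation thresholding.*/\1/p' "$1" | tail -n 1)
  # Size of the negative ("worst scoring") training set
  neg=$(sed -n 's/.*Training with worst \([0-9][0-9]*\) scoring variants.*/\1/p' "$1" | tail -n 1)
  # Print NA for any count that was not found in the log
  echo "positive=${pos:-NA} negative=${neg:-NA}"
}
```

Running this on the log from each attempt makes it easy to see whether a changed option actually moved either number.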

Thanks, I should've found it on my own.

Anyway, because --minNumBadVariants wasn't doing anything, I dropped it from the command line and tried -mNG 4 (I was already using --maxGaussians 4). I got no error messages! No error messages with -mNG 3 either (the default is 2). Are the results safe to use? If so, is -mNG 3 "better" than 4 because it's closer to the default value of 2? Or maybe it doesn't matter when training with only 312 bad variants?