NaN LOD in VQSR

Hi all, I'm running VariantRecalibrator on a SNP set (47 exomes) and I get this error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-3-gde33222): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)
##### ERROR ------------------------------------------------------------------------------------------
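
The message attributes the NaN LOD to clustering "this few variants". As a rough sanity check, a shell sketch like the one below counts the non-header records in the `-input` VCF and multiplies that count by a candidate `--percentBadVariants` value. The helper names are hypothetical (not part of GATK). Treating the negative training set as simply that fraction of all records is an assumption made for illustration, not a description of how VariantRecalibrator actually selects it.

```shell
# Hypothetical helpers, not part of GATK.

# Count data lines (non-header records) in a plain or gzip/bgzip-compressed VCF.
count_vcf_records() {
  vcf="$1"
  case "$vcf" in
    *.gz)
      # grep -c exits 1 when it counts 0 lines; "|| true" keeps that from
      # being treated as a failure while still printing "0".
      gzip -cd -- "$vcf" | grep -vc '^#' || true
      ;;
    *)
      grep -vc '^#' "$vcf" || true
      ;;
  esac
}

# Rough estimate (assumption): record count times the candidate fraction,
# truncated to an integer by awk's %d.
estimate_bad_variants() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%d\n", n * p }'
}
```

For example, `estimate_bad_variants "$(count_vcf_records bpd_ug.SNP.vcf)" 0.05` prints the estimate for the 0.05 value suggested in the error message.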

This is the command line:

    java -Djava.io.tmpdir=/lustre2/scratch/  -Xmx32g -jar /lustre1/tools/bin/GenomeAnalysisTK-2.2-3.jar \
    -T VariantRecalibrator \
    -R /lustre1/genomes/hg19/fa/hg19.fa \
    -input /lustre1/workspace/Ferrari/Carrera/Analysis/UG/bpd_ug.SNP.vcf \
    -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 /lustre1/genomes/hg19/annotation/hapmap_3.3.hg19.sites.vcf.gz \
    -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 /lustre1/genomes/hg19/annotation/1000G_omni2.5.hg19.sites.vcf.gz \
    -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=6.0 /lustre1/genomes/hg19/annotation/dbSNP-137.chr.vcf \
    -an QD \
    -an HaplotypeScore \
    -an MQRankSum \
    -an ReadPosRankSum \
    -an FS \
    -an MQ \
    -an DP \
    -an QD \
    -an InbreedingCoeff \
    -mode SNP \
    -recalFile /lustre2/scratch/Carrera/Analysis2/snp.ug.recal.csv \
    -tranchesFile /lustre2/scratch/Carrera/Analysis2/snp.ug.tranches \
    -rscriptFile /lustre2/scratch/Carrera/Analysis2/snp.ug.plot.R \
    -U ALLOW_SEQ_DICT_INCOMPATIBILITY \
    --maxGaussians 6
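
Note that the command passes `-an QD` twice: once at the end of the dbSNP resource line and again after `-an DP`. A hypothetical shell helper like the one below (not part of GATK) lists any annotation supplied more than once via `-an`, which is a quick way to rule out repeated annotations when checking a long command. It only recognizes the short `-an` form used here.

```shell
# Hypothetical helper, not part of GATK: print each annotation name that
# follows "-an" more than once in the given argument list.
# The body runs in a subshell "( ... )" so its variables do not leak.
duplicate_annotations() (
  prev=""
  for arg in "$@"; do
    if [ "$prev" = "-an" ]; then
      printf '%s\n' "$arg"
    fi
    prev="$arg"
  done | sort | uniq -d
)

# Example with the annotation arguments from the command above; prints "QD".
duplicate_annotations -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum \
  -an FS -an MQ -an DP -an QD -an InbreedingCoeff
```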

I've already tried decreasing --maxGaussians to 4, and I've also added the --percentBad option (set to 0.12, as for INDELs), but I still get the error.
I tried adding the -debug option to see what's happening, but apparently it has been removed in GATK 2.2.
Any help is appreciated.
Thanks
