NaN LOD in VQSR

dcittaro Posts: 31 Member

Hi all, I'm running VariantRecalibrator on a SNP set (47 exomes) and I get this error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-3-gde33222): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)
##### ERROR ------------------------------------------------------------------------------------------

This is the command line:

    java -Djava.io.tmpdir=/lustre2/scratch/  -Xmx32g -jar /lustre1/tools/bin/GenomeAnalysisTK-2.2-3.jar \
    -T VariantRecalibrator \
    -R /lustre1/genomes/hg19/fa/hg19.fa \
    -input /lustre1/workspace/Ferrari/Carrera/Analysis/UG/bpd_ug.SNP.vcf \
    -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 /lustre1/genomes/hg19/annotation/hapmap_3.3.hg19.sites.vcf.gz \
    -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 /lustre1/genomes/hg19/annotation/1000G_omni2.5.hg19.sites.vcf.gz \
    -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=6.0 /lustre1/genomes/hg19/annotation/dbSNP-137.chr.vcf -an QD \
    -an HaplotypeScore \
    -an MQRankSum \
    -an ReadPosRankSum \
    -an FS \
    -an MQ \
    -an DP \
    -an QD \
    -an InbreedingCoeff \
    -mode SNP \
    -recalFile /lustre2/scratch/Carrera/Analysis2/snp.ug.recal.csv \
    -tranchesFile /lustre2/scratch/Carrera/Analysis2/snp.ug.tranches \
    -rscriptFile /lustre2/scratch/Carrera/Analysis2/snp.ug.plot.R \
    -U ALLOW_SEQ_DICT_INCOMPATIBILITY \
    --maxGaussians 6

I've already tried lowering --maxGaussians to 4, and I've also added the --percentBad option (raising it as high as 0.12, as for INDELs), but I still get the error. I also tried adding the -debug option to see what's happening, but apparently it has been removed in GATK 2.2. Any help is appreciated. Thanks!
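
One thing that can be checked mechanically in the command above is whether any annotation is passed to -an more than once (QD appears on two lines). Whether that repetition contributes to the NaN LOD is not established here. The snippet below is only a minimal sketch for spotting such repeats; the helper name `duplicated_annotations` is hypothetical and not part of GATK.

```python
import shlex
from collections import Counter
from typing import List


def duplicated_annotations(command: str) -> List[str]:
    """Return annotation names given to -an more than once, in first-seen order.

    Hypothetical helper for illustration; it only inspects the command text
    and does not call GATK.
    """
    # Join shell line continuations so the command parses as one line.
    tokens = shlex.split(command.replace("\\\n", " "))
    # Collect the token that follows each -an flag.
    names = [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-an"]
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


if __name__ == "__main__":
    # The -an flags as they appear in the command line posted above.
    annotation_flags = (
        "-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum "
        "-an FS -an MQ -an DP -an QD -an InbreedingCoeff"
    )
    print(duplicated_annotations(annotation_flags))
```

Pasting the full command into `annotation_flags` (backslash continuations included) works the same way, since only the tokens following -an are examined.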