To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

GATK 3.0+ LOD issue with VariantRecalibrator Running Large data sets .

srynearson1srynearson1 Member
edited May 2014 in Ask the GATK team

Hello all,

Quick issue. I'm currently running VariantRecalibrator using the following command.

Program Args: -T VariantRecalibrator -R /data/srynearson/gatk_reference/human_g1k_v37_decoy.fasta --minNumBadVariants 5000 --num_threads 10 -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/GATK_Bundle/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /data/GATK_Bundle/1000G_omni2.5.b37.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /data/GATK_Bundle/1000G_phase1.snps.high_confidence.b37.vcf -an QD -an:HaplotypeScore -an:MQRankSum -an:ReadPosRankSum -an:FS -input /data2/srynearson/UGP06/fastq/UGP06_genotyped.vcf -recalFile /data2/srynearson/UGP06/fastq/UGP06_snp_recal -tranchesFile /data2/srynearson/UGP06/fastq/UGP06_snp_tranches -rscriptFile /data2/srynearson/UGP06/fastq/UGP06_snp_plots.R -mode SNP

And getting the following error:

ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).

This seems strange given the "-input /data2/srynearson/UGP06/fastq/UGP06_genotyped.vcf" file above has 100 CEU individuals + one target exome.

I will note that the genotyped.vcf file does only show ~280,000 variants.

I have tried increasing the minNumBadVariants and reducing maxGaussian, but the error either stays the same or changes to "Unable to retrieve result".

Any ideas.


Best Answer


Sign In or Register to comment.