GATK 3.0+ LOD issue with VariantRecalibrator Running Large data sets .

srynearson1srynearson1 Member
edited May 2014 in Ask the GATK team

Hello all,

Quick issue. I'm currently running VariantRecalibrator using the following command.

Program Args: -T VariantRecalibrator -R /data/srynearson/gatk_reference/human_g1k_v37_decoy.fasta --minNumBadVariants 5000 --num_threads 10 -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/GATK_Bundle/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /data/GATK_Bundle/1000G_omni2.5.b37.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /data/GATK_Bundle/1000G_phase1.snps.high_confidence.b37.vcf -an QD -an:HaplotypeScore -an:MQRankSum -an:ReadPosRankSum -an:FS -input /data2/srynearson/UGP06/fastq/UGP06_genotyped.vcf -recalFile /data2/srynearson/UGP06/fastq/UGP06_snp_recal -tranchesFile /data2/srynearson/UGP06/fastq/UGP06_snp_tranches -rscriptFile /data2/srynearson/UGP06/fastq/UGP06_snp_plots.R -mode SNP

And getting the following error:

ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --minNumBadVariants 5000, for example).

This seems strange given the "-input /data2/srynearson/UGP06/fastq/UGP06_genotyped.vcf" file above has 100 CEU individuals + one target exome.

I will note that the genotyped.vcf file does only show ~280,000 variants.

I have tried increasing the minNumBadVariants and reducing maxGaussian, but the error either stays the same or changes to "Unable to retrieve result".

Any ideas.


Best Answer


Sign In or Register to comment.