VariantRecalibrator reports not finding annotations that are present

I'm trying to do VQSR on a set of ~33,000,000 variants (some "LowQual") generated by UnifiedGenotyper from 8 gibbon samples mapped to the nomLeu1 genome. As a training/truth set, I have another vcf of ~500,000 variants found in Sanger sequences by colleagues involved in the gibbon genome project. This set does not have all the same annotations, but it does have some, including MQ and DP. I have verified that at least the first 10 or so variants in the training set are present in the larger set, and most of them are not marked LowQual. Here is the command I am using:

java -Xmx12g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R nomLeu1.fa \
-input NLEproj.varsOnly.addRefQ.ancesState.vcf \
-resource:Sanger,known=false,training=true,truth=true,prior=15.0 Gibbon.SNP.Qual50.reordered.vcf \
-an MQ -an DP -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum \
-mode SNP \
-recalFile vqsr.recal \
-tranchesFile vqsr.tranches \
-rscriptFile vqsr.plots.R

Here is the output. It reports "Values for MQ annotation not detected for ANY training variant in the input callset." even though "MQ" annotations are present in both files. I tried removing "-an MQ" from the command, and it reported the same error for the DP annotations, which are also present. Can you help me find the problem?

INFO 11:04:25,276 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:04:25,279 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO 11:04:25,279 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:04:25,279 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:04:25,283 HelpFormatter - Program Args: -T VariantRecalibrator -R /data/August/GibbonData/GibbonGenome/nomLeu1.fa -input test.vcf -resource:Sanger,known=false,training=true,truth=true,prior=15.0 Gibbon.SNP.Qual50.reordered.vcf -an MQ -an DP -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -mode SNP -recalFile vqsr.recal -tranchesFile vqsr.tranches -rscriptFile vqsr.plots.R
INFO 11:04:25,283 HelpFormatter - Date/Time: 2013/02/25 11:04:25
INFO 11:04:25,284 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:04:25,284 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:04:25,298 ArgumentTypeDescriptor - Dynamically determined type of test.vcf to be VCF
INFO 11:04:25,302 ArgumentTypeDescriptor - Dynamically determined type of Gibbon.SNP.Qual50.reordered.vcf to be VCF
INFO 11:04:25,315 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:04:26,363 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000, Using the new downsampling implementation
INFO 11:04:26,380 RMDTrackBuilder - Loading Tribble index from disk for file test.vcf
INFO 11:04:26,672 RMDTrackBuilder - Loading Tribble index from disk for file Gibbon.SNP.Qual50.reordered.vcf
INFO 11:04:27,142 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:04:27,142 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 11:04:27,227 TrainingSet - Found Sanger track: Known = false Training = true Truth = true Prior = Q15.0
INFO 11:04:44,871 VariantDataManager - MQ: mean = NaN standard deviation = NaN
INFO 11:04:46,094 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Bad input: Values for MQ annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator

ERROR ------------------------------------------------------------------------------------------

Answers

  • I think we found the problem. The training file vcf has a sample column with no genotypes, only PL values. I removed the sample column and the "FORMAT" column so that it has only 8 columns, similar to your hapmap_3.3.b37.sites.vcf file, and it seems to be working. Thanks!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Great to hear it's solved! Thanks for reporting your solution. Perhaps that will help the other user with a similar problem.
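
The fix described in the first answer (dropping the "FORMAT" and sample columns from the training VCF so that only the 8 fixed columns remain) could be sketched with awk as below. This is a minimal sketch, not the poster's actual command: the function name make_sites_only and the output file name in the usage line are hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper: print a sites-only copy of a VCF (first 8 columns only).
# Assumes an uncompressed, tab-delimited VCF passed as the first argument.
make_sites_only() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^##/ { print; next }       # meta-information lines pass through unchanged
         {
             n = (NF < 8) ? NF : 8   # keep CHROM..INFO, drop FORMAT and sample columns
             line = $1
             for (i = 2; i <= n; i++) line = line OFS $i
             print line
         }' "$1"
}

# Example usage (output file name is a placeholder):
# make_sites_only Gibbon.SNP.Qual50.reordered.vcf > Gibbon.SNP.Qual50.sitesOnly.vcf
```

The sketch trims the #CHROM header line along with the records, so the header and data stay in agreement. Any ##FORMAT meta-information lines are left in place, since the answer above does not mention removing them.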
