Values for QD annotation not detected for ANY training variant in the input callset

sbourgeois, London, UK (Member)

Hi,

I looked at previous answers to this problem, but they don't seem to explain what is happening.

I ran HaplotypeCaller on 45 exomes (producing g.vcf files), then did joint genotyping with -T GenotypeGVCFs, as per the Best Practices.
The problem arises when I try to run VQSR on the resulting VCF file:

Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
INFO 12:56:43,821 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:56:43,824 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 12:56:43,824 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:56:43,824 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 12:56:43,824 HelpFormatter - [Sun Jul 31 12:56:43 UTC 2016] Executing on Linux 3.13.0-65-generic amd64
INFO 12:56:43,824 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13 JdkDeflater
INFO 12:56:43,827 HelpFormatter - Program Args: -T VariantRecalibrator -nt 2 -R /mnt/volume/GATK_resources/hg19/ucsc.hg19.fasta -input /mnt/volume/hhx037/GMI/joint_genotype/greekMI45.vcf -mode SNP -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /mnt/volume/GATK_resources/hg19/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /mnt/volume/GATK_resources/hg19/1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /mnt/volume/GATK_resources/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /mnt/volume/GATK_resources/hg19/dbsnp_138.hg19.vcf -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile ./filtered/recalibrate_SNP.recal -tranchesFile ./filtered/recalibrate_SNP.tranches -rscriptFile ./filtered/recalibrate_SNP_plots.R
INFO 12:56:43,831 HelpFormatter - Executing as stephaneb@stavroula on Linux 3.13.0-65-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13.
INFO 12:56:43,831 HelpFormatter - Date/Time: 2016/07/31 12:56:43
INFO 12:56:43,832 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:56:43,832 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:56:43,938 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:56:44,466 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 12:56:44,690 RMDTrackBuilder - Index file /mnt/volume/GATK_resources/hg19/hapmap_3.3.hg19.sites.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 12:57:00,046 RMDTrackBuilder - Writing Tribble index to disk for file /mnt/volume/GATK_resources/hg19/hapmap_3.3.hg19.sites.vcf.idx
WARN 12:57:03,737 RMDTrackBuilder - Index file /mnt/volume/GATK_resources/hg19/1000G_omni2.5.hg19.sites.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 12:57:13,829 RMDTrackBuilder - Writing Tribble index to disk for file /mnt/volume/GATK_resources/hg19/1000G_omni2.5.hg19.sites.vcf.idx
WARN 12:57:17,152 RMDTrackBuilder - Index file /mnt/volume/GATK_resources/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 13:02:48,774 RMDTrackBuilder - Writing Tribble index to disk for file /mnt/volume/GATK_resources/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.idx
INFO 13:03:03,467 MicroScheduler - Running the GATK in parallel mode with 2 total threads, 1 CPU thread(s) for each of 2 data thread(s), of 23 processors available on this machine
INFO 13:03:03,538 GenomeAnalysisEngine - Preparing for traversal
INFO 13:03:03,544 GenomeAnalysisEngine - Done preparing for traversal
INFO 13:03:03,544 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:03:03,544 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 13:03:03,545 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 13:03:03,552 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
INFO 13:03:03,552 TrainingSet - Found omni track: Known = false Training = true Truth = true Prior = Q12.0
INFO 13:03:03,552 TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
INFO 13:03:03,553 TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
INFO 13:03:33,548 ProgressMeter - chr2:7277813 5404824.0 30.0 s 5.0 s 8.2% 6.1 m 5.6 m
INFO 13:04:03,550 ProgressMeter - chr3:8026048 1.0996173E7 60.0 s 5.0 s 16.0% 6.3 m 5.3 m
INFO 13:04:33,558 ProgressMeter - chr4:45452689 1.652032E7 90.0 s 5.0 s 23.5% 6.4 m 4.9 m
INFO 13:05:03,559 ProgressMeter - chr5:97188979 2.2031845E7 120.0 s 5.0 s 31.2% 6.4 m 4.4 m
INFO 13:05:33,561 ProgressMeter - chr6:152065375 2.7591264E7 2.5 m 5.0 s 38.7% 6.5 m 4.0 m
INFO 13:06:03,583 ProgressMeter - chr8:42272169 3.311082E7 3.0 m 5.0 s 45.7% 6.6 m 3.6 m
INFO 13:06:33,635 ProgressMeter - chr10:10383963 3.8567595E7 3.5 m 5.0 s 53.9% 6.5 m 3.0 m
INFO 13:07:03,637 ProgressMeter - chr11:112141492 4.4136703E7 4.0 m 5.0 s 61.5% 6.5 m 2.5 m
INFO 13:07:33,661 ProgressMeter - chr13:98164188 4.9662537E7 4.5 m 5.0 s 69.6% 6.5 m 118.0 s
INFO 13:08:03,662 ProgressMeter - chr16:55169357 5.5297128E7 5.0 m 5.0 s 78.6% 6.4 m 81.0 s
INFO 13:08:33,664 ProgressMeter - chr19:40724682 6.1055211E7 5.5 m 5.0 s 86.1% 6.4 m 53.0 s
INFO 13:09:03,668 ProgressMeter - chrX:117786195 6.6882406E7 6.0 m 5.0 s 95.6% 6.3 m 16.0 s
INFO 13:09:07,762 VariantDataManager - QD: mean = NaN standard deviation = NaN

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
ERROR ------------------------------------------------------------------------------------------

I looked around and ran ValidateVariants on the file, which didn't report any errors:

java -Xincgc -Xmx4G -jar /usr/share/GATK/GenomeAnalysisTK-3.6.0.jar -T ValidateVariants -R /mnt/volume/GATK_resources/hg19/ucsc.hg19.fasta -V /mnt/volume/hhx037/GMI/joint_genotype/greekMI45.vcf --dbsnp /mnt/volume/GATK_resources/hg19/dbsnp_138.hg19.vcf
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
INFO 16:12:38,530 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:12:38,532 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 16:12:38,533 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 16:12:38,533 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 16:12:38,533 HelpFormatter - [Sun Jul 31 16:12:38 UTC 2016] Executing on Linux 3.13.0-65-generic amd64
INFO 16:12:38,533 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13 JdkDeflater
INFO 16:12:38,537 HelpFormatter - Program Args: -T ValidateVariants -R /mnt/volume/GATK_resources/hg19/ucsc.hg19.fasta -V /mnt/volume/hhx037/GMI/joint_genotype/greekMI45.vcf --dbsnp /mnt/volume/GATK_resources/hg19/dbsnp_138.hg19.vcf
INFO 16:12:38,540 HelpFormatter - Executing as stephaneb@stavroula on Linux 3.13.0-65-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13.
INFO 16:12:38,540 HelpFormatter - Date/Time: 2016/07/31 16:12:38
INFO 16:12:38,541 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:12:38,541 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:12:38,549 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:12:38,717 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 16:12:38,955 GenomeAnalysisEngine - Preparing for traversal
INFO 16:12:38,966 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:12:38,967 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:12:38,967 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 16:12:38,968 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 16:13:08,971 ProgressMeter - chr1:243966575 5074649.0 30.0 s 5.0 s 7.8% 6.4 m 5.9 m
INFO 16:13:38,998 ProgressMeter - chr2:216426889 1.0107194E7 60.0 s 5.0 s 14.8% 6.7 m 5.7 m
INFO 16:14:09,000 ProgressMeter - chr3:187259872 1.5089059E7 90.0 s 5.0 s 21.7% 6.9 m 5.4 m
INFO 16:14:39,001 ProgressMeter - chr5:10763056 2.017909E7 120.0 s 5.0 s 28.4% 7.0 m 5.0 m
INFO 16:15:09,068 ProgressMeter - chr6:33408891 2.4939917E7 2.5 m 6.0 s 34.9% 7.2 m 4.7 m
INFO 16:15:39,069 ProgressMeter - chr7:73034323 2.9927414E7 3.0 m 6.0 s 41.7% 7.2 m 4.2 m
INFO 16:16:09,070 ProgressMeter - chr8:108555278 3.4535874E7 3.5 m 6.0 s 47.9% 7.3 m 3.8 m
INFO 16:16:39,071 ProgressMeter - chr10:59307836 3.9699003E7 4.0 m 6.0 s 55.5% 7.2 m 3.2 m
INFO 16:17:09,078 ProgressMeter - chr11:131888725 4.4606376E7 4.5 m 6.0 s 62.1% 7.2 m 2.7 m
INFO 16:17:49,080 ProgressMeter - chr14:42353253 5.0663264E7 5.2 m 6.0 s 71.5% 7.2 m 2.1 m
INFO 16:18:19,080 ProgressMeter - chr16:69630853 5.5640466E7 5.7 m 6.0 s 79.0% 7.2 m 90.0 s
INFO 16:18:49,082 ProgressMeter - chr19:31350577 6.0858352E7 6.2 m 6.0 s 85.8% 7.2 m 61.0 s
INFO 16:19:19,083 ProgressMeter - chrX:55385953 6.5896111E7 6.7 m 6.0 s 93.6% 7.1 m 27.0 s
Successfully validated the input file. Checked 67716180 records with no failures.
INFO 16:19:29,531 ProgressMeter - done 6.771618E7 6.8 m 6.0 s 98.7% 6.9 m 5.0 s

INFO 16:19:29,532 ProgressMeter - Total runtime 410.56 secs, 6.84 min, 0.11 hours

Done. There were no warn messages.

Shouldn't joint genotyping with GenotypeGVCFs produce a VCF fit for VQSR?
If not, and I now need to add the QD annotation, how can one run VariantAnnotator on a VCF file that contains multiple samples?
I'm really not sure what to do now, and would appreciate any help.
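
One way to narrow this down is to count how many records in the callset actually carry each annotation passed with -an. The following is only a minimal sketch, assuming a plain-text (uncompressed) VCF; the function names `count_info_annotations` and `summarize_vcf` are hypothetical helpers and not part of GATK.

```python
# Annotation keys taken from the -an arguments in the VariantRecalibrator command above.
ANNOTATIONS = ["QD", "FS", "SOR", "MQ", "MQRankSum", "ReadPosRankSum", "InbreedingCoeff"]


def count_info_annotations(lines, keys):
    """Return (total_records, {key: records_with_key}) for an iterable of VCF lines."""
    counts = {key: 0 for key in keys}
    total = 0
    for line in lines:
        # Skip blank lines and header lines (both '##' meta lines and the '#CHROM' line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a well-formed VCF record; INFO is the 8th column
        total += 1
        # INFO entries are ';'-separated, either KEY=VALUE or a bare flag.
        present = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        for key in keys:
            if key in present:
                counts[key] += 1
    return total, counts


def summarize_vcf(path, keys=ANNOTATIONS):
    """Print one line per annotation showing how many records carry it."""
    with open(path) as handle:
        total, counts = count_info_annotations(handle, keys)
    for key in keys:
        print(f"{key}: {counts[key]} of {total} records")
```

Calling `summarize_vcf` with the path to the joint-genotyped VCF would print one count per annotation. A count of zero for QD would mean the annotation is absent from the whole file; a non-zero count would point instead at the records that overlap the training resources.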

Best regards,

Stephane
