Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

QD annotation not found: Variant Recalibrator error

Hi GATK community,
While working on the GATK pipeline for variant analysis, we have been getting the same error (below) repeatedly, in several scenarios we had tried prior to posting here.This is the error we get while running VQSR for SNPs.
ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

Background/ Details:
Reference Genome used: hg19   Individual sequencing data used: SRR1607270.sra
GATK version: 3.7
All the supporting files were taken from the GATK resource bundle for hg19 genome
So when we ran the command for variant recalibration around SNPs
java -jar GenomeAnalysisTK.jar -T VariantRecalibrator  -R  ucsc.hg19.fasta -input hg19variantcall.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_138.hg19.vcf  -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -mode SNP -hg19recalFile output.recal -hg19tranchesFile output.tranches -hg19rscriptFile output.plots.R
the following error occurred:
ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
We tried using the VariantAnnotator Tool by giving the below command
java -jar GenomeAnalysisTK.jar  -R ucsc.hg19.fasta -T VariantAnnotator  -I PrintReads.bam  -V variantcall.vcf  -o annoutput.vcf   -A QualbyDepth  -L variantcall.vcf  --dbsnp dbsnp_138.hg19.vcf
for QD annotation has suggested but still the same ERROR persists for VQSR. Could you please guide us on what we could be possibly doing wrong?
To call variant, the command used was:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -R ucsc.hg19.fasta -I PrintReads.bam  -o hg19variantcall.vcf

Thanks in advance

Febin

GC Tech

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    My first observation: you shouldn't be running VQSR on the GVCF output of HaplotypeCaller run with ERC GVCF. You need to run GenotypeGVCFs first.

    If it still doesn't work after that, check that you actually have QD annotated, and that you have enough variants for VQSR. Is this exome, whole genome or gene panel?

  • Febin_GCFebin_GC IndiaMember

    Hi

    Thank you Geraldine for the quick response. Some facts regarding the .vcf file

    1. Number of Samples : 1 ( Genomic-Seq data : individual sequencing data (HG01286) from human with 1000 genomes as an example, which was obtained by single-end read sequencing using Illumina’s NGS instrument - SRR1607270.sra )
    2. Number of SNPs : 461892
    3. Number of indels : 53347
    4. QD annotations are not seen in the .vcf file (even after annotating the .vcf file )

    Ran GenotypeGVCF with the following command

    java -jar GenomeAnalysisTk.jar -T GenotypeGVCFs -R ucsc.hg19.fasta --variant hg19variantcall.g.vcf -o output.vcf

    It went infinitely giving the following warning message:

    WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans...

    The command was forced stopped. What could possibly be wrong?

    Thanks in advance

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi,

    Can you tell us the version of GATK you are using and the exact command you ran to produce the GVCF?

    Thanks,
    Sheila

  • Febin_GCFebin_GC IndiaMember
    Hi Sheila

    The GATK version is 3.7

    The Genotype GVCF command given

    java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta --variant hg19variantcall.AS.g.vcf -o hg19variantcall.vcf

    'hg19variantcall' is the file name given.

    Thanks!
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi,

    I meant the command for HaplotypeCaller that you used to produce the GVCF file that is input to GenotypeGVCFs.

    Thanks,
    Sheila

  • Febin_GCFebin_GC IndiaMember
    Hi

    Sorry for the confusion Sheila.The command which we gave in the beginning (mentioned earlier)for calling variant is as follows:

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -R ucsc.hg19.fasta -I PrintReads.bam -o hg19variantcall.vcf


    After trials and errors based on your documentation and discussions the final command given for HaplotypeCaller was:

    java -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T HaplotypeCaller -I hg19PrintReads.bam --emitRefConfidence GVCF --dbsnp dbsnp_138.hg19.vcf -G Standard -G AS_Standard -o hg19variantcall.AS.g.vcf

    where 'hg19PrintReads.bam' is the output of base quality recalibration.

    And later Genotype GVCF was run,which went on with warnings.Hope u understand.

    Thanks in advance

    Febin_GC
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi Febin_GC,

    Hmm. Can you try running ValidateVariants with --validateGVCF on your input GVCF?

    Can you also post some example records from your GVCF?

    Thanks,
    Sheila

  • Febin_GCFebin_GC IndiaMember

    Hi Sheila

    Ran the command to validate .g.vcf file

    java -jar GenomeAnalysisTK.jar -T ValidateVariants -R ucsc.hg19.fasta -V hg19variantcall.AS.g.vcf --validateGVCF --dbsnp dbsnp_138.hg19.vcf

    The input file was successfully validated .Checked 93 records with no failures.One warning message was seen

    'ValidateVariants: GVCF format is currently incompatible with allele validation.Not validating alleles.'

    Example records of .g.vcf file.More details in the attached file

    GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)

    GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)

    GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)

    GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)

    GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)

    Inline image 1 "hg19output.AS.g.vcf" 47202284L, 4100634030C

    reference=file:///home/fawazfebin/ucsc.hg19.fasta

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20

    chrM 1 . G . . END=9 GT:DP:GQ:MIN_DP:PL 0/0:8:24:8:0,24,283

    chrM 10 . T . . END=21 GT:DP:GQ:MIN_DP:PL 0/0:9:27:9:0,27,310

    chrM 22 . T . . END=22 GT:DP:GQ:MIN_DP:PL 0/0:9:24:9:0,24,360

    chrM 23 . T . . END=24 GT:DP:GQ:MIN_DP:PL 0/0:9:21:9:0,21,315

    chrM 143 rs375589100 G A, 511.77 . AS_RAW_BaseQRankSum=35,1,37,1|22,1,27,1,30,8,31,3|;AS_RAW_MQ=7200.00|44332.00|0.00;AS_RAW_MQRankSum=60,2|50,1,54,2,60,10|;AS_RAW_ReadPosRankSum=20,2|11,1,13,1,15,1,20,10|;AS_SB_TABLE=1,1|8,5|0,0;BaseQRankSum=-2.312;DB;DP=15;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.608;RAW_MQ=51532.00;ReadPosRankSum=-0.607 GT:AD:GQ:PGT:PID:PL:SB 0/1:2,13,0:45:0|1:143_G_A:540,0,45,546,84,630:1,1,8,5

    PFA the file in detail.

    Thanks !
    Febin_GC

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi Febin_GC,

    What happens if you try running VQSR without -an QD?

    Thanks,
    Sheila

  • Febin_GCFebin_GC IndiaMember

    Hi Sheila

    Prior to VQSR ,shouldn't we run GenotypeGVCFs ? The GenotypeGVCFs command go on infinitely with warnings.

    java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta --variant hg19output.AS.g.vcf -o hg19output.AS.vcf

    INFO 17:35:02,930 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 17:35:02,935 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
    INFO 17:35:02,935 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 17:35:02,935 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO 17:35:02,936 HelpFormatter - [Tue Feb 07 17:35:02 UTC 2017] Executing on Linux 3.10.0-327.36.3.el7.x86_64 amd64
    INFO 17:35:02,936 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14
    INFO 17:35:02,941 HelpFormatter - Program Args: -T GenotypeGVCFs -R ucsc.hg19.fasta --variant hg19output.AS.g.vcf -o hg19output.AS.vcfja
    INFO 17:35:02,947 HelpFormatter - Executing as [email protected] on Linux 3.10.0-327.36.3.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
    INFO 17:35:02,947 HelpFormatter - Date/Time: 2017/02/07 17:35:02
    INFO 17:35:02,948 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 17:35:02,948 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 17:35:02,982 GenomeAnalysisEngine - Strictness is SILENT
    INFO 17:35:03,163 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 17:35:04,302 GenomeAnalysisEngine - Preparing for traversal
    INFO 17:35:04,314 GenomeAnalysisEngine - Done preparing for traversal
    INFO 17:35:04,316 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 17:35:04,316 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 17:35:04,317 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    WARN 17:35:04,553 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    WARN 17:35:04,554 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
    WARN 17:35:04,555 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    INFO 17:35:04,555 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
    WARN 17:35:04,664 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 7200.00|44332.00|0.00 doesn't parse and will not be annotated in the final VC.
    WARN 17:35:04,665 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 20,2|11,1,13,1,15,1,20,10| doesn't parse and will not be annotated in the final VC.
    WARN 17:35:04,666 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 60,2|50,1,54,2,60,10| doesn't parse and will not be annotated in the final VC.
    WARN 17:35:04,666 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 1,1|8,5|0,0 doesn't parse and will not be annotated in the final VC.
    WARN 17:35:04,667 ReferenceConfidenceVariantContextMerger - WARNING: remaining (non-reducible) annotations are assumed to be ints or doubles or booleans, but 35,1,37,1|22,1,27,1,30,8,31,3| doesn't parse and will not be annotated in the final VC.....and so on

    What is to be done?

    Thanks
    Febin_GC

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi Febin_GC,

    The warning statements tell you some annotations cannot be calculated, but does the tool run to completion and give you a final VCF?

    Thanks,
    Sheila

  • Febin_GCFebin_GC IndiaMember
    Hi Sheila

    What is the expected runtime for GenoypeGVCFs handling a single sample? Didnt keep it for completion as it was taking time and we could see only warnings.

    Thanks
    Febin
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Febin_GC
    Hi Febin,

    Have a look at this paper. You should let the tool run to completion then see if VQSR runs properly.

    -Sheila

  • Febin_GCFebin_GC IndiaMember
    Hi Sheila

    Ran GenotypeGVCFs to completion and got output of variant recalibrator.Thank you Sheila and Geraldine for your quick responses and support .Great going!

    Thanks

    Febin_GC
Sign In or Register to comment.