VariantRecalibrator Problem: QD Annotation

Tarr (Member)
edited March 12 in Ask the GATK team

I'm new to this community and I'm trying to run a joint analysis of somatic SNVs and indels following the Best Practices. When I run VariantRecalibrator:

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ref.fna \
-input cohort_raw.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /hapmap_3.3.hg19.sites.vcf.gz \
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.vcf.gz \
-resource:MG,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 GRCh37_latest_dbSNP_all.vcf.gz \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
-mode SNP -recalFile output_snp_cohort.recal \
-tranchesFile output_snp_cohort.tranches \
-rscriptFile output_snp_cohort.plots.R

I always get the message:

## MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

I downloaded the training files from the resource bundle.

My input file looks like this:

## NC_000017.10:41196312-41577500 52 . C T 18871.87 . AC=3;AF=0.021;AN=144;BaseQRankSum=0.721;ClippingRankSum=0.00;DP=32141;ExcessHet=3.1024;FS=0.521;InbreedingCoeff=-0.0228;MLEAC=3;MLEAF=0.021;MQ=60.00;MQRankSum=0.00;QD=9.24;ReadPosRankSum=1.43;SOR=0.728 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0
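To confirm that an annotation such as QD is present across the whole callset, and not only in the record shown above, one can count the records whose INFO column carries the key. This is a minimal sketch and not part of GATK: `count_info_key` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper (not a GATK tool): count the data records in a VCF
# whose INFO column (field 8) contains the given key.
# Usage: count_info_key KEY FILE.vcf
count_info_key() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                      # skip header lines
    {
      n = split($8, kv, ";")           # INFO entries are ";"-separated
      for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")        # KEY=VALUE, or a bare flag
        if (pair[1] == key) { hits++; break }
      }
    }
    END { print hits + 0 }             # print 0 when nothing matched
  ' "$2"
}

# Example (file name taken from the command above):
# count_info_key QD cohort_raw.vcf
```

If the count is close to the total number of records, the annotation itself is present and the cause of the error probably lies elsewhere.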

I read in other posts that it could be caused by "-nt", so I disabled it, but that didn't help. I also read that it might be due to missing QD annotations in the joint g.vcf, so I tried to re-annotate before merging the g.vcf files:

java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-T VariantAnnotator \
-I input.bam \
-V input.vcf \
-o output.vcf \
-A Coverage -A MappingQualityRankSumTest -A QualByDepth \
-A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio \
-L input.vcf \
--dbsnp dbsnp.vcf

It didn't work. I also tried:

${gatk3} -T GenotypeGVCFs -R ${ref} \
--variant ${final}/cohort.g.vcf -maxAltAlleles 8 -nt 8 --dbsnp ${vcfref} \
-A Coverage -A MappingQualityRankSumTest -A QualByDepth \
-A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio \
-A FisherStrand -o ${final}/cohort_raw.vcf

I got the same result. I'm out of options. Help!



  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @Tarr

    Would you please try to run this with the latest version of GATK4.1 and see if the problem persists?

  • Tarr (Member)
    With version 4.1 it seems to work, but I have to restart the whole analysis because of the error "Input files reference and features have incompatible contigs". As my reference FASTA I used a small version containing only my genes, with NC_XXX contig names, and it's incompatible with your reference files.
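
A contig-name mismatch like the one described above can be checked before rerunning anything by listing the contig names that the reference and a resource VCF have in common. This is a minimal sketch under stated assumptions: `shared_contigs` is a hypothetical helper name, the reference is assumed to have a FASTA index (for example one produced by `samtools faidx`), and the VCF is assumed to be uncompressed.

```shell
# Hypothetical helper (not a GATK tool): print the contig names that appear
# both in a FASTA index (.fai, column 1) and in the data records of a VCF.
# Usage: shared_contigs REF.fai FILE.vcf
shared_contigs() {
  awk -F'\t' '
    NR == FNR { ref[$1] = 1; next }              # first file: remember reference contigs
    /^#/ { next }                                # second file: skip VCF header lines
    ($1 in ref) && !seen[$1]++ { print $1 }      # report each shared contig once
  ' "$1" "$2"
}

# Example (file names are assumptions):
# gunzip -c hapmap_3.3.hg19.sites.vcf.gz > hapmap.vcf
# shared_contigs ref.fna.fai hapmap.vcf
```

An empty result means the two files share no contig names, so no training variant can overlap the callset until the names are made consistent.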
