VariantRecalibrator Problem: QD Annotation

Tarr
March 2019

I'm new to this community and I'm trying to perform a joint analysis of somatic SNVs and indels according to the Best Practices. When I try to run VariantRecalibrator:

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ref.fna \
-input cohort_raw.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /hapmap_3.3.hg19.sites.vcf.gz \
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.vcf.gz \
-resource:MG,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 GRCh37_latest_dbSNP_all.vcf.gz \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
-mode SNP -recalFile output_snp_cohort.recal \
-tranchesFile output_snp_cohort.tranches \
-rscriptFile output_snp_cohort.plots.R

I always get the message:

## MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

I downloaded the training files from the resource bundle.

My input file looks like this:

## NC_000017.10:41196312-41577500 52 . C T 18871.87 . AC=3;AF=0.021;AN=144;BaseQRankSum=0.721;ClippingRankSum=0.00;DP=32141;ExcessHet=3.1024;FS=0.521;InbreedingCoeff=-0.0228;MLEAC=3;MLEAF=0.021;MQ=60.00;MQRankSum=0.00;QD=9.24;ReadPosRankSum=1.43;SOR=0.728 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0
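Since the error says QD was not detected for any training variant, one quick sanity check is to count how many records in the callset actually carry a QD value. The following is only a minimal sketch: `count_info_key` is a hypothetical helper name, not a GATK tool, and it assumes a plain-text, tab-separated VCF (decompress `.gz` files first).

```shell
# Hypothetical helper (not part of GATK): count the data records in a VCF
# and how many of them carry a given INFO key.
# Assumes a plain-text, tab-separated VCF.
# Usage: count_info_key KEY file.vcf
count_info_key() {
    key="$1"
    vcf="$2"
    awk -F'\t' -v key="$key" '
        /^#/ { next }                        # skip header lines
        {
            total++
            n = split($8, fields, ";")       # column 8 is INFO
            for (i = 1; i <= n; i++) {
                # match "KEY=value" entries as well as bare flag keys
                if (fields[i] == key || index(fields[i], key "=") == 1) {
                    hits++
                    break
                }
            }
        }
        END { printf "%d/%d\n", hits + 0, total + 0 }   # records with KEY / all records
    ' "$vcf"
}
```

For example, `count_info_key QD cohort_raw.vcf` prints the number of records with QD over the total number of records. If most records have QD, as the record shown above does, the annotation itself is probably not what is missing, and it may be worth checking whether any of the records overlap the training resources at all.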

I read in other posts that this could be caused by "-nt", so I disabled it, but that didn't help. I also read that it might be due to missing QD annotations in the joint g.vcf, so I tried to re-annotate before merging the g.vcf files:

java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-T VariantAnnotator \
-I input.bam \
-V input.vcf \
-o output.vcf \
-A Coverage -A MappingQualityRankSumTest -A QualByDepth \
-A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio \
-L input.vcf \
--dbsnp dbsnp.vcf

That didn't work either. I also tried:

${gatk3} -T GenotypeGVCFs -R ${ref} \
--variant ${final}/cohort.g.vcf -maxAltAlleles 8 -nt 8 --dbsnp ${vcfref} \
-A Coverage -A MappingQualityRankSumTest -A QualByDepth \
-A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio \
-A FisherStrand -o ${final}/cohort_raw.vcf

Same result. I'm out of options. Help!



  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @Tarr

    Would you please try to run this with the latest version of GATK4.1 and see if the problem persists?

  • Tarr (Member)
    With version 4.1 it seems to work, but I have to restart the whole analysis because of the error "Input files reference and features have incompatible contigs". My reference FASTA is a reduced version containing only my genes, with contigs in NC_XXX notation, so it is incompatible with the reference files from your resource bundle.
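
    A contig mismatch like this can be spotted before running the recalibration by comparing the contig names used in the callset with those in a resource file. The sketch below is only illustrative: `vcf_contigs` and `shared_contigs` are hypothetical helper names, not GATK tools, and both assume plain-text, tab-separated VCFs (decompress `.gz` resources first).

```shell
# Hypothetical helpers (not part of GATK). Assume plain-text, tab-separated VCFs.

# List the distinct contig names used by the data records of a VCF.
vcf_contigs() {
    awk -F'\t' '!/^#/ { print $1 }' "$1" | sort -u
}

# Print the contig names that appear in both VCFs.
# Empty output means the two files share no contig names.
shared_contigs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    vcf_contigs "$1" > "$a"
    vcf_contigs "$2" > "$b"
    comm -12 "$a" "$b"        # keep only lines common to both sorted lists
    rm -f "$a" "$b"
}
```

    Running `shared_contigs cohort_raw.vcf resource.vcf` (file names here are placeholders) and getting no output would indicate that no callset variant can overlap the training sites. Note that matching names alone may not be enough: the record posted above sits at position 52 of a contig named `NC_000017.10:41196312-41577500`, which suggests the coordinates are relative to the reduced reference rather than to hg19.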

