To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

Missing QD tags at some positions

Hey there,

How can it be possible that some of my snps or indels calls miss the QD tag? I'm doing the recommended workflow and I've tested if for both RNAseq (hard filters complains, that's how I saw those tags were missing) and Exome sequencing (VQSR). How can a hard filter applied on QD on a line without actually that tag can be considered to pass that threshold too? I'm seeing a lot more INDELs in RNAseq where this kind of scenario is happening as well.

Here's the command lines that I used :

Forming gvcf files like this (for RNAseq in that case):

java -Djava.io.tmpdir=$LSCRATCH -Xmx46g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -pairHMM VECTOR_LOGLESS_CACHING -T HaplotypeCaller -I sample.bam -R ~/References/hg19/hg19.fasta --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o sample.g.vcf --dbsnp ~/References/hg19/dbSNP/dbsnp_138.b37.excluding_sites_after_129.CEU_converted.vcf -recoverDanglingHeads -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 0.0 -mmq 1 -U ALLOW_N_CIGAR_READS

GenotypeGVCF after :

java -Djava.io.tmpdir=$LSCRATCH -Xmx22g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T GenotypeGVCFs --dbsnp ~/References/hg19/dbSNP/dbsnp_138.b37.excluding_sites_after_129.CEU_converted.vcf -A QualByDepth -A FisherStrand -o Joint_file/96RNAseq.10092014.STAR.q1.1kb.vcf -R ~/References/hg19/hg19.fasta

VQSR (separated for indels and snvs) or Hard filter at the end :

VQSR

java -Djava.io.tmpdir=$LSCRATCH -Xmx10g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T VariantRecalibrator -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -mode SNP -resource:1000G,known=false,training=true,truth=false,prior=10.0 ~/References/hg19/VQSR/1000G_phase1.snps.high_confidence.b37.noGL.converted.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ~/References/hg19/VQSR/hapmap_3.3.b37.noGL.nochrM.converted.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 ~/References/hg19/VQSR/1000G_omni2.5.b37.noGL.nochrM.converted.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 ~/References/hg19/VQSR/dbsnp_138.b37.excluding_sites_after_129.noGL.nochrM.converted.vcf -input snv.vcf -recalFile 96Exomes.HC.TruSeq.snv.RECALIBRATED -tranchesFile 96Exomes.HC.TruSeq.snv.tranches -rscriptFile 96Exomes.HC.TruSeq.snv.plots.R -R ~/References/hg19/hg19.fasta --maxGaussians 4

java -Djava.io.tmpdir=$LSCRATCH -Xmx10g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T ApplyRecalibration -ts_filter_level 99.0 -mode SNP -input snv.vcf -recalFile 96Exomes.HC.TruSeq.snv.RECALIBRATED -tranchesFile 96Exomes.HC.TruSeq.snv.tranches -o 96Exomes.HC.TruSeq.snv.recal.vcf -R ~/References/hg19/hg19.fasta

HARD FILTER (RNASeq)

java -Djava.io.tmpdir=$LSCRATCH -Xmx2g -jar /home/apps/Logiciels/GATK/3.1-1/GenomeAnalysisTK.jar -l INFO -T VariantFiltration -R ~/References/hg19/hg19.fasta -V 96RNAseq.STAR.q1.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o 96RNAseq.STAR.q1.FS30.QD2.vcf

Here are some examples for RNAseq :

chr1 6711349 . G A 79.10 PASS BaseQRankSum=-1.369e+00;ClippingRankSum=1.00;DP=635;FS=1.871;MLEAC=1;MLEAF=5.495e-03;MQ=60.00;MQ0=0;MQRankSum=-4.560e-01;ReadPosRankSum=-1.187e+00 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,280 ./.:0,0:0 0/0:9,0:9:21:0,21,248 0/0:7,0:7:21:0,21,196 0/0:7,0:7:21:0,21,226 0/0:8,0:8:21:0,21,227 0/0:8,0:8:21:0,21,253 0/0:7,0:7:21:0,21,218 0/0:9,0:9:27:0,27,282 1/1:0,0:5:15:137,15,0 0/0:2,0:2:6:0,6,47 0/0:28,0:28:78:0,78,860 0/0:7,0:7:21:0,21,252 0/0:2,0:2:6:0,6,49 0/0:5,0:5:12:0,12,152 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,126 0/0:9,0:9:21:0,21,315 0/0:7,0:7:21:0,21,256 0/0:7,0:7:21:0,21,160 0/0:8,0:8:21:0,21,298 0/0:20,0:20:60:0,60,605 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,67 0/0:2,0:2:6:0,6,71 0/0:14,0:14:20:0,20,390 0/0:7,0:7:21:0,21,223 0/0:7,0:7:21:0,21,221 0/0:4,0:4:12:0,12,134 0/0:2,0:2:6:0,6,54 ./.:0,0:0 0/0:4,0:4:9:0,9,118 0/0:8,0:8:21:0,21,243 0/0:6,0:6:15:0,15,143 0/0:8,0:8:21:0,21,244 0/0:7,0:7:21:0,21,192 0/0:2,0:2:6:0,6,54 0/0:13,0:13:27:0,27,359 0/0:8,0:8:21:0,21,245 0/0:7,0:7:21:0,21,218 0/0:12,0:12:36:0,36,354 0/0:8,0:8:21:0,21,315 0/0:7,0:7:21:0,21,215 0/0:2,0:2:6:0,6,49 0/0:10,0:10:24:0,24,301 0/0:7,0:7:21:0,21,208 0/0:7,0:7:21:0,21,199 0/0:2,0:2:6:0,6,47 0/0:3,0:3:9:0,9,87 0/0:2,0:2:6:0,6,73 0/0:7,0:7:21:0,21,210 0/0:8,0:8:22:0,22,268 0/0:7,0:7:21:0,21,184 0/0:7,0:7:21:0,21,213 0/0:5,0:5:9:0,9,135 0/0:7,0:7:21:0,21,200
0/0:4,0:4:12:0,12,118 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,217 0/0:8,0:8:21:0,21,255 0/0:9,0:9:24:0,24,314 0/0:8,0:8:21:0,21,221 0/0:9,0:9:24:0,24,276 0/0:9,0:9:21:0,21,285 0/0:3,0:3:6:0,6,90 0/0:2,0:2:6:0,6,57 0/0:13,0:13:20:0,20,385 0/0:2,0:2:6:0,6,48 0/0:11,0:11:27:0,27,317 0/0:8,0:8:21:0,21,315 0/0:9,0:9:24:0,24,284 0/0:7,0:7:21:0,21,228 0/0:14,0:14:33:0,33,446 0/0:2,0:2:6:0,6,64 0/0:2,0:2:6:0,6,72 0/0:7,0:7:21:0,21,258 0/0:10,0:10:27:0,27,348 0/0:7,0:7:21:0,21,219 0/0:9,0:9:21:0,21,289 0/0:20,0:20:57:0,57,855 0/0:4,0:4:12:0,12,146 0/0:7,0:7:21:0,21,205 0/0:12,0:14:36:0,36,1030 0/0:3,0:3:6:0,6,87
0/0:2,0:2:6:0,6,60 0/0:7,0:7:21:0,21,226 0/0:7,0:7:21:0,21,229 0/0:8,0:8:21:0,21,265 0/0:4,0:4:6:0,6,90 ./.:0,0:0
0/0:7,0:7:21:0,21,229 0/0:2,0:2:6:0,6,59 0/0:2,0:2:6:0,6,56
chr1 7992047 . T C 45.83 SnpCluster BaseQRankSum=1.03;ClippingRankSum=0.00;DP=98;FS=0.000;MLEAC=1;MLEAF=0.014;MQ=60.00;MQ0=0;MQRankSum=-1.026e+00;ReadPosRankSum=-1.026e+00 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,70 0/0:2,0:2:6:0,6,45 0/0:3,0:3:6:0,6,87 0/0:2,0:2:6:0,6,52 ./.:0,0:0 ./.:0,0:0
./.:1,0:1 ./.:0,0:0 0/0:2,0:2:6:0,6,55 0/0:2,0:2:6:0,6,49 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,61
0/0:2,0:2:6:0,6,49 ./.:0,0:0 ./.:0,0:0 0/0:3,0:3:6:0,6,90 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,52
./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,69 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,64 ./.:0,0:0 0/0:2,0:2:6:0,6,37 ./.:0,0:0
0/0:2,0:2:6:0,6,67 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,68 ./.:0,0:0 ./.:0,0:0
./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:11,0:11:24:0,24,360 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,68 0/0:2,0:2:6:0,6,50 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0
./.:0,0:0 0/0:2,0:2:6:0,6,50 0/0:3,0:3:6:0,6,90 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:4:6:0,6,50
./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:7,0:7:21:0,21,231 0/0:2,0:2:6:0,6,64 ./.:0,0:0 0/0:2,0:2:6:0,6,63
0/0:2,0:2:6:0,6,70 ./.:0,0:0 0/0:6,0:6:15:0,15,148 ./.:0,0:0 ./.:0,0:0 1/1:0,0:2:6:90,6,0 ./.:0,0:0
0/0:2,0:2:6:0,6,63 0/0:2,0:2:6:0,6,74 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,58
0/0:2,0:2:6:0,6,71 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49

For Exome Seq now :
chr2 111878571 . C T 93.21 PASS DP=634;FS=0.000;MLEAC=1;MLEAF=5.319e-03;MQ=60.00;MQ0=0;VQSLOD=14.19;culprit=MQ GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,243 0/0:4,0:4:9:0,9,135 0/0:7,0:7:18:0,18,270 0/0:7,0:7:21:0,21,230 0/0:16,0:16:48:0,48,542 0/0:8,0:8:21:0,21,315 0/0:6,0:6:18:0,18,186 0/0:5,0:5:15:0,15,168 0/0:6,0:6:15:0,15,225 0/0:10,0:10:30:0,30,333 0/0:7,0:7:21:0,21,239
0/0:6,0:6:18:0,18,202 0/0:6,0:6:15:0,15,225 0/0:7,0:7:21:0,21,225 0/0:8,0:8:24:0,24,272 0/0:5,0:5:15:0,15,168 1/1:0,0:13:13:147,13,0 0/0:2,0:2:6:0,6,73 0/0:8,0:8:24:0,24,256 0/0:14,0:14:4:0,4,437 0/0:3,0:3:9:0,9,85 0/0:4,0:4:12:0,12,159 0/0:7,0:7:21:0,21,238 0/0:5,0:5:15:0,15,195 0/0:7,0:7:15:0,15,225 0/0:12,0:12:36:0,36,414 0/0:4,0:4:12:0,12,156 0/0:7,0:7:0:0,0,190 0/0:2,0:2:6:0,6,64 0/0:7,0:7:21:0,21,242 0/0:7,0:7:21:0,21,234 0/0:8,0:8:24:0,24,267 0/0:7,0:7:21:0,21,245 0/0:7,0:7:21:0,21,261 0/0:6,0:6:18:0,18,204 0/0:8,0:8:24:0,24,302 0/0:5,0:5:15:0,15,172 0/0:9,0:9:24:0,24,360 0/0:18,0:18:51:0,51,649 0/0:5,0:5:15:0,15,176 0/0:2,0:2:6:0,6,70 0/0:14,0:14:33:0,33,495 0/0:4,0:4:9:0,9,135 0/0:8,0:8:21:0,21,315 0/0:4,0:4:12:0,12,149 0/0:4,0:4:6:0,6,90 0/0:10,0:10:27:0,27,405 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,133 0/0:14,0:14:6:0,6,431 0/0:4,0:4:12:0,12,151 0/0:5,0:5:15:0,15,163 0/0:3,0:3:9:0,9,106 0/0:7,0:7:21:0,21,237 0/0:7,0:7:21:0,21,268 0/0:8,0:8:21:0,21,315 0/0:2,0:2:6:0,6,68 ./.:0,0:0 0/0:3,0:3:9:0,9,103 0/0:7,0:7:21:0,21,230 0/0:3,0:3:6:0,6,90 0/0:9,0:9:26:0,26,277 0/0:7,0:7:21:0,21,236 0/0:5,0:5:15:0,15,170 ./.:1,0:1 0/0:15,0:15:45:0,45,653 0/0:8,0:8:24:0,24,304 0/0:6,0:6:15:0,15,225 0/0:3,0:3:9:0,9,103 0/0:2,0:2:6:0,6,79 0/0:7,0:7:21:0,21,241 0/0:4,0:4:12:0,12,134 0/0:3,0:3:6:0,6,90 0/0:5,0:5:15:0,15,159 0/0:4,0:4:12:0,12,136 0/0:5,0:5:12:0,12,180 0/0:11,0:11:21:0,21,315 0/0:13,0:13:39:0,39,501 0/0:3,0:3:9:0,9,103 0/0:8,0:8:24:0,24,257 0/0:2,0:2:6:0,6,73 0/0:8,0:8:24:0,24,280 0/0:4,0:4:12:0,12,144 0/0:4,0:4:9:0,9,135 0/0:8,0:8:24:0,24,298 0/0:4,0:4:12:0,12,129 0/0:5,0:5:15:0,15,184 0/0:2,0:2:6:0,6,62
0/0:2,0:2:6:0,6,65 0/0:9,0:9:27:0,27,337 0/0:7,0:7:21:0,21,230 0/0:7,0:7:21:0,21,239 0/0:5,0:5:0:0,0,113 0/0:11,0:11:33:0,33,369 0/0:7,0:7:21:0,21,248 0/0:10,0:10:30:0,30,395

Thanks for your help.

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Does QD show up in your GVCFs?

  • JCGrenierJCGrenier Montreal, QCMember

    I don't have any QD for any line in the gvcf. However, it seems to be able to report it for the majority of the positions when doing the GenotypeGVCFs command.

    Here's what I found for the exome sample where I had a non-ref call. So it seems like a deletion but it reports a SNV.

    chr2 111878556 . TGCCGCCGCCGCCGCCGCCGCC T,TGCCGCCGCC,TGCCGCCGCCGCC, 345.19 . DP=19;MLEAC=0,1,1,0;MLEAF=0.00,0.500,0.500,0.00;MQ=53.34;MQ0=0 GT:AD:DP:GQ:PL:SB 2/3:0,0,8,5,0:13:99:617,536,740,205,217,178,388,301,0,368,483,496,194,279,470:0,0,5,0

    Are we supposed to have QD values in the GVCF files?

    Thanks

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Are we supposed to have QD values in the GVCF files?

    Actually no, sorry -- it's one of those fields that's left out to save space, and added by GenotypeGVCFs. Still getting used to the new workflow details on my end too.

    The call you're showing is a mixed record, which shows evidence for both a SNP and an indel. FYI that is treated by VQSR as an indel. I'm wondering if it's having trouble calculating QD for mixed records. Are all the other cases where you don't have QD mixed records like this?

  • JCGrenierJCGrenier Montreal, QCMember

    Not necessarily. Here's the example of the non-ref for the first example in the RNAseq analysis :

    chr1 6711349 . G A, 104.03 . DP=5;MLEAC=2,0;MLEAF=1.00,0.00;MQ=60.00;MQ0=0 GT:AD:DP:GQ:PL:SB 1/1:0,5,0:5:15:137,15,0,137,15,137:0,0,2,3

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    OK -- and what does that site look like after GenotypeGVCFs?

  • JCGrenierJCGrenier Montreal, QCMember

    Like this. Everything seems normal but there's no QD tag.

    chr1 6711349 . G A 79.10 PASS BaseQRankSum=-1.369e+00;ClippingRankSum=1.00;DP=635;FS=1.871;MLEAC=1;MLEAF=5.495e-03;MQ=60.00;MQ0=0;MQRankSum=-4.560e-01;ReadPosRankSum=-1.187e+00 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,280 ./.:0,0:0 0/0:9,0:9:21:0,21,248 0/0:7,0:7:21:0,21,196 0/0:7,0:7:21:0,21,226 0/0:8,0:8:21:0,21,227 0/0:8,0:8:21:0,21,253 0/0:7,0:7:21:0,21,218 0/0:9,0:9:27:0,27,282 1/1:0,0:5:15:137,15,0 0/0:2,0:2:6:0,6,47 0/0:28,0:28:78:0,78,860 0/0:7,0:7:21:0,21,252 0/0:2,0:2:6:0,6,49 0/0:5,0:5:12:0,12,152 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,126 0/0:9,0:9:21:0,21,315 0/0:7,0:7:21:0,21,256 0/0:7,0:7:21:0,21,160 0/0:8,0:8:21:0,21,298 0/0:20,0:20:60:0,60,605 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,67 0/0:2,0:2:6:0,6,71 0/0:14,0:14:20:0,20,390 0/0:7,0:7:21:0,21,223 0/0:7,0:7:21:0,21,221 0/0:4,0:4:12:0,12,134 0/0:2,0:2:6:0,6,54 ./.:0,0:0 0/0:4,0:4:9:0,9,118 0/0:8,0:8:21:0,21,243 0/0:6,0:6:15:0,15,143 0/0:8,0:8:21:0,21,244 0/0:7,0:7:21:0,21,192 0/0:2,0:2:6:0,6,54 0/0:13,0:13:27:0,27,359 0/0:8,0:8:21:0,21,245 0/0:7,0:7:21:0,21,218 0/0:12,0:12:36:0,36,354 0/0:8,0:8:21:0,21,315 0/0:7,0:7:21:0,21,215 0/0:2,0:2:6:0,6,49 0/0:10,0:10:24:0,24,301 0/0:7,0:7:21:0,21,208 0/0:7,0:7:21:0,21,199 0/0:2,0:2:6:0,6,47 0/0:3,0:3:9:0,9,87 0/0:2,0:2:6:0,6,73 0/0:7,0:7:21:0,21,210 0/0:8,0:8:22:0,22,268 0/0:7,0:7:21:0,21,184 0/0:7,0:7:21:0,21,213 0/0:5,0:5:9:0,9,135 0/0:7,0:7:21:0,21,200 0/0:4,0:4:12:0,12,118 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,217 0/0:8,0:8:21:0,21,255 0/0:9,0:9:24:0,24,314 0/0:8,0:8:21:0,21,221 0/0:9,0:9:24:0,24,276 0/0:9,0:9:21:0,21,285 0/0:3,0:3:6:0,6,90 0/0:2,0:2:6:0,6,57 0/0:13,0:13:20:0,20,385 0/0:2,0:2:6:0,6,48 0/0:11,0:11:27:0,27,317 0/0:8,0:8:21:0,21,315 0/0:9,0:9:24:0,24,284 0/0:7,0:7:21:0,21,228 0/0:14,0:14:33:0,33,446 0/0:2,0:2:6:0,6,64 0/0:2,0:2:6:0,6,72 0/0:7,0:7:21:0,21,258 0/0:10,0:10:27:0,27,348 0/0:7,0:7:21:0,21,219 0/0:9,0:9:21:0,21,289 0/0:20,0:20:57:0,57,855 0/0:4,0:4:12:0,12,146 0/0:7,0:7:21:0,21,205 0/0:12,0:14:36:0,36,1030 0/0:3,0:3:6:0,6,87 0/0:2,0:2:6:0,6,60 0/0:7,0:7:21:0,21,226 0/0:7,0:7:21:0,21,229 0/0:8,0:8:21:0,21,265 0/0:4,0:4:6:0,6,90 ./.:0,0:0 0/0:7,0:7:21:0,21,229 0/0:2,0:2:6:0,6,59 0/0:2,0:2:6:0,6,56

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hmm. I'll ask the devs if there's anything that could explain that, because it doesn't seem like the expected behavior to me.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi @JCGrenier‌

    We think this might be due to a bug that sometimes affects the AD field, which is used to calculate QD. You can see that the hom-var sample has 0,0 AD in the final record, which is wrong since it was 0,5 in the GVCF record you posted earlier.

    The bug is fixed in the nightly build (see Download page), so please try rerunning GenotypeGVCFs with that version and let me know what results you get. Sorry for the inconvenience!

  • JCGrenierJCGrenier Montreal, QCMember

    Thanks for your help @Geraldine_VdAuwera, I will test it shortly! I'll keep you posted.

  • @JCGrenier‌ and @Geraldine_VdAuwera‌ I can't help but notice that the only variable positions in your posted VCF lines have 0,0 in their AD fields, meaning you have no ALT depth on which to calculate the QD INFO field (see, https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_QualByDepth.php)

    chr1 6711349 . G A 79.10 PASS BaseQRankSum=-1.369e+00;ClippingRankSum=1.00;DP=635;FS=1.871;MLEAC=1;MLEAF=5.495e-03;MQ=60.00;MQ0=0;MQRankSum=-4.560e-01;ReadPosRankSum=-1.187e+00 GT:AD:DP:GQ:PL ... 1/1:0,0:5:15:137,15,0

    chr1 7992047 . T C 45.83 SnpCluster BaseQRankSum=1.03;ClippingRankSum=0.00;DP=98;FS=0.000;MLEAC=1;MLEAF=0.014;MQ=60.00;MQ0=0;MQRankSum=-1.026e+00;ReadPosRankSum=-1.026e+00 GT:AD:DP:GQ:PL ... 1/1:0,0:2:6:90,6,0

    chr2 111878571 . C T 93.21 PASS DP=634;FS=0.000;MLEAC=1;MLEAF=5.319e-03;MQ=60.00;MQ0=0;VQSLOD=14.19;culprit=MQ GT:AD:DP:GQ:PL ... 1/1:0,0:13:13:147,13,0

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @bredeson‌ You're right, but it's odd that those calls would have ADs of 0,0, hence the assumption that the ADs are wrong. So the missing QDs are just a side effect of the real problem which lies with the ADs.

  • jacobhsujacobhsu Hong KongMember
    edited January 2015

    @‌Geraldine_VdAuwera and @JCGrenier

    According to above discussion when JCGrenier‌ on last September, this argument on VQSR step looks strange to me.
    -resource:omni,known=false,training=true,truth=false,prior=12.0 ~/References/hg19/VQSR/1000G_omni2.5.b37.noGL.nochrM.converted.vcf

    The webpage on this https://www.broadinstitute.org/gatk/guide/article?id=1259 indicates below,
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.sites.vcf

    Maybe this is not the issue to affect the QD, but I found it's an inconsistent between the suggested workflow and what JCGrenier did.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Well spotted, @jacobhsu‌ -- the omni sites should be treated as true.

  • JCGrenierJCGrenier Montreal, QCMember

    Thanks @jacobhsu and @bredeson for your comments. I will verify all of that. I know that some things have changed between the 3.2-2 that I was using before and the version 3.3-0. From a logic point of view, those missing QD tags are probably even computed without the need of those references right? We are running the newest version of GATK right now on our dataset so I can tell you if :
    1) The version is changing something
    2) Changing the "truth=true" is affecting something for those calls.

    Thanks a lot guys.

  • jalvesjalves Cambridge, UKMember

    Hi @Geraldine_VdAuwera and @JCGrenier ,

    Is there any update about this? I am having exactly the same issue. I am trying to use the tool "VariantsToTable" to export the several fields including QD field and it gives me the error below. The VCF was generated using GenotypeGVCFs and filtered using --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0". I am using version 3.3.0.

    Many thanks,
    Joel

    ERROR MESSAGE: Missing field QD in vc variant at [VC variant @ 1:3737687 Q1463.36 of type=SNP alleles=[C*, G] attr={AC=4, AF=0.100, AN=40, BaseQRankSum=-3.470e-01, ClippingRankSum=0.116, DP=465, FS=7.698, GQ_MEAN=23.85, GQ_STDDEV=48.94, InbreedingCoeff=0.5368, MLEAC=8, MLEAF=0.200, MQ=60.00, MQ0=0, MQRankSum=-1.160e-01, NCC=6, ReadPosRankSum=-1.736e+00, SOR=0.322} GT=GT:AD:DP:GQ:PGT:PID:PL ./.:5,0:5 ./.:0,0:0 0/0:0,0:33:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:11:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:6:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:14,1:15:39:0|1:3737678_ACTGCCC_A:0,39,583 ./.:1,0:1 1/1:0,0:68:99:.:.:1157,196,0 0/0:0,0:18:1:.:.:0,1,1 ./.:1,0:1 ./.:11,0:11 0/0:18,0:18:9:.:.:0,9,135 0/0:34,0:34:99:0|1:3737678_ACTGCCC_A:0,102,1452 1/1:0,0:27:24:.:.:356,24,0 0/0:3,0:3:0:.:.:0,0,28 0/0:1,0:19:3:0|1:3737681_GCCCCCC_G:0,3,28 0/0:0,0:21:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:18:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:12:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:16:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:12:1:.:.:0,1,1 0/0:3,0:3:9:.:.:0,9,113 0/0:25,0:25:75:0|1:3737678_ACTGCCC_A:0,75,1101 ./.:0,0:0 0/0:13,0:13:9:.:.:0,9,135 0/0:14,0:14:9:.:.:0,9,135
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @jalves Can you check whether QD is annotated in the rest of the records, or is missing everywhere? Also, which version are you using?

  • JCGrenierJCGrenier Montreal, QCMember

    @jalves I was not able to solve it in the end. Will have to give it a better look. @Geraldine_VdAuwera and @jacobhsu, concerning the "-resource:omni,known=false,training=true,truth=true,prior=12.0", it won't change anything because that step is only useful for the recalibration part. The problem is happening before that step.

  • JCGrenierJCGrenier Montreal, QCMember

    Thanks @Geraldine_VdAuwera for that quick answer!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @jalves I just remembered also that there is an option in VariantsToTable to allow missing annotation values (which then get replaced by "NA"). Look up --allowMissingData in the tool doc.

  • jalvesjalves Cambridge, UKMember

    Thanks a lot for the help and comments @Geraldine_VdAuwera and @JCGrenier, it was very useful.

    It is probably not as relevant now, but I am using version 3.3.0 and indeed the lack of the QD field only happens for some of the variants.

  • slamentslament SwedenMember

    Hi! I think I am having the same problem and I was wondering if there is a solution already. I have something like 14 sites without QD field and I got a warning when trying to filter using VariantFiltration looking like this:

    WARN  23:56:50,521 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    WARN  23:56:50,522 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    WARN  23:56:52,340 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    WARN  23:56:53,238 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    WARN  23:56:54,115 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    WARN  23:56:54,333 Interpreter - ![0,2]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0;' undefined variable QD
    

    I am using GATK v. 3.5.0 and my samples are haploid. A few examples of the problematic sites:

    chromosome_1    7780968 .   G   A   245.85  PASS    AC=1;AF=0.029;AN=34;DP=5169;FS=0;MLEAC=1;MLEAF=0.029;MQ=37.34;SOR=0.693 GT:AD:DP:GQ:PL  0:149,0:149:99:0,845    0:164,0:164:99:0,1800   0:136,0:136:99:0,1421   0:165,0:165:99:0,774    0:167,0:167:99:0,1800   0:177,0:177:99:0,1800   0:196,0:196:99:0,1800   0:189,0:189:99:0,1800   0:174,0:174:99:0,1800   0:177,0:177:99:0,1800   0:165,0:165:99:0,1800   0:131,0:131:99:0,1800   0:154,0:154:99:0,1724   0:193,0:193:99:0,395    0:200,0:200:99:0,716    1:0,0:0:99:285,0    0:88,0:88:99:0,389  0:154,0:154:99:0,369    0:159,0:159:99:0,1247   0:136,0:136:99:0,1225   0:103,0:103:99:0,118    0:135,0:135:99:0,1369   0:178,0:178:99:0,1800   0:141,0:141:99:0,119    0:188,0:188:99:0,480    0:148,0:148:99:0,1800   0:119,0:119:99:0,1800   0:154,0:154:99:0,1800   0:99,0:99:99:0,1800 0:192,0:192:99:0,1048   0:159,0:159:99:0,939    0:147,0:147:99:0,417    0:118,0:118:99:0,1533   0:163,0:163:99:0,1800
    chromosome_3    3697990 .   A   C   107.12  PASS    AC=1;AF=0.063;AN=16;DP=1094;FS=0;MLEAC=1;MLEAF=0.063;MQ=31.19;SOR=0.693 GT:AD:DP:GQ:PL  0:55,0:55:99:0,180  0:72,0:72:99:0,315  0:61,0:61:99:0,180  0:71,0:71:99:0,405  .:0,0:.:.:. 0:77,0:77:99:0,1800 0:82,0:82:99:0,1755 1:0,0:0:99:143,0    0:18,0:18:45:0,45   0:8,0:8:45:0,45 .:0,0:.:.:. 0:16,0:16:45:0,45   .:0,0:.:.:. 0:114,0:114:99:0,1584   0:158,0:158:99:0,1241   .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. 0:120,0:120:99:0,1438   0:140,0:140:99:0,521    0:9,0:9:45:0,45 .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. 0:61,0:61:99:0,450  .:0,0:.:.:. .:0,0:.:.:.
    chromosome_4    3759719 .   T   C   142.02  PASS    AC=1;AF=0.038;AN=26;DP=2653;FS=0;MLEAC=1;MLEAF=0.038;MQ=25.22;SOR=0.693 GT:AD:DP:GQ:PL  0:96,0:96:99:0,106  0:111,0:111:99:0,938    0:89,0:89:99:0,409  0:105,0:105:99:0,706    0:111,0:111:99:0,1800   0:20,0:20:99:0,450  0:24,0:24:99:0,643  0:129,0:129:99:0,1800   0:101,0:101:99:0,1800   0:131,0:131:99:0,1800   0:140,0:140:99:0,1800   0:127,0:127:99:0,1800   0:122,0:122:99:0,1800   .:0,0:.:.:. .:0,0:.:.:. 0:153,0:153:99:0,1800   0:113,0:113:99:0,1800   .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. .:0,0:.:.:. 0:84,0:84:99:0,1598 .:0,0:.:.:. 0:11,0:11:99:0,282  0:6,0:6:86:0,86 0:124,0:124:99:0,1800   0:97,0:97:99:0,1800 .:0,0:.:.:. 0:104,0:104:99:0,1800   0:118,0:118:99:0,1800   0:122,0:122:99:0,1800   0:77,0:77:99:0,183  0:119,0:119:99:0,1800   1:0,0:0:99:180,0
    chromosome_5    3385055 .   C   T   70.4    PASS    AC=1;AF=0.033;AN=30;DP=497;FS=0;MLEAC=1;MLEAF=0.033;MQ=21;SOR=0.693 GT:AD:DP:GQ:PL  0:9,0:9:99:0,135    0:6,0:6:99:0,135    0:11,0:11:99:0,135  0:5,0:5:99:0,135    0:7,0:7:99:0,135    0:9,0:9:99:0,13 0:10,0:10:99:0,114  0:5,0:5:99:0,135    0:8,0:8:99:0,203    0:48,0:48:99:0,1781 0:53,0:53:99:0,1755 0:30,0:30:99:0,1035 0:35,0:35:99:0,992  0:51,0:51:99:0,643  0:36,0:36:99:0,377  .:0,0:.:.:. .:0,0:.:.:. 0:2,0:2:85:0,85 0:1,0:1:20:0,20 .:0,0:.:.:. 0:1,0:1:43:0,43 0:8,0:8:99:0,135    0:3,0:3:99:0,107    0:17,0:17:99:0,135  0:25,0:25:99:0,270  0:11,0:11:99:0,270  0:9,0:9:99:0,152    0:5,0:5:99:0,132    0:6,0:6:99:0,211    1:0,0:0:99:109,0    .:0,0:.:.:. 0:10,0:10:99:0,163  0:24,0:24:99:0,160  0:51,0:51:99:0,658
    chromosome_5    4734163 .   G   A   50.85   PASS    AC=1;AF=0.029;AN=34;DP=1461;FS=0;MLEAC=1;MLEAF=0.029;MQ=20.67;SOR=0.693 GT:AD:DP:GQ:PL  0:48,0:48:99:0,565  0:68,0:68:99:0,442  0:49,0:49:99:0,450  0:60,0:60:99:0,418  0:46,0:46:99:0,835  0:67,0:67:99:0,1385 0:80,0:80:99:0,1154 0:28,0:28:99:0,1062 0:17,0:17:99:0,100  0:17,0:17:99:0,439  0:17,0:17:99:0,707  0:45,0:45:99:0,910  0:56,0:56:99:0,206  0:20,0:20:99:0,450  0:22,0:22:99:0,184  0:144,0:144:99:0,1800   0:112,0:112:99:0,1058   0:38,0:38:99:0,819  0:36,0:36:99:0,287  0:23,0:23:99:0,238  0:18,0:18:99:0,521  0:54,0:54:99:0,376  0:31,0:31:99:0,777  0:30,0:30:99:0,1041 0:22,0:22:99:0,357  0:32,0:32:99:0,1305 0:27,0:27:99:0,1085 0:43,0:43:99:0,617  0:30,0:30:99:0,1240 0:48,0:48:99:0,1108 0:39,0:39:99:0,668  0:48,0:48:99:0,773  0:43,0:43:99:0,1370 1:0,0:0:90:90,0
    

    Like you mentioned above, I see that for each site one of the samples has an AD of 0,0. The corresponding sites in the gvcf file are:

    chromosome_1    7780968 .   G   A,<NON_REF> 255 .   DP=51;MLEAC=1,0;MLEAF=1.00,0.00;MQ=37.34    GT:AD:DP:GQ:PL:SB   1:0,0,0:0:99:285,0,285:0,0,0,0
    chromosome_3    3697990 .   A   C,<NON_REF> 113 .   DP=31;MLEAC=1,0;MLEAF=1.00,0.00;MQ=31.19    GT:AD:DP:GQ:PL:SB   1:0,0,0:0:99:143,0,143:0,0,0,0
    chromosome_4    3759719 .   T   C,<NON_REF> 150 .   DP=18;MLEAC=1,0;MLEAF=1.00,0.00;MQ=25.22    GT:AD:DP:GQ:PL:SB   1:0,0,0:0:99:180,0,180:0,0,0,0
    chromosome_5    3385055 .   C   T,<NON_REF> 79  .   DP=1;MLEAC=1,0;MLEAF=1.00,0.00;MQ=21.00 GT:AD:DP:GQ:PL:SB   1:0,0,0:0:99:109,0,109:0,0,0,0
    chromosome_5    4734163 .   G   A,<NON_REF> 60  .   DP=3;MLEAC=1,0;MLEAF=1.00,0.00;MQ=20.67 GT:AD:DP:GQ:PL:SB   1:0,0,0:0:90:90,0,90:0,0,0,0
    

    What should I do to fix this? Thanks in advance!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @slament
    Hi,

    There is no need to worry about the WARN statements. Those are just there to let you know some of the sites are missing the annotation. You can ignore them.

    As for the AD field containing 0,0 have a look at this document for more information.

    -Sheila

  • slamentslament SwedenMember

    Thank you for your answer Sheila! I read the article you provided but it is not clear to me how both the AD and DP would be 0 (like in all cases above) and why would such sites lack a QD field. Is this related to another very similar question I posted here? In that case, tho, there is a QD field reported.

    Thank you again!

Sign In or Register to comment.