We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!
Missing QD tags at some positions
Hey there,
How can it be possible that some of my snps or indels calls miss the QD tag? I'm doing the recommended workflow and I've tested if for both RNAseq (hard filters complains, that's how I saw those tags were missing) and Exome sequencing (VQSR). How can a hard filter applied on QD on a line without actually that tag can be considered to pass that threshold too? I'm seeing a lot more INDELs in RNAseq where this kind of scenario is happening as well.
Here's the command lines that I used :
Forming gvcf files like this (for RNAseq in that case):
java -Djava.io.tmpdir=$LSCRATCH -Xmx46g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -pairHMM VECTOR_LOGLESS_CACHING -T HaplotypeCaller -I sample.bam -R ~/References/hg19/hg19.fasta --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o sample.g.vcf --dbsnp ~/References/hg19/dbSNP/dbsnp_138.b37.excluding_sites_after_129.CEU_converted.vcf -recoverDanglingHeads -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 0.0 -mmq 1 -U ALLOW_N_CIGAR_READS
GenotypeGVCF after :
java -Djava.io.tmpdir=$LSCRATCH -Xmx22g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T GenotypeGVCFs --dbsnp ~/References/hg19/dbSNP/dbsnp_138.b37.excluding_sites_after_129.CEU_converted.vcf -A QualByDepth -A FisherStrand -o Joint_file/96RNAseq.10092014.STAR.q1.1kb.vcf -R ~/References/hg19/hg19.fasta
VQSR (separated for indels and snvs) or Hard filter at the end :
VQSR
java -Djava.io.tmpdir=$LSCRATCH -Xmx10g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T VariantRecalibrator -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -mode SNP -resource:1000G,known=false,training=true,truth=false,prior=10.0 ~/References/hg19/VQSR/1000G_phase1.snps.high_confidence.b37.noGL.converted.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ~/References/hg19/VQSR/hapmap_3.3.b37.noGL.nochrM.converted.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 ~/References/hg19/VQSR/1000G_omni2.5.b37.noGL.nochrM.converted.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 ~/References/hg19/VQSR/dbsnp_138.b37.excluding_sites_after_129.noGL.nochrM.converted.vcf -input snv.vcf -recalFile 96Exomes.HC.TruSeq.snv.RECALIBRATED -tranchesFile 96Exomes.HC.TruSeq.snv.tranches -rscriptFile 96Exomes.HC.TruSeq.snv.plots.R -R ~/References/hg19/hg19.fasta --maxGaussians 4
java -Djava.io.tmpdir=$LSCRATCH -Xmx10g -jar /home/apps/Logiciels/GATK/3.2-2/GenomeAnalysisTK.jar -l INFO -T ApplyRecalibration -ts_filter_level 99.0 -mode SNP -input snv.vcf -recalFile 96Exomes.HC.TruSeq.snv.RECALIBRATED -tranchesFile 96Exomes.HC.TruSeq.snv.tranches -o 96Exomes.HC.TruSeq.snv.recal.vcf -R ~/References/hg19/hg19.fasta
HARD FILTER (RNASeq)
java -Djava.io.tmpdir=$LSCRATCH -Xmx2g -jar /home/apps/Logiciels/GATK/3.1-1/GenomeAnalysisTK.jar -l INFO -T VariantFiltration -R ~/References/hg19/hg19.fasta -V 96RNAseq.STAR.q1.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o 96RNAseq.STAR.q1.FS30.QD2.vcf
Here are some examples for RNAseq :
chr1 6711349 . G A 79.10 PASS BaseQRankSum=-1.369e+00;ClippingRankSum=1.00;DP=635;FS=1.871;MLEAC=1;MLEAF=5.495e-03;MQ=60.00;MQ0=0;MQRankSum=-4.560e-01;ReadPosRankSum=-1.187e+00 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,280 ./.:0,0:0 0/0:9,0:9:21:0,21,248 0/0:7,0:7:21:0,21,196 0/0:7,0:7:21:0,21,226 0/0:8,0:8:21:0,21,227 0/0:8,0:8:21:0,21,253 0/0:7,0:7:21:0,21,218 0/0:9,0:9:27:0,27,282 1/1:0,0:5:15:137,15,0 0/0:2,0:2:6:0,6,47 0/0:28,0:28:78:0,78,860 0/0:7,0:7:21:0,21,252 0/0:2,0:2:6:0,6,49 0/0:5,0:5:12:0,12,152 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,126 0/0:9,0:9:21:0,21,315 0/0:7,0:7:21:0,21,256 0/0:7,0:7:21:0,21,160 0/0:8,0:8:21:0,21,298 0/0:20,0:20:60:0,60,605 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,67 0/0:2,0:2:6:0,6,71 0/0:14,0:14:20:0,20,390 0/0:7,0:7:21:0,21,223 0/0:7,0:7:21:0,21,221 0/0:4,0:4:12:0,12,134 0/0:2,0:2:6:0,6,54 ./.:0,0:0 0/0:4,0:4:9:0,9,118 0/0:8,0:8:21:0,21,243 0/0:6,0:6:15:0,15,143 0/0:8,0:8:21:0,21,244 0/0:7,0:7:21:0,21,192 0/0:2,0:2:6:0,6,54 0/0:13,0:13:27:0,27,359 0/0:8,0:8:21:0,21,245 0/0:7,0:7:21:0,21,218 0/0:12,0:12:36:0,36,354 0/0:8,0:8:21:0,21,315 0/0:7,0:7:21:0,21,215 0/0:2,0:2:6:0,6,49 0/0:10,0:10:24:0,24,301 0/0:7,0:7:21:0,21,208 0/0:7,0:7:21:0,21,199 0/0:2,0:2:6:0,6,47 0/0:3,0:3:9:0,9,87 0/0:2,0:2:6:0,6,73 0/0:7,0:7:21:0,21,210 0/0:8,0:8:22:0,22,268 0/0:7,0:7:21:0,21,184 0/0:7,0:7:21:0,21,213 0/0:5,0:5:9:0,9,135 0/0:7,0:7:21:0,21,200
0/0:4,0:4:12:0,12,118 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,217 0/0:8,0:8:21:0,21,255 0/0:9,0:9:24:0,24,314 0/0:8,0:8:21:0,21,221 0/0:9,0:9:24:0,24,276 0/0:9,0:9:21:0,21,285 0/0:3,0:3:6:0,6,90 0/0:2,0:2:6:0,6,57 0/0:13,0:13:20:0,20,385 0/0:2,0:2:6:0,6,48 0/0:11,0:11:27:0,27,317 0/0:8,0:8:21:0,21,315 0/0:9,0:9:24:0,24,284 0/0:7,0:7:21:0,21,228 0/0:14,0:14:33:0,33,446 0/0:2,0:2:6:0,6,64 0/0:2,0:2:6:0,6,72 0/0:7,0:7:21:0,21,258 0/0:10,0:10:27:0,27,348 0/0:7,0:7:21:0,21,219 0/0:9,0:9:21:0,21,289 0/0:20,0:20:57:0,57,855 0/0:4,0:4:12:0,12,146 0/0:7,0:7:21:0,21,205 0/0:12,0:14:36:0,36,1030 0/0:3,0:3:6:0,6,87
0/0:2,0:2:6:0,6,60 0/0:7,0:7:21:0,21,226 0/0:7,0:7:21:0,21,229 0/0:8,0:8:21:0,21,265 0/0:4,0:4:6:0,6,90 ./.:0,0:0
0/0:7,0:7:21:0,21,229 0/0:2,0:2:6:0,6,59 0/0:2,0:2:6:0,6,56
chr1 7992047 . T C 45.83 SnpCluster BaseQRankSum=1.03;ClippingRankSum=0.00;DP=98;FS=0.000;MLEAC=1;MLEAF=0.014;MQ=60.00;MQ0=0;MQRankSum=-1.026e+00;ReadPosRankSum=-1.026e+00 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,70 0/0:2,0:2:6:0,6,45 0/0:3,0:3:6:0,6,87 0/0:2,0:2:6:0,6,52 ./.:0,0:0 ./.:0,0:0
./.:1,0:1 ./.:0,0:0 0/0:2,0:2:6:0,6,55 0/0:2,0:2:6:0,6,49 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,61
0/0:2,0:2:6:0,6,49 ./.:0,0:0 ./.:0,0:0 0/0:3,0:3:6:0,6,90 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,52
./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,69 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,64 ./.:0,0:0 0/0:2,0:2:6:0,6,37 ./.:0,0:0
0/0:2,0:2:6:0,6,67 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,68 ./.:0,0:0 ./.:0,0:0
./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:11,0:11:24:0,24,360 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,68 0/0:2,0:2:6:0,6,50 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0
./.:0,0:0 0/0:2,0:2:6:0,6,50 0/0:3,0:3:6:0,6,90 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:4:6:0,6,50
./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:7,0:7:21:0,21,231 0/0:2,0:2:6:0,6,64 ./.:0,0:0 0/0:2,0:2:6:0,6,63
0/0:2,0:2:6:0,6,70 ./.:0,0:0 0/0:6,0:6:15:0,15,148 ./.:0,0:0 ./.:0,0:0 1/1:0,0:2:6:90,6,0 ./.:0,0:0
0/0:2,0:2:6:0,6,63 0/0:2,0:2:6:0,6,74 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,58
0/0:2,0:2:6:0,6,71 ./.:0,0:0 ./.:0,0:0 0/0:2,0:2:6:0,6,49
For Exome Seq now :
chr2 111878571 . C T 93.21 PASS DP=634;FS=0.000;MLEAC=1;MLEAF=5.319e-03;MQ=60.00;MQ0=0;VQSLOD=14.19;culprit=MQ GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,243 0/0:4,0:4:9:0,9,135 0/0:7,0:7:18:0,18,270 0/0:7,0:7:21:0,21,230 0/0:16,0:16:48:0,48,542 0/0:8,0:8:21:0,21,315 0/0:6,0:6:18:0,18,186 0/0:5,0:5:15:0,15,168 0/0:6,0:6:15:0,15,225 0/0:10,0:10:30:0,30,333 0/0:7,0:7:21:0,21,239
0/0:6,0:6:18:0,18,202 0/0:6,0:6:15:0,15,225 0/0:7,0:7:21:0,21,225 0/0:8,0:8:24:0,24,272 0/0:5,0:5:15:0,15,168 1/1:0,0:13:13:147,13,0 0/0:2,0:2:6:0,6,73 0/0:8,0:8:24:0,24,256 0/0:14,0:14:4:0,4,437 0/0:3,0:3:9:0,9,85 0/0:4,0:4:12:0,12,159 0/0:7,0:7:21:0,21,238 0/0:5,0:5:15:0,15,195 0/0:7,0:7:15:0,15,225 0/0:12,0:12:36:0,36,414 0/0:4,0:4:12:0,12,156 0/0:7,0:7:0:0,0,190 0/0:2,0:2:6:0,6,64 0/0:7,0:7:21:0,21,242 0/0:7,0:7:21:0,21,234 0/0:8,0:8:24:0,24,267 0/0:7,0:7:21:0,21,245 0/0:7,0:7:21:0,21,261 0/0:6,0:6:18:0,18,204 0/0:8,0:8:24:0,24,302 0/0:5,0:5:15:0,15,172 0/0:9,0:9:24:0,24,360 0/0:18,0:18:51:0,51,649 0/0:5,0:5:15:0,15,176 0/0:2,0:2:6:0,6,70 0/0:14,0:14:33:0,33,495 0/0:4,0:4:9:0,9,135 0/0:8,0:8:21:0,21,315 0/0:4,0:4:12:0,12,149 0/0:4,0:4:6:0,6,90 0/0:10,0:10:27:0,27,405 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,133 0/0:14,0:14:6:0,6,431 0/0:4,0:4:12:0,12,151 0/0:5,0:5:15:0,15,163 0/0:3,0:3:9:0,9,106 0/0:7,0:7:21:0,21,237 0/0:7,0:7:21:0,21,268 0/0:8,0:8:21:0,21,315 0/0:2,0:2:6:0,6,68 ./.:0,0:0 0/0:3,0:3:9:0,9,103 0/0:7,0:7:21:0,21,230 0/0:3,0:3:6:0,6,90 0/0:9,0:9:26:0,26,277 0/0:7,0:7:21:0,21,236 0/0:5,0:5:15:0,15,170 ./.:1,0:1 0/0:15,0:15:45:0,45,653 0/0:8,0:8:24:0,24,304 0/0:6,0:6:15:0,15,225 0/0:3,0:3:9:0,9,103 0/0:2,0:2:6:0,6,79 0/0:7,0:7:21:0,21,241 0/0:4,0:4:12:0,12,134 0/0:3,0:3:6:0,6,90 0/0:5,0:5:15:0,15,159 0/0:4,0:4:12:0,12,136 0/0:5,0:5:12:0,12,180 0/0:11,0:11:21:0,21,315 0/0:13,0:13:39:0,39,501 0/0:3,0:3:9:0,9,103 0/0:8,0:8:24:0,24,257 0/0:2,0:2:6:0,6,73 0/0:8,0:8:24:0,24,280 0/0:4,0:4:12:0,12,144 0/0:4,0:4:9:0,9,135 0/0:8,0:8:24:0,24,298 0/0:4,0:4:12:0,12,129 0/0:5,0:5:15:0,15,184 0/0:2,0:2:6:0,6,62
0/0:2,0:2:6:0,6,65 0/0:9,0:9:27:0,27,337 0/0:7,0:7:21:0,21,230 0/0:7,0:7:21:0,21,239 0/0:5,0:5:0:0,0,113 0/0:11,0:11:33:0,33,369 0/0:7,0:7:21:0,21,248 0/0:10,0:10:30:0,30,395
Thanks for your help.
Best Answers
-
Geraldine_VdAuwera Cambridge, MA admin
@JCGrenier We did find out from discussing your case with the devs that the 0,0 ADs are actually meaningful. The reads are not getting counted toward the ADs because they are considered uninformative as to which allele they support. What needs to happen here is that the DP should be used for QD instead of the AD; we have an issue ticket to get this done.
@jalves It looks like your case is due to the same problem of a site with AD 0,0. For now you can work around this problem by skipping this site using the -XL argument.
-
Sheila Broad Institute admin
@slament
Hi,Yes, I will answer the rest of your questions in the other thread. For QD to be calculated, there have to be reads supporting the alternate allele in the AD field. Notice in your examples, the AD field has 0 for the alternate allele. That is why QD cannot be calculated.
-Sheila
Answers
Does QD show up in your GVCFs?
I don't have any QD for any line in the gvcf. However, it seems to be able to report it for the majority of the positions when doing the GenotypeGVCFs command.
Here's what I found for the exome sample where I had a non-ref call. So it seems like a deletion but it reports a SNV.
chr2 111878556 . TGCCGCCGCCGCCGCCGCCGCC T,TGCCGCCGCC,TGCCGCCGCCGCC, 345.19 . DP=19;MLEAC=0,1,1,0;MLEAF=0.00,0.500,0.500,0.00;MQ=53.34;MQ0=0 GT:AD:DP:GQ:PL:SB 2/3:0,0,8,5,0:13:99:617,536,740,205,217,178,388,301,0,368,483,496,194,279,470:0,0,5,0
Are we supposed to have QD values in the GVCF files?
Thanks
Actually no, sorry -- it's one of those fields that's left out to save space, and added by GenotypeGVCFs. Still getting used to the new workflow details on my end too.
The call you're showing is a mixed record, which shows evidence for both a SNP and an indel. FYI that is treated by VQSR as an indel. I'm wondering if it's having trouble calculating QD for mixed records. Are all the other cases where you don't have QD mixed records like this?
Not necessarily. Here's the example of the non-ref for the first example in the RNAseq analysis :
chr1 6711349 . G A, 104.03 . DP=5;MLEAC=2,0;MLEAF=1.00,0.00;MQ=60.00;MQ0=0 GT:AD:DP:GQ:PL:SB 1/1:0,5,0:5:15:137,15,0,137,15,137:0,0,2,3
OK -- and what does that site look like after GenotypeGVCFs?
Like this. Everything seems normal but there's no QD tag.
chr1 6711349 . G A 79.10 PASS BaseQRankSum=-1.369e+00;ClippingRankSum=1.00;DP=635;FS=1.871;MLEAC=1;MLEAF=5.495e-03;MQ=60.00;MQ0=0;MQRankSum=-4.560e-01;ReadPosRankSum=-1.187e+00 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,280 ./.:0,0:0 0/0:9,0:9:21:0,21,248 0/0:7,0:7:21:0,21,196 0/0:7,0:7:21:0,21,226 0/0:8,0:8:21:0,21,227 0/0:8,0:8:21:0,21,253 0/0:7,0:7:21:0,21,218 0/0:9,0:9:27:0,27,282 1/1:0,0:5:15:137,15,0 0/0:2,0:2:6:0,6,47 0/0:28,0:28:78:0,78,860 0/0:7,0:7:21:0,21,252 0/0:2,0:2:6:0,6,49 0/0:5,0:5:12:0,12,152 0/0:3,0:3:6:0,6,90 0/0:4,0:4:12:0,12,126 0/0:9,0:9:21:0,21,315 0/0:7,0:7:21:0,21,256 0/0:7,0:7:21:0,21,160 0/0:8,0:8:21:0,21,298 0/0:20,0:20:60:0,60,605 0/0:2,0:2:6:0,6,49 0/0:2,0:2:6:0,6,67 0/0:2,0:2:6:0,6,71 0/0:14,0:14:20:0,20,390 0/0:7,0:7:21:0,21,223 0/0:7,0:7:21:0,21,221 0/0:4,0:4:12:0,12,134 0/0:2,0:2:6:0,6,54 ./.:0,0:0 0/0:4,0:4:9:0,9,118 0/0:8,0:8:21:0,21,243 0/0:6,0:6:15:0,15,143 0/0:8,0:8:21:0,21,244 0/0:7,0:7:21:0,21,192 0/0:2,0:2:6:0,6,54 0/0:13,0:13:27:0,27,359 0/0:8,0:8:21:0,21,245 0/0:7,0:7:21:0,21,218 0/0:12,0:12:36:0,36,354 0/0:8,0:8:21:0,21,315 0/0:7,0:7:21:0,21,215 0/0:2,0:2:6:0,6,49 0/0:10,0:10:24:0,24,301 0/0:7,0:7:21:0,21,208 0/0:7,0:7:21:0,21,199 0/0:2,0:2:6:0,6,47 0/0:3,0:3:9:0,9,87 0/0:2,0:2:6:0,6,73 0/0:7,0:7:21:0,21,210 0/0:8,0:8:22:0,22,268 0/0:7,0:7:21:0,21,184 0/0:7,0:7:21:0,21,213 0/0:5,0:5:9:0,9,135 0/0:7,0:7:21:0,21,200 0/0:4,0:4:12:0,12,118 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,232 0/0:7,0:7:21:0,21,217 0/0:8,0:8:21:0,21,255 0/0:9,0:9:24:0,24,314 0/0:8,0:8:21:0,21,221 0/0:9,0:9:24:0,24,276 0/0:9,0:9:21:0,21,285 0/0:3,0:3:6:0,6,90 0/0:2,0:2:6:0,6,57 0/0:13,0:13:20:0,20,385 0/0:2,0:2:6:0,6,48 0/0:11,0:11:27:0,27,317 0/0:8,0:8:21:0,21,315 0/0:9,0:9:24:0,24,284 0/0:7,0:7:21:0,21,228 0/0:14,0:14:33:0,33,446 0/0:2,0:2:6:0,6,64 0/0:2,0:2:6:0,6,72 0/0:7,0:7:21:0,21,258 0/0:10,0:10:27:0,27,348 0/0:7,0:7:21:0,21,219 0/0:9,0:9:21:0,21,289 0/0:20,0:20:57:0,57,855 0/0:4,0:4:12:0,12,146 0/0:7,0:7:21:0,21,205 0/0:12,0:14:36:0,36,1030 0/0:3,0:3:6:0,6,87 0/0:2,0:2:6:0,6,60 0/0:7,0:7:21:0,21,226 0/0:7,0:7:21:0,21,229 0/0:8,0:8:21:0,21,265 0/0:4,0:4:6:0,6,90 ./.:0,0:0 0/0:7,0:7:21:0,21,229 0/0:2,0:2:6:0,6,59 0/0:2,0:2:6:0,6,56
Hmm. I'll ask the devs if there's anything that could explain that, because it doesn't seem like the expected behavior to me.
Hi @JCGrenier
We think this might be due to a bug that sometimes affects the AD field, which is used to calculate QD. You can see that the hom-var sample has 0,0 AD in the final record, which is wrong since it was 0,5 in the GVCF record you posted earlier.
The bug is fixed in the nightly build (see Download page), so please try rerunning GenotypeGVCFs with that version and let me know what results you get. Sorry for the inconvenience!
Thanks for your help @Geraldine_VdAuwera, I will test it shortly! I'll keep you posted.
@JCGrenier and @Geraldine_VdAuwera I can't help but notice that the only variable positions in your posted VCF lines have 0,0 in their AD fields, meaning you have no ALT depth on which to calculate the QD INFO field (see, https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_QualByDepth.php)
@bredeson You're right, but it's odd that those calls would have ADs of 0,0, hence the assumption that the ADs are wrong. So the missing QDs are just a side effect of the real problem which lies with the ADs.
@Geraldine_VdAuwera and @JCGrenier
According to above discussion when JCGrenier on last September, this argument on VQSR step looks strange to me.
-resource:omni,known=false,training=true,truth=false,prior=12.0 ~/References/hg19/VQSR/1000G_omni2.5.b37.noGL.nochrM.converted.vcf
The webpage on this https://www.broadinstitute.org/gatk/guide/article?id=1259 indicates below,
-resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.sites.vcf
Maybe this is not the issue to affect the QD, but I found it's an inconsistent between the suggested workflow and what JCGrenier did.
Well spotted, @jacobhsu -- the omni sites should be treated as true.
Thanks @jacobhsu and @bredeson for your comments. I will verify all of that. I know that some things have changed between the 3.2-2 that I was using before and the version 3.3-0. From a logic point of view, those missing QD tags are probably even computed without the need of those references right? We are running the newest version of GATK right now on our dataset so I can tell you if :
1) The version is changing something
2) Changing the "truth=true" is affecting something for those calls.
Thanks a lot guys.
Hi @Geraldine_VdAuwera and @JCGrenier ,
Is there any update about this? I am having exactly the same issue. I am trying to use the tool "VariantsToTable" to export the several fields including QD field and it gives me the error below. The VCF was generated using GenotypeGVCFs and filtered using --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0". I am using version 3.3.0.
Many thanks,
Joel
ERROR MESSAGE: Missing field QD in vc variant at [VC variant @ 1:3737687 Q1463.36 of type=SNP alleles=[C*, G] attr={AC=4, AF=0.100, AN=40, BaseQRankSum=-3.470e-01, ClippingRankSum=0.116, DP=465, FS=7.698, GQ_MEAN=23.85, GQ_STDDEV=48.94, InbreedingCoeff=0.5368, MLEAC=8, MLEAF=0.200, MQ=60.00, MQ0=0, MQRankSum=-1.160e-01, NCC=6, ReadPosRankSum=-1.736e+00, SOR=0.322} GT=GT:AD:DP:GQ:PGT:PID:PL ./.:5,0:5 ./.:0,0:0 0/0:0,0:33:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:11:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:6:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:14,1:15:39:0|1:3737678_ACTGCCC_A:0,39,583 ./.:1,0:1 1/1:0,0:68:99:.:.:1157,196,0 0/0:0,0:18:1:.:.:0,1,1 ./.:1,0:1 ./.:11,0:11 0/0:18,0:18:9:.:.:0,9,135 0/0:34,0:34:99:0|1:3737678_ACTGCCC_A:0,102,1452 1/1:0,0:27:24:.:.:356,24,0 0/0:3,0:3:0:.:.:0,0,28 0/0:1,0:19:3:0|1:3737681_GCCCCCC_G:0,3,28 0/0:0,0:21:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:18:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:12:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:16:0:0|1:3737681_GCCCCCC_G:0,0,0 0/0:0,0:12:1:.:.:0,1,1 0/0:3,0:3:9:.:.:0,9,113 0/0:25,0:25:75:0|1:3737678_ACTGCCC_A:0,75,1101 ./.:0,0:0 0/0:13,0:13:9:.:.:0,9,135 0/0:14,0:14:9:.:.:0,9,135
ERROR ------------------------------------------------------------------------------------------
@jalves Can you check whether QD is annotated in the rest of the records, or is missing everywhere? Also, which version are you using?
@jalves I was not able to solve it in the end. Will have to give it a better look. @Geraldine_VdAuwera and @jacobhsu, concerning the "-resource:omni,known=false,training=true,truth=true,prior=12.0", it won't change anything because that step is only useful for the recalibration part. The problem is happening before that step.
@JCGrenier We did find out from discussing your case with the devs that the 0,0 ADs are actually meaningful. The reads are not getting counted toward the ADs because they are considered uninformative as to which allele they support. What needs to happen here is that the DP should be used for QD instead of the AD; we have an issue ticket to get this done.
@jalves It looks like your case is due to the same problem of a site with AD 0,0. For now you can work around this problem by skipping this site using the -XL argument.
Thanks @Geraldine_VdAuwera for that quick answer!
@jalves I just remembered also that there is an option in VariantsToTable to allow missing annotation values (which then get replaced by "NA"). Look up
--allowMissingData
in the tool doc.Thanks a lot for the help and comments @Geraldine_VdAuwera and @JCGrenier, it was very useful.
It is probably not as relevant now, but I am using version 3.3.0 and indeed the lack of the QD field only happens for some of the variants.
Hi! I think I am having the same problem and I was wondering if there is a solution already. I have something like 14 sites without QD field and I got a warning when trying to filter using
VariantFiltration
looking like this:I am using GATK v. 3.5.0 and my samples are haploid. A few examples of the problematic sites:
Like you mentioned above, I see that for each site one of the samples has an AD of 0,0. The corresponding sites in the gvcf file are:
What should I do to fix this? Thanks in advance!
@slament
Hi,
There is no need to worry about the WARN statements. Those are just there to let you know some of the sites are missing the annotation. You can ignore them.
As for the AD field containing 0,0 have a look at this document for more information.
-Sheila
Thank you for your answer Sheila! I read the article you provided but it is not clear to me how both the AD and DP would be 0 (like in all cases above) and why would such sites lack a QD field. Is this related to another very similar question I posted here? In that case, tho, there is a QD field reported.
Thank you again!
@slament
Hi,
Yes, I will answer the rest of your questions in the other thread. For QD to be calculated, there have to be reads supporting the alternate allele in the AD field. Notice in your examples, the AD field has 0 for the alternate allele. That is why QD cannot be calculated.
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_QualByDepth.php
-Sheila