We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
High Depth and GQ of 0

GATK Gurus,
I am attempting to get an "expected quality (GQ)" given a certain depth of coverage in a VCF file in order to quantify a rough minimum depth that will result in a usable variant call, and I noticed some odd behavior. In one of my variants, I have a depth of 601 (all referent), but the GQ is reported as 0. For a SNP, the PL is typically reported as "0,0,1000+". Below is a sample of the issue at 16:55862791 (note that the filter applied here is GQ>50):
Format: GT:AB:AD:DP:FT:GQ:PL
Entry: 0/0:.:601,0:601:LowQualCall:0:0,0,13743
GVCF entry: ... T . . END=55862791 GT:DP:GQ:MIN_DP:PL 0/0:601:0:601:0,0,13743
For the record, I am using the HaplotypeCaller version 3.2-0 (calling GVCFs). If you need anything else from me to help diagnose the issue, please let me know.
-John Wallace
Best Answer
-
Sheila Broad Institute admin
Hi John,
One of our developers has taken the time and figured out what is going on here.
There are 3 issues that contribute to this variant not being called:
1) There are a lot of SNPs in a small region (which could be a mapping issue)
2) The reference is non-repetitive so Haplotype Caller uses a small kmer size
3) The allele balance is skewed towards the ref so the haplotype with all alts appears unlikely.
To get the variant call:
1) You can try increasing the k-mer size to a value greater than 25. (35 worked for us)
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--kmerSize2) You can try increasing the number of haplotypes taken into consideration to a value greater than 128. (400 or 500 worked for me) https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--maxNumHaplotypesInPopulation
-Sheila
Answers
@johnwallace123
Hi John Wallace,
Please try the latest nightly build here: https://www.broadinstitute.org/gatk/nightly
This may be a bug that was fixed recently.
-Sheila
@johnwallace123
Hi again,
Just to clarify, this was not exactly a bug. The version you are using presents the correct information, just not the information you want to see that explains the call.
I suggested you use the latest nightly build, because this may be related to read reassembly. The PLs clearly show that the caller has no confidence between hom-ref and het, and the GQ matches that. The question is why the DP looks like that. The version you are using gave the DP before reassembly, so it's possible that the site looks different after reassembly.
Please read about reassembly here: https://www.broadinstitute.org/gatk/guide/article?id=4146
-Sheila
@Shiela
I'll definitely try the nightly build to see if it fixes anything.
I was just a little surprised that given what appear to be 601 REF reads, that it was a coin flip between hom-ref and het, while the probability of hom-alt was 10^-1374. If it were truly difficult to determine, I would expect a PL that had a low, but nonzero entry, with the hom-alt also scaling likewise. Something along the lines of "0,2,50".
Thanks!
@Sheila ,
The nightly build still shows this behavior. From the GVCF, it looks like there are similar non-variants nearby which have a "PL" of 0,120,1800. Do you know why I might be seeing this behavior sometimes?
Below is a snippet of the location in the GVCF, +/- 1 line
16 55862763 . G . . END=55862790 GT:DP:GQ:MIN_DP:PL 0/0:597:99:594:0,120,1800
16 55862791 . T . . END=55862791 GT:DP:GQ:MIN_DP:PL 0/0:602:0:602:0,0,13648
16 55862792 . G . . END=55862823 GT:DP:GQ:MIN_DP:PL 0/0:603:99:596:0,120,1800
@johnwallace123
Hi,
Have you checked the -bamout file in IGV? I am wondering if the reads and bases have low quality scores around that position.
If that is not the case, I will need you to submit some files for me to test.
-Sheila
@Sheila
I took a look at the BAM from the HC in IGV for both the problematic sample and a sample that has a variant called at that position, and I didn't see anything glaring. Most of the base qualities were in the 20-35 range, and the read counts at that position were as follows (ref is "T", rs114632091 says should be a T/C SNP):
Problematic: T: 540 / C: 120
Called (1/0): T: 379 / C: 111 / G: 2 / A: 1
I also attempted to look at a negative control, but the output BAM didn't seem to have any coverage in that area.
Thanks for all the help trying to track this down. I apologize for my lack of depth in this area; this is the first time I'm looking at raw BAMs. If you have the time, I'd be happy to submit any files that you would need.
Thanks again!
@johnwallace123
Hello,
Can you upload your files for us to look at? If so, upload instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report
Thanks,
Sheila
@Sheila
Sorry it took me so long to get back to you. I've uploaded "HighDP_LowGQ.tar.gz" that should include everything needed to take a look at. If you need any additional information or files, please let me know and I'd be happy to provide them.
Thanks so much for your help!
-John
@johnwallace123
Hi John,
I see the issue is that although it seems like there are 600 reference reads and 0 non-reference reads, there are actually 102 non-reference reads and 523 reference reads. I am not sure why the variant is not getting called, however.
I just submitted a bug report and will let you know when it is fixed.
-Sheila
@johnwallace123
Hi John,
One of our developers has taken the time and figured out what is going on here.
There are 3 issues that contribute to this variant not being called:
1) There are a lot of SNPs in a small region (which could be a mapping issue)
2) The reference is non-repetitive so Haplotype Caller uses a small kmer size
3) The allele balance is skewed towards the ref so the haplotype with all alts appears unlikely.
To get the variant call:
1) You can try increasing the k-mer size to a value greater than 25. (35 worked for us)
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--kmerSize
2) You can try increasing the number of haplotypes taken into consideration to a value greater than 128. (400 or 500 worked for me) https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--maxNumHaplotypesInPopulation
-Sheila