Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Unified Genotyper quality call

Hi,
I am running the unified genotyper in "EMIT_ALL_CONFIDENT_SITES" with the latest version of GATK and am getting some odd quality assignments of 10000030. It is clear that these sites are not very stellar, so I am wondering what is happening to these sites. Any thoughts?
Below is a snippet from one such region, starting with some seemingly "normal" rows and ending with this seemingly odd assessment of quality. Any thoughts? As some background info, this sample is quite different than reference, so not abnormal to have lots of missing regions when aligned against the reference that I used. It is bacterial. Here is the command line that I ran:
GenomeAnalysisTK.jar -T UnifiedGenotyper -dt NONE -I "$i" -R ../../Ref.fasta -baq RECALCULATE -nt 4 -ploidy 1 -out_mode EMIT_ALL_CONFIDENT_SITES -o "$j".vcf -stand_call_conf 100 -stand_emit_conf 100 -mbq 20 Ref 930 . C . 102 . AN=1;DP=2;MQ=150.00;MQ0=0 GT:DP:MLPSAC:MLPSAF 0:2 Ref 931 . C . 107 . AN=1;DP=2;MQ=150.00;MQ0=0 GT:DP:MLPSAC:MLPSAF 0:2 Ref 932 . T . 105 . AN=1;DP=2;MQ=150.00;MQ0=0 GT:DP:MLPSAC:MLPSAF 0:2 Ref 942 . A . 10000030 . DP=1;MQ=150.00;MQ0=0 GT . Ref 1109 . G . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1110 . A . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1111 . T . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1112 . T . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1113 . T . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1114 . A . 10000030 . DP=2;MQ=150.00;MQ0=0 GT . Ref 1115 . C . 10000030 . DP=2;MQ=150.00;MQ0=0 GT .
Post edited by Geraldine_VdAuwera on
Tagged:
Answers
Hi there,
That does look weird. Could you please run again with the latest nightly build? We've made some recent improvements to how the qualities are calculated, so it would be informative to see what becomes of these calls with the new code.
Sure. I'll let you know either way by end of week. I can save associated files on the off-chance it gets to that stage.
John
Great, thanks John. You may want to run on just a subset of the data using the
-L
argument, for testing purposes before doing a full run.Overestimated the time on the off chance I got sidetracked. In the meantime, I started and finished! Unfortunately, it looks the same. Here is the same command and then the region in question. Sorry - I realize the formatting probably gets messed up when pasting, but hopefully you have looked at them enough such that it's not too difficult on your eyes.
Hah, yes, we get pretty good at parsing this stuff by eye even with messed up formatting
But for the comfort of all I can easily fix it.
Awesome...thanks!!
Let's get a BAM snippet and put it in Pivotal to be fixed. Looks like a numerical problem with the reference calculation.
Yep -- John, could you please upload a snippet of the file that reproduces the error? We'll also need your reference files since they are non human. Detailed instructions are in the FAQs if you need them. http://www.broadinstitute.org/gatk/guide/article?id=1894
Thanks!
Is there a convenient way to create a snippet file without losing any pertinent information to those regions? And I assume I need to reindex after creating the snippet bam? Let me know and I can get you those info ASAP
nevermind. i overlooked it in your link. Sorry. will send the info soon.
Perfect, thanks.
I uploaded the files. Hopefully they are in order and correctly uploaded. They are in folder called, "ForGeraldineFromJohnG". Let me know if there are any problems!
Thanks John, I'll get back to you with any updates/follow-up.
k. thanks for your help.
Oh, can you give me the exact interval corresponding to the snippet?
900-1900. This region contains seemingly normal quality assessments at the beginning and end of this region. 1109 - 1549 was the odd region.
Out of curiosity, I was writing to see if you think you're any closer to nailing the cause of this issue? Or does it seem larger than expected?
Hi John,
We've identified the cause of the problem (a degenerate case of the higher ploidy calculation) and I believe that there is a proposed fix. We're just waiting for the developer in charge of this to find the time to get to this (he has a lot on his plate right now). But I do think it'll be done by the time we release 2.5, which is coming soon.
Hi Geraldine,
Thanks for the update. Sounds great, but won't raise hell if it happens to not make it into 2.5.
Hi
Is this issue fixed? I am getting similar quality values for my data. I am using version 3.4-46-gbc02625 of gatk.
thanks
@maziz2
Hi,
We don't recommend UnifiedGenotyper anymore. We recommend using HaplotypeCaller as it makes better indel calls.
-Sheila