The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

# Unified Genotyper quality call

Member Posts: 17
edited April 2013

Hi,

I am running the unified genotyper in "EMIT_ALL_CONFIDENT_SITES" with the latest version of GATK and am getting some odd quality assignments of 10000030. It is clear that these sites are not very stellar, so I am wondering what is happening to these sites. Any thoughts?

Below is a snippet from one such region, starting with some seemingly "normal" rows and ending with this seemingly odd assessment of quality. Any thoughts? As some background info, this sample is quite different than reference, so not abnormal to have lots of missing regions when aligned against the reference that I used. It is bacterial. Here is the command line that I ran:

GenomeAnalysisTK.jar -T UnifiedGenotyper -dt NONE -I "$i" -R ../../Ref.fasta -baq RECALCULATE -nt 4 -ploidy 1 -out_mode EMIT_ALL_CONFIDENT_SITES -o "$j".vcf -stand_call_conf 100 -stand_emit_conf 100 -mbq 20

Ref  930 .   C   .   102 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  931 .   C   .   107 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  932 .   T   .   105 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  942 .   A   .   10000030    .   DP=1;MQ=150.00;MQ0=0    GT  .
Ref  1109    .   G   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1110    .   A   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1111    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1112    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1113    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1114    .   A   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1115    .   C   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .

Post edited by Geraldine_VdAuwera on
Tagged:

Hi there,

That does look weird. Could you please run again with the latest nightly build? We've made some recent improvements to how the qualities are calculated, so it would be informative to see what becomes of these calls with the new code.

Geraldine Van der Auwera, PhD

• Member Posts: 17

Sure. I'll let you know either way by end of week. I can save associated files on the off-chance it gets to that stage.

John

Great, thanks John. You may want to run on just a subset of the data using the -L argument, for testing purposes before doing a full run.

Geraldine Van der Auwera, PhD

• Member Posts: 17
edited April 2013

Overestimated the time on the off chance I got sidetracked. In the meantime, I started and finished! Unfortunately, it looks the same. Here is the same command and then the region in question. Sorry - I realize the formatting probably gets messed up when pasting, but hopefully you have looked at them enough such that it's not too difficult on your eyes.

java -jar -Xmx5G ~/bin/tools/GenomeAnalysisTK-nightly-2013-04-10-gb25bc5b/GenomeAnalysisTK.jar -T UnifiedGenotyper -dt NONE -I InputFile.bam -R ../../Ref.fasta -baq RECALCULATE -nt 4 -ploidy 1 -out_mode EMIT_ALL_CONFIDENT_SITES -o testNightlyBuild -stand_call_conf 100 -stand_emit_conf 100 -mbq 20

Ref  930 .   C   .   102 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  931 .   C   .   107 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  932 .   T   .   105 .   AN=1;DP=2;MQ=150.00;MQ0=0   GT:DP:MLPSAC:MLPSAF 0:2
Ref  942 .   A   .   10000030    .   DP=1;MQ=150.00;MQ0=0    GT  .
Ref  1109    .   G   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1110    .   A   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1111    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1112    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1113    .   T   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
Ref  1114    .   A   .   10000030    .   DP=2;MQ=150.00;MQ0=0    GT  .
`

Hah, yes, we get pretty good at parsing this stuff by eye even with messed up formatting But for the comfort of all I can easily fix it.

Geraldine Van der Auwera, PhD

• Member Posts: 17

Awesome...thanks!!

Let's get a BAM snippet and put it in Pivotal to be fixed. Looks like a numerical problem with the reference calculation.

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

Yep -- John, could you please upload a snippet of the file that reproduces the error? We'll also need your reference files since they are non human. Detailed instructions are in the FAQs if you need them. http://www.broadinstitute.org/gatk/guide/article?id=1894

Thanks!

Geraldine Van der Auwera, PhD

• Member Posts: 17

Is there a convenient way to create a snippet file without losing any pertinent information to those regions? And I assume I need to reindex after creating the snippet bam? Let me know and I can get you those info ASAP

• Member Posts: 17

nevermind. i overlooked it in your link. Sorry. will send the info soon.

Perfect, thanks.

Geraldine Van der Auwera, PhD

• Member Posts: 17

I uploaded the files. Hopefully they are in order and correctly uploaded. They are in folder called, "ForGeraldineFromJohnG". Let me know if there are any problems!

Thanks John, I'll get back to you with any updates/follow-up.

Geraldine Van der Auwera, PhD

• Member Posts: 17

Oh, can you give me the exact interval corresponding to the snippet?

Geraldine Van der Auwera, PhD

• Member Posts: 17

900-1900. This region contains seemingly normal quality assessments at the beginning and end of this region. 1109 - 1549 was the odd region.

• Member Posts: 17

Out of curiosity, I was writing to see if you think you're any closer to nailing the cause of this issue? Or does it seem larger than expected?

Hi John,

We've identified the cause of the problem (a degenerate case of the higher ploidy calculation) and I believe that there is a proposed fix. We're just waiting for the developer in charge of this to find the time to get to this (he has a lot on his plate right now). But I do think it'll be done by the time we release 2.5, which is coming soon.

Geraldine Van der Auwera, PhD

• Member Posts: 17

Hi Geraldine,

Thanks for the update. Sounds great, but won't raise hell if it happens to not make it into 2.5.

• tgenMember Posts: 1

Hi

Is this issue fixed? I am getting similar quality values for my data. I am using version 3.4-46-gbc02625 of gatk.

thanks