# Haplotype Scoring Algorithm

Panamá · Posts: 22
edited November 2012

Hi there,

I'm trying to understand the haplotype scoring algorithm in GATK 1.6.5. Fortunately I have a printed page with a simple diagram that explains the algorithm, but I can't find it anymore on the new website.
The formula for calculating the haplotype score in this diagram has a variable I can't identify. This is the formula as written:

`P(read | haplotype_j) = sum_bi (bi == hi ? ei : 1 - ei / 3) - sum_bi (ei)`

I guess bi stands for the base at position i in the current read and hi stands for the base at position i in haplotype_j; that makes sense to me. But what is ei? Maybe I'm missing something... It looks like it should be a probability in the range (0, 1) for the haplotype score to make sense.
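To make the question concrete, here is a minimal Python sketch that evaluates the formula exactly as quoted above. It assumes that ei is a per-base error probability derived from a Phred base quality via the standard relation e = 10^(-Q/10); that reading is an assumption, since the post itself leaves ei undefined. The function names and the input layout (a read, its base qualities, and a haplotype of equal length) are hypothetical and are not GATK's API.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def read_vs_haplotype_score(read_bases, base_quals, haplotype_bases):
    """Evaluate the quoted formula term by term:

        sum_i (b_i == h_i ? e_i : 1 - e_i / 3) - sum_i e_i

    ASSUMPTION: e_i comes from the base quality at position i.
    """
    if not (len(read_bases) == len(base_quals) == len(haplotype_bases)):
        raise ValueError("read, qualities and haplotype must have equal length")

    weighted = 0.0   # first sum in the formula
    expected = 0.0   # second sum in the formula (sum of e_i)
    for b, q, h in zip(read_bases, base_quals, haplotype_bases):
        e = phred_to_error_prob(q)
        weighted += e if b == h else 1.0 - e / 3.0
        expected += e
    return weighted - expected
```

Under this reading, a matching base contributes nothing to the result and a mismatching base contributes 1 - 4·ei/3, so the value behaves like a quality-weighted mismatch count for the read against haplotype_j.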

Pablo.

Geraldine Van der Auwera, PhD

• Panamá · Posts: 22

Hi,

They are not really talking about the haplotype scoring algorithm in that article. Anyway, it led me to the fragment-based SNP calling slides, which refer to an "e" that is the sequencing error rate. Could that be it?
We had ei, so it would be specific to that position rather than a static sequencing error rate. I guess it is the error rate for position i in haplotype j, which might be the number of mismatches to consensus haplotype j at position i divided by the total counts for position i in haplotype j.
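The guess in the paragraph above (mismatches to the consensus haplotype at position i divided by the total counts at position i) can be sketched as follows. This only illustrates that guess; it is not claimed to be how GATK derives ei. The function name and the input layout (equal-length read strings already aligned to the haplotype, with no gaps) are hypothetical.

```python
def per_position_mismatch_rate(reads, haplotype):
    """For each position i, return mismatches / total counts.

    ASSUMPTION: every read covers the whole haplotype and is gap-free,
    so the total count at each position is simply len(reads).
    """
    rates = []
    for i, h in enumerate(haplotype):
        column = [r[i] for r in reads]            # bases observed at position i
        mismatches = sum(1 for b in column if b != h)
        rates.append(mismatches / len(column) if column else 0.0)
    return rates


reads = ["ACGT", "ACGA", "ACGT", "TCGT"]
rates = per_position_mismatch_rate(reads, "ACGT")
```

Here one of four reads disagrees with the haplotype at the first position and one at the last, so those two positions get a rate of 0.25 and the middle two get 0.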

But I'm still unsure how you calculate the sequencing error rate. Going through the documentation for ErrorRatePerCycle, the error rates calculated there do not seem to be simply mismatches/counts.

Thanks!
Pablo.

• Panamá · Posts: 22

Aaaaaaahhhh, OK. I knew it had to be something obvious... that makes sense.
I just wanted to fully understand the haplotype scoring and I didn't know what this e was; I was thinking about calculating some error probability