We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

# Question about the alignment performed in the Haplotypecaller (pairHMM)

Member
edited December 2018
Hello.

After looking at the implementation of the pair-HMM algorithm used in the haplotype caller I was left with a question.
From the source code comments, it seems that local alignment is supposed to be used.
link to the comment on github : (note : the links are missing the 'h' of https because I am not allowed to post links for now).

This cites Figure 4.3 in Durbin 1998 book. This is the figure of the Finite State Automaton (FSA) for local alignment but when reading the implementation (of the forward algorithm) it seems that the FSA of figure 4.2 is used instead. (The model with M, X, and Y).

The forward algorithm used seems to be the one presented in section 4.2 of the book which implements Figure 4.2 (and not 4.3). This algorithm is for global alignment and not local alignment.

My question is : which alignment is actually computed ? (score of the best alignment) local or global ?

I cannot see the random model RX/RY states (and associated transition probabilities) present in the local alignment FSA of figure 4.3 in the implementation. I may simply have missed them, if so could you please point them out to me ?

I'd like to be sure, because the difference between the two FSAs is as important as the difference between running the Needleman–Wunsch and Smith-Waterman algorithm (global vs local).

Regards. Rick
Post edited by shlee on
Tagged:

@rwk
I am not positive about the answer so I am going to confer with the team and get back to you!

• Member
The algorithm is clear to me, from what I read of the code it is effectively the right of figure 4.1 (or 4.2 without the start and end states for simplicity) that is used and not 4.3

Therefore my concern, the comments in the source code clearly state that 4.3 is used and that it is local alignment, while the code in fact does global alignment.

It makes more sense to do global alignment (sequence to sequence, like Needleman-Wunsch) and this is what the codes seems to do (does).

#### Issue · Github December 2018 by shlee

Issue Number
5529
State
closed
Last Updated
Assignee
Array
Closed By
droazen

Hi @rwk, I have filed a request for clarification at https://github.com/broadinstitute/gatk/issues/5529, but I see you have also issued a pull request on the same at https://github.com/broadinstitute/gatk/pull/5528. I have asked the developers to follow up. Thank you for your contributions.

@rwk @shlee It seems to me like a very understandable confusion: it's local in the sense that reads are aligned to artificial haplotypes only within a small window defined by the assembly region, but within that small window it is, in fact, global alignment as you have both concluded.