We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Question about the alignment performed in the Haplotypecaller (pairHMM)

rwkrwk Member
edited December 2018 in Ask the GATK team

After looking at the implementation of the pair-HMM algorithm used in the haplotype caller I was left with a question.
From the source code comments, it seems that local alignment is supposed to be used.
link to the comment on github : (note : the links are missing the 'h' of https because I am not allowed to post links for now).
* https://github.com/broadinstitute/gatk/blob/67f0f0f2e59185b721398b17c24eba487a2ac76c/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/PairHMM.java#L23
* https://github.com/broadinstitute/gatk/blob/67f0f0f2e59185b721398b17c24eba487a2ac76c/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/Log10PairHMM.java#L11

This cites Figure 4.3 in Durbin 1998 book. This is the figure of the Finite State Automaton (FSA) for local alignment but when reading the implementation (of the forward algorithm) it seems that the FSA of figure 4.2 is used instead. (The model with M, X, and Y).
* https://github.com/broadinstitute/gatk/blob/67f0f0f2e59185b721398b17c24eba487a2ac76c/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/PairHMMModel.java

The forward algorithm used seems to be the one presented in section 4.2 of the book which implements Figure 4.2 (and not 4.3). This algorithm is for global alignment and not local alignment.

My question is : which alignment is actually computed ? (score of the best alignment) local or global ?

I cannot see the random model RX/RY states (and associated transition probabilities) present in the local alignment FSA of figure 4.3 in the implementation. I may simply have missed them, if so could you please point them out to me ?

I'd like to be sure, because the difference between the two FSAs is as important as the difference between running the Needleman–Wunsch and Smith-Waterman algorithm (global vs local).

Thank you for your insights.
Regards. Rick
Post edited by shlee on

Best Answer


  • SChaluvadiSChaluvadi Member, Broadie, Moderator admin

    I am not positive about the answer so I am going to confer with the team and get back to you!

  • rwkrwk Member
    Hello, thank you for your reply.
    The algorithm is clear to me, from what I read of the code it is effectively the right of figure 4.1 (or 4.2 without the start and end states for simplicity) that is used and not 4.3

    Therefore my concern, the comments in the source code clearly state that 4.3 is used and that it is local alignment, while the code in fact does global alignment.

    It makes more sense to do global alignment (sequence to sequence, like Needleman-Wunsch) and this is what the codes seems to do (does).

    Thank you for your answer.

    Issue · Github
    by shlee

    Issue Number
    Last Updated
    Closed By
  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @rwk, I have filed a request for clarification at https://github.com/broadinstitute/gatk/issues/5529, but I see you have also issued a pull request on the same at https://github.com/broadinstitute/gatk/pull/5528. I have asked the developers to follow up. Thank you for your contributions.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @rwk @shlee It seems to me like a very understandable confusion: it's local in the sense that reads are aligned to artificial haplotypes only within a small window defined by the assembly region, but within that small window it is, in fact, global alignment as you have both concluded.

    Also, we have docs: https://github.com/broadinstitute/gatk/blob/master/docs/pair_hmm.pdf.

Sign In or Register to comment.