If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on October 14, 2019, due to the U.S. holiday. We will return to monitoring the forum on October 15.

Question about the alignment performed in the Haplotypecaller (pairHMM)

rwkrwk Member
edited December 2018 in Ask the GATK team

After looking at the implementation of the pair-HMM algorithm used in the haplotype caller I was left with a question.
From the source code comments, it seems that local alignment is supposed to be used.
link to the comment on github : (note : the links are missing the 'h' of https because I am not allowed to post links for now).

This cites Figure 4.3 in Durbin 1998 book. This is the figure of the Finite State Automaton (FSA) for local alignment but when reading the implementation (of the forward algorithm) it seems that the FSA of figure 4.2 is used instead. (The model with M, X, and Y).

The forward algorithm used seems to be the one presented in section 4.2 of the book which implements Figure 4.2 (and not 4.3). This algorithm is for global alignment and not local alignment.

My question is : which alignment is actually computed ? (score of the best alignment) local or global ?

I cannot see the random model RX/RY states (and associated transition probabilities) present in the local alignment FSA of figure 4.3 in the implementation. I may simply have missed them, if so could you please point them out to me ?

I'd like to be sure, because the difference between the two FSAs is as important as the difference between running the Needleman–Wunsch and Smith-Waterman algorithm (global vs local).

Thank you for your insights.
Regards. Rick
Post edited by shlee on

Best Answer


  • SChaluvadiSChaluvadi admin Member, Broadie, Moderator admin

    I am not positive about the answer so I am going to confer with the team and get back to you!

  • rwkrwk Member
    Hello, thank you for your reply.
    The algorithm is clear to me, from what I read of the code it is effectively the right of figure 4.1 (or 4.2 without the start and end states for simplicity) that is used and not 4.3

    Therefore my concern, the comments in the source code clearly state that 4.3 is used and that it is local alignment, while the code in fact does global alignment.

    It makes more sense to do global alignment (sequence to sequence, like Needleman-Wunsch) and this is what the codes seems to do (does).

    Thank you for your answer.

    Issue · Github
    by shlee

    Issue Number
    Last Updated
    Closed By
  • shleeshlee ✭✭✭✭✭ CambridgeMember, Broadie ✭✭✭✭✭

    Hi @rwk, I have filed a request for clarification at, but I see you have also issued a pull request on the same at I have asked the developers to follow up. Thank you for your contributions.

  • davidbendavidben ✭✭✭ BostonMember, Broadie, Dev ✭✭✭

    @rwk @shlee It seems to me like a very understandable confusion: it's local in the sense that reads are aligned to artificial haplotypes only within a small window defined by the assembly region, but within that small window it is, in fact, global alignment as you have both concluded.

    Also, we have docs:

Sign In or Register to comment.