Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Question about best haplotype finder in haplotypecaller

mehrzadmehrzad Member
edited April 9 in Ask the GATK team
Hello everyone,

I was looking at the recent commits and I realized that the functionality of the finding method for best haplotypecaller is changed. I was wondering if it is intentional?

if you look at
hellbender/tools/walkers/haplotypecaller/graphs/KBestHaplotypeFinder.java:89

there is a line that limits the number of haplotypes that ends in a specific vertex

if (vertexCounts.get(targetVertex).getAndIncrement() < maxNumberOfHaplotypes)

I believe this means we just check the first maxNumberOfHaplotypes haplotypes for this vertex, not necessarily maxNumberOfHaplotypes highest score haplotypes. If you remove this if statement, this algorithm can find better score haplotypes. It is certainly faster but my question is if this is the functionality that you are expecting. I have sample BAMs that I can share.

Thank you for reading this.
Mehrzad

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @mehrzad

    GATK is an open source software. We welcome recommendations from the community. If you have certain suggestions that would make the tool better, please feel free to send a pull request and add comments and validations done by you.

  • mehrzadmehrzad Member

    Hi @bhanuGandham

    Thank you for your reply. So this is not the right place for this discussion. I need to go to github for this.

    Best,
    Mehrzad

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @mehrzad Dijkstra's algorithm is both greedy and optimal, so any path found through that vertex after the first maxNumberOfHaplotypes would necessarily have a worse score. Please refer to the pseudocode here: https://en.wikipedia.org/wiki/K_shortest_path_routing, in particular the line "if count_u ≤ K then".

  • mehrzadsmehrzads Ann ArborMember

    Hi David,

    Thank you David for your comment. Correct me if I'm wrong here. I believe in the algorithm when we talk about count_u, here u is the last vertex of the path we picked from heap. In the GATK code, it will be vertextoextend. But in the code we check the count for targetVertex . This way we will lose some good haplotypes.

    Best,
    Mehrzad

  • mehrzadsmehrzads Ann ArborMember
    edited May 3

    for your reference,

    GATK code

    for (final BaseEdge edge : outgoingEdges) {
          final SeqVertex targetVertex = graph.getEdgeTarget(edge);
                if (vertexCounts.get(targetVertex).getAndIncrement() < maxNumberOfHaplotypes) {
                      queue.add(new KBestHaplotype(pathToExtend, edge, totalOutgoingMultiplicity));
                 }
     }
    

    Wiki code

    if countu ≤ K then
       for each vertex v adjacent to u:
            let Pv be a new path with cost C + w(u, v) formed by concatenating edge (u, v) to path Pu
            insert Pv into B
    

    You can see the different order of "for loop" and "if statement" in these two codes.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @mehrzads I see your point! Let me get back to you after thinking carefully about it.

Sign In or Register to comment.