We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Question about best haplotype finder in haplotypecaller

mehrzadmehrzad Member
edited April 2019 in Ask the GATK team
Hello everyone,

I was looking at the recent commits and I realized that the functionality of the finding method for best haplotypecaller is changed. I was wondering if it is intentional?

if you look at
hellbender/tools/walkers/haplotypecaller/graphs/KBestHaplotypeFinder.java:89

there is a line that limits the number of haplotypes that ends in a specific vertex

if (vertexCounts.get(targetVertex).getAndIncrement() < maxNumberOfHaplotypes)

I believe this means we just check the first maxNumberOfHaplotypes haplotypes for this vertex, not necessarily maxNumberOfHaplotypes highest score haplotypes. If you remove this if statement, this algorithm can find better score haplotypes. It is certainly faster but my question is if this is the functionality that you are expecting. I have sample BAMs that I can share.

Thank you for reading this.
Mehrzad

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @mehrzad

    GATK is an open source software. We welcome recommendations from the community. If you have certain suggestions that would make the tool better, please feel free to send a pull request and add comments and validations done by you.

  • mehrzadmehrzad Member

    Hi @bhanuGandham

    Thank you for your reply. So this is not the right place for this discussion. I need to go to github for this.

    Best,
    Mehrzad

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @mehrzad Dijkstra's algorithm is both greedy and optimal, so any path found through that vertex after the first maxNumberOfHaplotypes would necessarily have a worse score. Please refer to the pseudocode here: https://en.wikipedia.org/wiki/K_shortest_path_routing, in particular the line "if count_u ≤ K then".

  • mehrzadsmehrzads Ann ArborMember

    Hi David,

    Thank you David for your comment. Correct me if I'm wrong here. I believe in the algorithm when we talk about count_u, here u is the last vertex of the path we picked from heap. In the GATK code, it will be vertextoextend. But in the code we check the count for targetVertex . This way we will lose some good haplotypes.

    Best,
    Mehrzad

  • mehrzadsmehrzads Ann ArborMember
    edited May 2019

    for your reference,

    GATK code

    for (final BaseEdge edge : outgoingEdges) {
          final SeqVertex targetVertex = graph.getEdgeTarget(edge);
                if (vertexCounts.get(targetVertex).getAndIncrement() < maxNumberOfHaplotypes) {
                      queue.add(new KBestHaplotype(pathToExtend, edge, totalOutgoingMultiplicity));
                 }
     }
    

    Wiki code

    if countu ≤ K then
       for each vertex v adjacent to u:
            let Pv be a new path with cost C + w(u, v) formed by concatenating edge (u, v) to path Pu
            insert Pv into B
    

    You can see the different order of "for loop" and "if statement" in these two codes.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @mehrzads I see your point! Let me get back to you after thinking carefully about it.

Sign In or Register to comment.