# Question about best haplotype finder in haplotypecaller

mehrzad
Member ✭

Hello everyone,

I was looking at the recent commits and noticed that the behavior of the best-haplotype finding method in HaplotypeCaller has changed. I was wondering whether this is intentional.

If you look at

hellbender/tools/walkers/haplotypecaller/graphs/KBestHaplotypeFinder.java:89

there is a line that limits the number of haplotypes that end at a specific vertex:

if (vertexCounts.get(targetVertex).getAndIncrement() < maxNumberOfHaplotypes)

I believe this means we only keep the first maxNumberOfHaplotypes haplotypes that reach this vertex, not necessarily the maxNumberOfHaplotypes highest-scoring ones. If you remove this if statement, the algorithm can find haplotypes with better scores. The current version is certainly faster, but my question is whether this is the behavior you expect. I have sample BAMs that I can share.
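To make the concern concrete, here is a minimal sketch on a toy graph. It is not the GATK implementation: the class `PushTimeCountDemo`, the graph, and the edge costs are all hypothetical, and it uses costs where lower is better rather than haplotype scores. The only thing it borrows from the quoted line is the idea of incrementing a per-vertex counter when a path ending at that vertex is enqueued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Toy model of counting at enqueue time; an illustration, not the GATK code. */
final class PushTimeCountDemo {

    /** Directed edge with a cost (lower is better). */
    static final class Edge {
        final String from, to;
        final int cost;
        Edge(String from, String to, int cost) { this.from = from; this.to = to; this.cost = cost; }
    }

    /** Partial path: its last vertex, accumulated cost, and a printable trail. */
    static final class Path {
        final String last, trail;
        final int cost;
        Path(String last, int cost, String trail) { this.last = last; this.cost = cost; this.trail = trail; }
    }

    // Hypothetical graph with two source-to-sink paths: S-A-C-T costs 12, S-B-C-T costs 7.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge("S", "A", 1), new Edge("S", "B", 5),
            new Edge("A", "C", 10), new Edge("B", "C", 1),
            new Edge("C", "T", 1));

    /** Returns up to k paths from S to T, limiting how many paths are enqueued per vertex. */
    static List<Path> findPaths(int k) {
        PriorityQueue<Path> queue = new PriorityQueue<>(Comparator.comparingInt((Path p) -> p.cost));
        Map<String, Integer> counts = new HashMap<>();
        List<Path> result = new ArrayList<>();
        queue.add(new Path("S", 0, "S"));
        while (!queue.isEmpty() && result.size() < k) {
            Path toExtend = queue.poll();
            if (toExtend.last.equals("T")) {
                result.add(toExtend);
                continue;
            }
            for (Edge e : EDGES) {
                if (!e.from.equals(toExtend.last)) continue;
                // Counter value before the increment, like getAndIncrement() on the target vertex.
                int before = counts.merge(e.to, 1, Integer::sum) - 1;
                if (before < k) {
                    queue.add(new Path(e.to, toExtend.cost + e.cost, toExtend.trail + "-" + e.to));
                }
            }
        }
        return result;
    }
}
```

With k = 1, `PushTimeCountDemo.findPaths(1)` returns S-A-C-T at cost 12, although S-B-C-T costs only 7: the partial path S-A-C is enqueued first and uses up the single slot for vertex C, so S-B-C is never enqueued.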

Thank you for reading this.

Mehrzad

## Answers

Hi @mehrzad

GATK is open-source software, and we welcome recommendations from the community. If you have suggestions that would improve the tool, please feel free to send a pull request, along with your comments and any validation you have done.

Hi @bhanuGandham

Thank you for your reply. So this is not the right place for this discussion, and I need to go to GitHub for it.

Best,

Mehrzad

@mehrzad Dijkstra's algorithm is both greedy and optimal, so any path found through that vertex after the first `maxNumberOfHaplotypes` would necessarily have a worse score. Please refer to the pseudocode here: https://en.wikipedia.org/wiki/K_shortest_path_routing, in particular the line "if count_u ≤ K then".

Hi David,

Thank you for your comment. Correct me if I'm wrong here. I believe that when the algorithm talks about `count_u`, `u` is the last vertex of the path we picked from the heap. In the GATK code, that would be `vertexToExtend`. But in the code we check the count for `targetVertex`. This way we will lose some good haplotypes.

Best,

Mehrzad

For your reference:

GATK code

Wiki code

You can see that the for loop and the if statement appear in a different order in the two versions.
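As a rough illustration of the ordering in the Wikipedia pseudocode, here is a minimal sketch in which the counter belongs to the last vertex of the path taken from the heap, and the check happens before the loop over outgoing edges. The class `PopTimeCountDemo`, the toy graph, and the edge costs are hypothetical, and the sketch uses costs where lower is better rather than haplotype scores, so it should not be read as GATK code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Toy model of counting at dequeue time, following the Wikipedia pseudocode ordering. */
final class PopTimeCountDemo {

    /** Directed edge with a cost (lower is better). */
    static final class Edge {
        final String from, to;
        final int cost;
        Edge(String from, String to, int cost) { this.from = from; this.to = to; this.cost = cost; }
    }

    /** Partial path: its last vertex, accumulated cost, and a printable trail. */
    static final class Path {
        final String last, trail;
        final int cost;
        Path(String last, int cost, String trail) { this.last = last; this.cost = cost; this.trail = trail; }
    }

    // Hypothetical graph with two source-to-sink paths: S-A-C-T costs 12, S-B-C-T costs 7.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge("S", "A", 1), new Edge("S", "B", 5),
            new Edge("A", "C", 10), new Edge("B", "C", 1),
            new Edge("C", "T", 1));

    /** Returns up to k cheapest paths from S to T. */
    static List<Path> findPaths(int k) {
        PriorityQueue<Path> queue = new PriorityQueue<>(Comparator.comparingInt((Path p) -> p.cost));
        Map<String, Integer> counts = new HashMap<>();
        List<Path> result = new ArrayList<>();
        queue.add(new Path("S", 0, "S"));
        while (!queue.isEmpty() && counts.getOrDefault("T", 0) < k) {
            Path pu = queue.poll();
            // count_u: how many paths ending at u have been taken from the heap so far.
            int countU = counts.merge(pu.last, 1, Integer::sum);
            if (pu.last.equals("T")) {
                result.add(pu);
            }
            if (countU <= k) {
                // The check comes first; every outgoing edge is then enqueued unconditionally.
                for (Edge e : EDGES) {
                    if (e.from.equals(pu.last)) {
                        queue.add(new Path(e.to, pu.cost + e.cost, pu.trail + "-" + e.to));
                    }
                }
            }
        }
        return result;
    }
}
```

On this graph, `PopTimeCountDemo.findPaths(1)` returns S-B-C-T at cost 7. The partial path S-A-C (cost 11) is enqueued before S-B-C (cost 6), but because the counter only advances when a path is dequeued, S-B-C is still enqueued and is extended first.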

@mehrzads I see your point! Let me get back to you after thinking carefully about it.