

I understand the HaplotypeCaller does some local assembly and realignment. Can someone expand on the parameters used during the local assembly? What kmer size is used for the assembly graph? I would like to explore using digital normalization prior to SNP calling to remove PCR artifacts, and this information would be helpful.


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Severin,

    The HC uses a range of kmers for the assembly graph, not a single one. Unfortunately we don't currently have much documentation ready on the HC; right now we just have the Technical Document, and there is a presentation that explains how the HC works here.

    If you have any detailed questions we can try to answer those but you'll need to be more specific.

  • severin Member

    No problem. Can you answer the following?
    1) Are you using an open-source assembler to do the assembly, or an in-house assembler?
    2) What kmer sizes do you use, and how do you merge the results?
    3) If the kmer sizes are not fixed, or depend on read lengths etc., what is the algorithm for determining the kmer sizes used in the assembly?

    Thank you so much for your quick responses. As I mentioned before, I would like to try normalizing my reads prior to SNP calling with the HaplotypeCaller, and I believe this normalization step will make the HaplotypeCaller's assembly easier. To do that, I would like to give the normalization program a kmer size that best matches the assembler inside the HaplotypeCaller.

    Thanks again!

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Ok, I've forwarded your questions to Ryan, who developed the HaplotypeCaller. If he can't answer you no one can ;)

  • rpoplin Member ✭✭✭
    edited December 2012

    Hi there,

    1.) We built our own de Bruijn graph based assembler.

    2.) and 3.) We use every kmer size from the minimum value up to the maximum read length in the region. There is no merging of results. A separate assembly graph is built for each kmer size, and haplotypes are read off as the paths in each graph. The minimum kmer size in your version is hard-coded to k=31, but that will soon become a command-line argument.

    I hope that helps!
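
For readers who want a concrete picture of the scheme described above (one de Bruijn graph per kmer size, haplotypes read off as paths, no merging across sizes), here is a toy Python sketch. It is not the HaplotypeCaller's code: the function names `build_de_bruijn`, `enumerate_paths` and `assemble_all_k` are made up for illustration, the example uses a tiny minimum k instead of 31, and it leaves out everything a real assembler needs (edge weights, pruning, reference anchoring, error handling).

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph):
    """Spell the sequence along every simple path from a source node to a sink node."""
    targets = {t for successors in graph.values() for t in successors}
    sources = [node for node in graph if node not in targets]
    haplotypes = set()

    def walk(node, seq, seen):
        successors = graph.get(node, ())
        if not successors:
            # Sink node reached: the spelled sequence is a candidate haplotype.
            haplotypes.add(seq)
            return
        for nxt in successors:
            if nxt not in seen:  # never revisit a node, so cycles are not followed
                walk(nxt, seq + nxt[-1], seen | {nxt})

    for src in sources:
        walk(src, src, {src})
    return haplotypes


def assemble_all_k(reads, min_k):
    """Build one graph per k from min_k to the longest read length; results stay separate."""
    max_k = max(len(read) for read in reads)
    return {k: enumerate_paths(build_de_bruijn(reads, k))
            for k in range(min_k, max_k + 1)}


if __name__ == "__main__":
    # Two toy reads that differ by a single base.
    for k, haps in assemble_all_k(["ACGTT", "ACCTT"], min_k=3).items():
        print(k, sorted(haps))
```

The point of the sketch is only that each kmer size gets its own graph and its own set of candidate paths, which is why no single k can be singled out as "the" assembly kmer when configuring a normalization step.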
