HaplotypeCaller

severin (Member)

I understand that the HaplotypeCaller performs local assembly and realignment. Can someone expand on the parameters used during the local assembly? What kmer size is used for the assembly graph? I would like to explore using digital normalization prior to SNP calling to remove PCR artifacts, and this information would be helpful.

Answers

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    Hi Severin,

    The HC uses a range of kmer sizes for the assembly graph, not a single one. Unfortunately, we don't currently have much documentation on the HC; right now there is just the Technical Document, plus a presentation that explains how the HC works.

    If you have detailed questions, we can try to answer them, but you'll need to be more specific.

    Geraldine Van der Auwera, PhD

  • severin (Member)

    No problem. Can you answer the following?

    1) Are you using an open-source assembler or an in-house assembler?
    2) What kmer sizes do you use, and how do you merge the results?
    3) If the kmer sizes vary or depend on read length etc., what algorithm determines the kmer sizes used in the assembly?

    Thank you so much for your quick responses. As I mentioned before, I would like to try normalizing my reads prior to SNP calling with the HaplotypeCaller, and I believe this normalization step will make the HaplotypeCaller's assembly easier. However, I would like to give the normalization program a kmer size that best matches the assembler within the HaplotypeCaller.

    Thanks again!
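For readers unfamiliar with digital normalization, here is a minimal single-pass sketch of the idea in Python. It is a hypothetical illustration, not the code of any particular normalization tool: a read is kept only if the median abundance of its k-mers, counted over the reads already kept, is below a cutoff, so redundant high-coverage reads are discarded. The function names and the default values k=20 and cutoff=20 are illustrative assumptions; real tools use memory-efficient probabilistic counters and usually count strand-independent k-mers, whereas this sketch uses an exact Counter and forward-strand k-mers only.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (none if the read is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def digital_normalize(reads, k: int = 20, cutoff: int = 20):
    """Hypothetical single-pass digital normalization sketch.

    A read is kept only if the median abundance of its k-mers, counted over
    the reads kept so far, is below `cutoff`. Reads shorter than k are kept.
    Exact counts stand in for the probabilistic counting real tools use,
    and only forward-strand k-mers are counted.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            kept.append(read)
            continue
        # Counter returns 0 for unseen k-mers without inserting them.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to counts
    return kept
```

For example, digital_normalize(["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"], k=4, cutoff=2) keeps one copy of the duplicated read plus the distinct one. The k here only controls which k-mers are counted for the keep/discard decision; it is independent of whatever kmer sizes a downstream assembler uses.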

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    OK, I've forwarded your questions to Ryan, who developed the HaplotypeCaller. If he can't answer you, no one can ;)

  • rpoplin (GATK Developer, Moderator)
    edited December 2012

    Hi there,

    1.) We built our own de Bruijn graph based assembler.

    2.) and 3.) We use every kmer size from the minimum value up to the maximum read length in the region. There is no merging of results: a separate assembly graph is built for each kmer size, and haplotypes are read off as the paths in each graph. The minimum kmer in your version is hard-coded to k=31, but that will soon become a command-line argument.

    I hope that helps!
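The scheme described in the reply (one graph per kmer size from the minimum up to the longest read, haplotypes read off as paths, no merging) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the HaplotypeCaller's implementation: the function names are hypothetical, nodes are k-mers joined by (k-1)-base overlaps, the reference window is threaded into each graph alongside the reads, and haplotypes are taken as simple paths from the reference's first k-mer to its last k-mer. Real assemblers also prune low-support edges and handle cycles more carefully than this cycle-skipping depth-first search.

```python
from collections import defaultdict


def build_graph(seqs, k):
    """Toy de Bruijn-style graph: nodes are k-mers, and u -> v whenever
    v follows u (shifted by one base) in some input sequence."""
    edges = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k):
            edges[seq[i:i + k]].add(seq[i + 1:i + 1 + k])
    return edges


def read_paths(edges, source, sink, max_paths=100):
    """Enumerate simple paths source -> sink with a depth-first search and
    spell each one out as a haplotype sequence."""
    found = []
    stack = [(source, [source])]
    while stack and len(found) < max_paths:
        node, path = stack.pop()
        if node == sink:
            # First k-mer in full, then the last base of every later k-mer.
            found.append(path[0] + "".join(p[-1] for p in path[1:]))
            continue
        for nxt in sorted(edges.get(node, ())):
            if nxt not in path:  # skip cycles instead of unrolling them
                stack.append((nxt, path + [nxt]))
    return found


def assemble_region(reads, reference, k_min=31):
    """Build a separate graph for every k from k_min to the longest read
    length and keep each k's haplotypes separately (no merging)."""
    max_len = max(len(r) for r in reads)
    haplotypes = {}
    for k in range(k_min, max_len + 1):
        if k > len(reference):
            break
        # Assumption: the reference window is threaded into each graph
        # and anchors the paths at its first and last k-mers.
        graph = build_graph([reference] + list(reads), k)
        haplotypes[k] = read_paths(graph, reference[:k], reference[-k:])
    return haplotypes
```

For example, assemble_region(["AAACCTGGGTTT"], "AAACCCGGGTTT", k_min=4) returns one entry per k from 4 to 12. At k=4 both the reference and the read's C-to-T haplotype are recovered, while at k=12 only the reference remains, because a 12-base read contributes no edges to a 12-mer graph.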
