

HaplotypeCaller

I understand the HaplotypeCaller does some local assembly and realignment. Can someone expand on the parameters used during the local assembly? What kmer size is used for the assembly graph? I would like to explore using digital normalization prior to SNP calling to remove PCR artifacts, and this information would be helpful.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Posts: 11,736; admin)

    Hi Severin,

    The HC uses a range of kmers for the assembly graph, not a single one. Unfortunately we don't currently have much documentation ready on the HC; right now we just have the Technical Document, and there is a presentation that explains how the HC works here.

    If you have any detailed questions we can try to answer those but you'll need to be more specific.

    Geraldine Van der Auwera, PhD

  • severin (Posts: 5)

    No problem. Can you answer the following?
    1) Are you using an open-source assembler to do the assembly, or an in-house assembler?
    2) What kmer sizes do you use, and how do you merge the results?
    3) If the kmer sizes are not the same, or depend on read lengths etc., what is the algorithm for determining the kmer sizes used in the assembly?

    Thank you so much for your quick responses. As I mentioned before, I would like to normalize my reads prior to SNP calling with the HaplotypeCaller, and I believe this normalization step will make the HaplotypeCaller's assembly easier. However, I would like to provide the normalization program with a kmer size that best matches the assembler within the HaplotypeCaller.

    Thanks again!
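For readers unfamiliar with the normalization step severin mentions, here is a minimal Python sketch of digital normalization as it is commonly described: a read is kept only while the median count of its kmers, among reads kept so far, is below a coverage cutoff. This is an illustration under assumptions, not any particular tool's implementation. The function names, the default `k=31`, and the default `cutoff=20` are hypothetical, and the sketch uses an exact in-memory counter and ignores reverse complements, which real tools typically do not.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping kmers of length k in a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=31, cutoff=20):
    """Keep a read only if the median count of its kmers, among reads kept so far, is below cutoff.

    k and cutoff are illustrative defaults, not values taken from the thread.
    """
    counts = Counter()  # exact kmer counts; a simplification for this sketch
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k; this sketch simply skips it
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept
```

For example, `digital_normalize(["ACGTTGCA"] * 5 + ["GGGGCCCC"], k=4, cutoff=2)` keeps two copies of the repeated read plus the unique read, since the third copy already has a median kmer count of 2.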

  • Geraldine_VdAuwera (Cambridge, MA; Posts: 11,736; admin)

    Ok, I've forwarded your questions to Ryan, who developed the HaplotypeCaller. If he can't answer you no one can ;)

    Geraldine Van der Auwera, PhD

  • rpoplin (Posts: 122 ✭✭✭)
    edited December 2012

    Hi there,

    1.) We built our own de Bruijn graph based assembler.

    2.) and 3.) We use every kmer size from the minimum value up to the maximum read length in the region. There is no merging of results. A separate assembly graph is built for each kmer size, and haplotypes are read off as the paths in each graph. The minimum kmer size in your version is hard-coded to k=31, but that will soon become a command line argument.

    I hope that helps!
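The scheme rpoplin describes (one de Bruijn graph per kmer size, from a minimum k up to the longest read in the region, with no merging) can be sketched roughly as follows. This is a toy illustration in Python, not the HaplotypeCaller's actual code. The function names are hypothetical, the upper bound is assumed to be inclusive, and reading haplotypes off as paths is left out. Only the minimum of k=31 comes from the post.

```python
from collections import defaultdict

MIN_KMER = 31  # minimum kmer size stated in the post for that GATK version


def build_de_bruijn(reads, k):
    """Build a simple de Bruijn graph: nodes are (k-1)-mers, each kmer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def graphs_for_all_k(reads, min_k=MIN_KMER):
    """Build one separate graph per kmer size from min_k up to the longest read length.

    The inclusive upper bound is an assumption; the graphs are kept separate and never merged.
    """
    max_k = max(len(read) for read in reads)
    return {k: build_de_bruijn(reads, k) for k in range(min_k, max_k + 1)}
```

With toy reads such as `["ACGTAC", "CGTACG"]` and `min_k=3`, this yields four independent graphs, for k = 3, 4, 5 and 6.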
