Max Alt Alleles behavior

I'm using HaplotypeCaller to call variants on 8 samples into a single VCF file. I left the --max_alternate_alleles argument at its default of 6. However, I'm getting some odd warning messages:

WARN 19:12:58,670 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at chr1:2274306 has 26 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument

Since I'm calling 8 samples, shouldn't the maximum number of alternate alleles be 15 (2n-1)? Or does the number of alternate alleles observed in the given context refer to the number of haplotypes that could be constructed from the aligned reads?

Answers

  • ktretina (Baltimore, MD; Member)

    As a follow-up to the question above, could you please explain how maxAltAlleles determines the "top" alleles? I'm interested in only calling the major allele using haploid organism data, and this looks like the correct option to do so.

    GitHub issue 1792, filed by Sheila (state: closed; closed by vdauwera)
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    We remove the least well-supported alternate alleles until we reach the requested maximum number of alternate alleles (or genotypes). The estimate of allele support is based on the ranking of the candidate haplotypes coming out of the graph-building step. The reference allele is always kept. Note that this approach is fairly naive: it should only be used to pare down the number of combinations we're willing to consider, to avoid explosive situations in messy regions. It should not be used to make an absolute determination of the correct allele, since it accounts only for the quantity of sequence supporting each allele, not its quality.
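
To make the pruning described above concrete, here is a minimal Python sketch. It is not GATK's actual code: the function names, the `support` scores, and the tie-breaking rule are all assumptions for illustration. It keeps the best-supported alternate alleles up to a cap, and it also shows why a cap helps, using the standard count of unordered genotypes for a given number of alleles and ploidy.

```python
from math import comb  # requires Python 3.8+


def prune_alt_alleles(support, max_alt_alleles):
    """Return the max_alt_alleles best-supported alternate alleles.

    `support` is a hypothetical mapping from alternate allele to a support
    score (for example, how many candidate haplotypes carry it). The
    reference allele is not part of `support`; it is always kept separately.
    """
    # Rank by descending support; break ties by allele string so the
    # result is deterministic (an assumption, not GATK's documented rule).
    ranked = sorted(support, key=lambda allele: (-support[allele], allele))
    return ranked[:max_alt_alleles]


def genotype_count(num_alleles, ploidy=2):
    """Number of unordered genotypes for num_alleles alleles (ref included)."""
    return comb(num_alleles + ploidy - 1, ploidy)


# Made-up support scores for five alternate alleles at one site.
support = {"A": 40, "T": 3, "AT": 12, "ATT": 1, "G": 7}
print(prune_alt_alleles(support, max_alt_alleles=3))  # ['A', 'AT', 'G']

# Diploid genotypes to evaluate: reference + 6 alts versus reference + 26 alts.
print(genotype_count(1 + 6))   # 28
print(genotype_count(1 + 26))  # 378
```

Under these assumptions, capping a diploid site at 6 alternate alleles leaves 28 genotypes per sample to evaluate instead of 378 for the 26 alternate alleles in the warning. Note that the ranking uses only the amount of support, which is why the cap is not a reliable way to pick the "correct" allele.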
