MuTect2 and Alternate Allele Calls

Hi there,

We have a few calls in our dataset that are slightly confusing. The site is flagged as triallelic, which is expected, but only one alternate allele is written to the VCF. The alternate allele that is called is at <1% VAF, whereas the other alternate allele is in the 35-40% range and is a common somatic variant. Any ideas on how to force the calling of these more frequent alternate alleles?

Answers

  • Sheila, Broad Institute (Member, Broadie admin)

    @pachewychomp
    Hi,

    I am not sure I understand the issue. Can you post the VCF record for the site in question?

    Thanks,
    Sheila

  • pachewychomp, Oregon (Member)
    edited October 2016

    Hi Sheila,

    Yes, sorry, I may not have explained this very well. Here is the record we are questioning:

    12 25398284 rs121913529 C A . clustered_events;homologous_mapping_event;triallelic_site DB;ECNT=3;HCNT=1;MAX_ED=16;MIN_ED=5;NLOD=0.00;TLOD=7.66 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:1998,17:2.057e-03:12:5:0.294:67889,540:963:984

    I have spent quite a bit of time running through different parameter sets, but have not been able to get these calls made properly. As an additional note, I run this analysis without a matched normal, but with a panel of normals. It is also run with both COSMIC and dbSNP resource files. COSMIC lists 7 entries at this position, while dbSNP lists one multiallelic position. I have also performed variant calling without any of these files and still ended up with the same result.

    Oh, and FWIW, the change we see by viewing the BAM file is a C>G substitution.

    Thanks!
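
To sanity-check a record like the chromosome 12 one quoted above, the allele fraction implied by the AD field can be recomputed by hand. Below is a minimal Python sketch, assuming the columns are whitespace-separated in the standard VCF order with a single sample. The helper names `parse_record` and `alt_fraction_from_ad` are made up for illustration and are not part of GATK.

```python
def parse_record(line):
    """Split one single-sample VCF data line into a dict (illustrative helper)."""
    keys = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL",
            "FILTER", "INFO", "FORMAT", "SAMPLE"]
    rec = dict(zip(keys, line.split()))
    rec["FILTER"] = rec["FILTER"].split(";")
    info = {}
    for item in rec["INFO"].split(";"):
        key, _, value = item.partition("=")
        # Flag-style entries such as DB carry no value.
        info[key] = value if value else True
    rec["INFO"] = info
    # Pair each FORMAT key with the matching sample value.
    rec["SAMPLE"] = dict(zip(rec["FORMAT"].split(":"), rec["SAMPLE"].split(":")))
    return rec


def alt_fraction_from_ad(rec):
    """Alt depth divided by (ref depth + alt depth), taken from AD."""
    ref_depth, alt_depth = (int(x) for x in rec["SAMPLE"]["AD"].split(",")[:2])
    total = ref_depth + alt_depth
    return alt_depth / total if total else float("nan")


# The record exactly as posted in this thread.
record = (
    "12 25398284 rs121913529 C A . "
    "clustered_events;homologous_mapping_event;triallelic_site "
    "DB;ECNT=3;HCNT=1;MAX_ED=16;MIN_ED=5;NLOD=0.00;TLOD=7.66 "
    "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 "
    "0/1:1998,17:2.057e-03:12:5:0.294:67889,540:963:984"
)

rec = parse_record(record)
print(rec["REF"], rec["ALT"], rec["FILTER"])
print("AD-based alt fraction:", alt_fraction_from_ad(rec))
print("Reported AF:", float(rec["SAMPLE"]["AF"]))
```

AD carries one depth per allele listed in REF and ALT, so reads supporting an allele that is absent from ALT (here, the G seen in the BAM) are not reported separately in this record.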

  • Sheila, Broad Institute (Member, Broadie admin)

    @pachewychomp
    Hi,

    Oh, I see. So, you are asking why the allele that is called is an A when the BAM shows a G. MuTect2 does a local reassembly step that may shift the reads. You can have a look at the bamout file which shows the reassembled reads. Note the document talks about HaplotypeCaller, but you can generate the bamout with MuTect2 as well.

    -Sheila

  • pachewychomp, Oregon (Member)

    Hi Sheila,

    Yes, I understand this is what MuTect2 does, but I am trying to figure out how to change this behavior. This is a known call that we are trying to capture, so the local reassembly is having trouble at this particular location for some reason. Also, if there is a 'triallelic' tag annotated, wouldn't you expect there to be multiple alternate alleles listed? I can't imagine a situation where you would make a triallelic variant call and list only a <1% variant while filtering the other.

    Thanks,
    John

  • Sheila, Broad Institute (Member, Broadie admin)

    @pachewychomp
    Hi John,

    Got it. Yes, we have had some reports of this in other posts. The team is working on ways to deal with messy sites that have multiple possible alleles present. Can you post a picture of the bamout file? We have a team dedicated to improving MuTect2 now, so I suspect a fix will be in for this issue within the next few months.

    -Sheila

  • pachewychomp, Oregon (Member)

    Here's one such example, from the VCF:

    10 123310871 . A T . t_lod_fstar;triallelic_site ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=0.00;TLOD=4.16 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:0,2:1.00:0:2:1.00:0,72:0:0

    As you can see, the call looks pretty poor from this perspective. In the attached bamout shot, you can see that the call is straightforward; even the reassembler thinks so. ;-)

    The depth at this locus is pretty high, ~1500 or so.

    Thanks!

  • pachewychomp, Oregon (Member)

    Another comment on the above post. You can generally deal with these sites by imposing a TLOD filter. This does result in the correct call being pulled out. The only problem is arbitrarily deciding on the appropriate TLOD score.

  • Sheila, Broad Institute (Member, Broadie admin)

    @pachewychomp
    Hi,

    I know the team is working on this issue. Can you share some of your data with us, so we can use it as a test case? If so, instructions are here.

    Thanks,
    Sheila

    P.S. What kind of TLOD allows the call to be made? Please post the VCF record as well.

  • pachewychomp, Oregon (Member)

    For completeness, the VCF record is posted on the 12/2 comment. I adjusted the initial_tumor_lod to 7.0 and the tumor_lod to 10.0. Great to hear this is being addressed!

    Thanks much,
    John
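
For reference, the two records posted in this thread can be compared against the 7.0 and 10.0 values mentioned above. This is a minimal Python sketch that only screens already-written VCF lines by their TLOD INFO value. That is an assumption about how one might inspect the numbers, and it is not equivalent to rerunning MuTect2 with different initial_tumor_lod and tumor_lod settings. `tlod_of` and `passes_tlod` are hypothetical helpers, not GATK functions.

```python
def tlod_of(vcf_line):
    """Return the TLOD value from the INFO column, or None if it is absent."""
    info = vcf_line.split()[7]  # INFO is the 8th VCF column
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "TLOD":
            return float(value)
    return None


def passes_tlod(vcf_line, min_tlod):
    """True when the record carries a TLOD at or above min_tlod."""
    tlod = tlod_of(vcf_line)
    return tlod is not None and tlod >= min_tlod


# The two records exactly as posted in this thread.
records = [
    "12 25398284 rs121913529 C A . "
    "clustered_events;homologous_mapping_event;triallelic_site "
    "DB;ECNT=3;HCNT=1;MAX_ED=16;MIN_ED=5;NLOD=0.00;TLOD=7.66 "
    "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 "
    "0/1:1998,17:2.057e-03:12:5:0.294:67889,540:963:984",
    "10 123310871 . A T . t_lod_fstar;triallelic_site "
    "ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=0.00;TLOD=4.16 "
    "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 "
    "0/1:0,2:1.00:0:2:1.00:0,72:0:0",
]

for line in records:
    chrom, pos = line.split()[:2]
    for cutoff in (7.0, 10.0):
        print(chrom, pos, "TLOD", tlod_of(line),
              "cutoff", cutoff, "kept" if passes_tlod(line, cutoff) else "dropped")
```

Any cutoff used this way is still a judgment call, which is the problem raised earlier in the thread about choosing an appropriate TLOD score.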
