Reversions: an Algorithm issue

Since this is an algorithm question that covers both types of MuTect, I'd rather raise it here.

I have noticed that the variant calling model in MuTect seems to require AF[TUMOR] > AF[NORMAL] for a site to be called. This implies that a reversion (back mutation), i.e. AF[TUMOR] < AF[NORMAL], will not be called. Is there any rationale for this?
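To make the condition concrete, here is a minimal sketch of the allele-fraction comparison described above. It is not MuTect's model, only the AF[TUMOR] versus AF[NORMAL] test from the question; the function names, labels and read counts are hypothetical.

```python
def allele_fraction(alt_reads, total_reads):
    """Fraction of reads supporting the alternate allele (0.0 if no coverage)."""
    return alt_reads / total_reads if total_reads else 0.0


def classify_site(tumor_alt, tumor_depth, normal_alt, normal_depth):
    """Label a site by comparing tumor and normal allele fractions.

    Illustrative only: this mirrors the AF comparison in the question,
    not the statistical model used by MuTect.
    """
    af_tumor = allele_fraction(tumor_alt, tumor_depth)
    af_normal = allele_fraction(normal_alt, normal_depth)
    if af_tumor > af_normal:
        return "somatic_gain_candidate"   # the case MuTect appears to target
    if af_tumor < af_normal:
        return "reversion_candidate"      # the case asked about here
    return "no_difference"


# Hypothetical read counts: alt allele present in the normal, absent in the tumor.
print(classify_site(tumor_alt=0, tumor_depth=40, normal_alt=35, normal_depth=35))
```

Under this toy comparison, a site where the alternate allele is fixed in the normal but missing from the tumor falls into the second branch, which is the situation the question is about.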

Issue · Github
by Sheila

Issue Number: 1298
State: closed
Closed By: sooheelee

Answers

  • shlee Cambridge Member, Broadie, Moderator admin
    edited September 2016

    @johnma,

    I believe these are still called but may not pass filters, e.g. the FILTER field will have alt_allele_in_normal. Is this what you are asking about--why these do not pass filters? I recommend reviewing one of our extended MuTect2 workshop presentations, e.g. the one from this March, for a list of the filters.

    If this is not what you are asking, can you give us some example calls that illustrate your concern?
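One way to check whether such sites were called but filtered is to scan the FILTER column of the output VCF. The sketch below uses plain text parsing and relies only on the VCF convention that FILTER is the seventh tab-separated column and holds a semicolon-separated list. The helper name and the example records are made up for illustration.

```python
def records_with_filter(vcf_lines, filter_name="alt_allele_in_normal"):
    """Yield (chrom, pos, ref, alt, filters) for records carrying filter_name."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        filters = filt.split(";")
        if filter_name in filters:
            yield chrom, int(pos), ref, alt, filters


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\t.",
    "1\t2000\t.\tC\tT\t.\talt_allele_in_normal\t.",
    "2\t3000\t.\tG\tA\t.\tgermline_risk;alt_allele_in_normal\t.",
]
for record in records_with_filter(example):
    print(record)
```

If a site of interest never shows up in the VCF at all, not even with a filter tag, it was not emitted, which is a different situation from being called and then filtered.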

  • johnma Member

    My issue can be exemplified from the CGA's website, where the log-odds for NORMAL at one site is defined as this.

    But NORMAL need not be HomRef at any given site. For example, suppose that in a completely homozygous situation, ref=A, NORMAL=G, and TUMOR=A. Because TUMOR=ref, tlod would be low enough to trigger the t_lod_fstar flag.

    Similarly, if ref=A, NORMAL=G, and TUMOR=T, at best this would be tagged as germline_risk (since nlod, defined as the likelihood that NORMAL is HomRef, would be low), if not left uncalled because it is a triallelic site.
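The first scenario can be illustrated with a simplified log-odds calculation. This is a sketch under stated assumptions, not the tool's implementation: it uses a per-base likelihood in the style of the published MuTect model as I understand it, with a single uniform base error rate (an assumption; the real tool works from per-base qualities and applies further filters). All function names and read pileups are hypothetical.

```python
import math


def log10_likelihood(bases, ref, alt, f, error=0.001):
    """log10 likelihood of the observed bases given alt allele fraction f.

    Assumes one uniform error rate for every base (a simplification).
    """
    total = 0.0
    for b in bases:
        if b == ref:
            p = f * error / 3 + (1 - f) * (1 - error)
        elif b == alt:
            p = f * (1 - error) + (1 - f) * error / 3
        else:
            p = error / 3
        total += math.log10(p)
    return total


def tumor_lod(bases, ref, alt, error=0.001):
    """Log odds of 'variant at the observed alt fraction' versus 'no variant'."""
    f_hat = sum(b == alt for b in bases) / len(bases)  # bases must be non-empty
    return (log10_likelihood(bases, ref, alt, f_hat, error)
            - log10_likelihood(bases, ref, alt, 0.0, error))


def normal_lod(bases, ref, alt, error=0.001):
    """Log odds of 'normal is HomRef' versus 'normal is heterozygous'."""
    return (log10_likelihood(bases, ref, alt, 0.0, error)
            - log10_likelihood(bases, ref, alt, 0.5, error))


# Reversion scenario from the post: ref=A, NORMAL all G, TUMOR all A.
tumor_reads = "A" * 30
normal_reads = "G" * 30
print(tumor_lod(tumor_reads, "A", "G"))    # no alt support in the tumor
print(normal_lod(normal_reads, "A", "G"))  # strongly negative: normal is not HomRef
```

In this toy model the tumor shows no evidence for a non-reference allele, so the tumor log odds cannot clear any positive cutoff, while the normal log odds are strongly negative because the normal is clearly not HomRef. Both tests are framed relative to the reference, which is why a reversion to the reference allele has no way to score as a somatic event.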

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @johnma,

    I'm having a little trouble understanding the question you are asking, so please let me know if the following answers your question.

    MuTect2 calls neither triallelic sites nor the reversion-to-ref sites you describe. The rationale relates to optimizations based on the sequencing tech available when MuTect1 came about. I will conjecture that these optimizations were based on empirical observations, and that the highly optimized default settings we see today reflect the cost-benefit analyses of those days. When you observed a "variant" in the normal that is not in the tumor, (i) it was more likely to be an artifact of sample prep, sequencing, alignment or the tool chain, AND (ii) (and this is my conjecture) downstream analyses did not suffer from its loss even if it were real.

    Let's discuss this latter conjecture by looking at the typical workflow of those using MuTect. For one, MuTect results are typically fed into tools that analyze significance over large cohorts. The size of the cohort provides the power to detect drivers of tumorigenesis. However, I get the impression there is a lot of manual review of sites. I imagine it's desirable for such sites to be tagged so that they can be manually reviewed to determine whether the events are real (then change to PASS) or indeed artifactual (then keep as filtered). And here is where I'm a bit surprised. The difference between MuTect1 and MuTect2 (that I observe in my preliminary testing; someone correct me if I'm wrong) is that MuTect1 emits such sites and filters them, while MuTect2 appears to omit these sites from the callset altogether.

    If you find that sequencing tech has changed enough over the years to change the cost-benefits of filtering/emitting such sites and/or that this setting in the tool is prohibitive to using the tool in your research, then I highly recommend you request this as a new feature for MuTect2. It is, after all, in beta. Specifically, I'm told what you can request is that the normalLod cutoff and the nonRefInNormal cutoff be parameterized (in line 543 of the MuTect2 code). @Sheila can help you if you decide to request this as a feature.
