Normal-lod and tumor-lod in Mutect2

lukeheo, korea (Member)

Hello,

What are the normal-lod and tumor-lod in Mutect2?
What is the basis for the default values of 2.2 and 3.0, respectively?
And if the thresholds are lowered or raised, what would happen in terms of mutation calling?
Please explain them using very basic terms (AD, AF, ...).

Many thanks,
Luke

Issue · Github (by Sheila)
Issue Number: 3101
State: open

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    Accepted Answer

    @lukeheo
    Hi Luke,

    I am checking with the developer and will get back to you.

    -Sheila

  • davidben, Boston (Member, Broadie, Dev) ✭✭
    Accepted Answer

    Whoops, I dropped the ball. The lods are log-10 likelihood ratios, i.e. a normal lod of 4 means the reads favor the hom-ref hypothesis for the normal by a factor of 10^4. The default thresholds are just an empirical compromise between speed, sensitivity, and precision. We are working on a better model that lets the user choose a maximum acceptable false discovery rate.
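
An illustrative sketch, not part of the original reply: since a lod is a log-10 likelihood ratio, raising 10 to the lod gives the plain ratio. The helper name `lod_to_likelihood_ratio` is hypothetical, and the values 2.2 and 3.0 are simply the defaults quoted in the question.

```python
def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a log-10 likelihood ratio (lod) into the plain likelihood ratio."""
    return 10.0 ** lod


if __name__ == "__main__":
    # 4.0 is the example from the reply; 2.2 and 3.0 are the defaults from the question.
    for label, lod in [("example lod", 4.0), ("default 2.2", 2.2), ("default 3.0", 3.0)]:
        ratio = lod_to_likelihood_ratio(lod)
        print(f"{label}: lod = {lod} -> likelihood ratio of about {ratio:.0f}")
```

Read this way, a threshold of 3.0 asks for the data to favor one hypothesis over the other by a factor of 1000, while 2.2 asks for a factor of roughly 158.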

  • manba (Member)

    Thanks. Can you explain what the hom-ref hypothesis for the normal is? And is 10000 compared with L(H1)/L(H2) to decide whether to accept the hom-ref hypothesis? Thanks a lot.

  • davidben, Boston (Member, Broadie, Dev) ✭✭

    The likelihood of a hypothesis is defined as the probability of the observed data (reads) given that the hypothesis is true. It's different from the probability of the hypothesis because a hypothesis can fit the data well but be a priori improbable. For example, if it's dark outside, the likelihood of the hypothesis that sun-blocking aliens came to Earth is quite high (assuming that these aliens are always in the habit of blocking the sun) because it explains the data well, despite being outlandish.

    Anyway, the NLOD is the log10 of the following likelihood ratio (here "P" means probability and "|" means "given that"):

    P(reads | normal is hom ref, i.e. has no mutation) / P(reads | normal is het, i.e. has the mutation)

    You could approximately think of these likelihoods as coming from a binomial model, where if we have k alt reads out of n total reads and have a base error rate of e, then

    P(reads | hom ref) = Binomial(k | n, e) (alt reads are due to error)
    and
    P(reads | het) = Binomial(k | n, 1/2) (alt reads are real and diploid)

    The results of Mutect2 are fairly insensitive to the threshold because, given a modest depth of coverage in the normal, the NLOD is usually overwhelming one way or the other, since diploid het calling is generally easy.
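
The binomial approximation in the reply above can be sketched in a few lines of Python. This is only an illustration of that simplified model, not Mutect2's actual likelihood code; the function names and the base error rate of 0.001 are assumptions made for the example.

```python
import math


def log10_binomial_pmf(k: int, n: int, p: float) -> float:
    """log10 of Binomial(k | n, p) for 0 < p < 1, computed in log space."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_pmf = log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return log_pmf / math.log(10.0)


def approximate_nlod(alt_reads: int, total_reads: int, error_rate: float = 0.001) -> float:
    """Approximate NLOD: log10 of P(reads | hom ref) / P(reads | het).

    error_rate is an assumed base error rate, chosen only for illustration.
    """
    hom_ref = log10_binomial_pmf(alt_reads, total_reads, error_rate)  # alt reads are errors
    het = log10_binomial_pmf(alt_reads, total_reads, 0.5)  # alt reads are real and diploid
    return hom_ref - het


if __name__ == "__main__":
    # Normal sample with 30 reads: no alt reads versus half alt reads.
    print(approximate_nlod(alt_reads=0, total_reads=30))
    print(approximate_nlod(alt_reads=15, total_reads=30))
```

Under these assumptions, 0 alt reads out of 30 gives an NLOD of roughly +9 and 15 alt reads out of 30 gives roughly -36. Both are far from a threshold such as 2.2, which is the sense in which the result is "overwhelming one way or the other" at modest depth.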

  • xiucz (Member)

    I think it would be much appreciated if there were a detailed document explaining the thresholds behind the FILTER column of GATK4 Mutect2's VCF. For example, when a variant was flagged as "cluster_event", I searched the forum for "cluster_event" and found threads such as https://gatkforums.broadinstitute.org/gatk/discussion/9985/gatk-4-beta-clustered-events-in-mutect2-filtermutectcalls, and I have also read the mathematical notes on Mutect (PDF), but I still could not find a definitive answer about when a variant gets flagged.

  • davidben, Boston (Member, Broadie, Dev) ✭✭

    @xiucz The answer to this and many other questions can be found in the docs we maintain in the GATK repo on github: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf. In particular, you may find Section 8: Mutect Filters and the table therein to be helpful.

  • manba (Member)
    edited December 7

    @davidben Your explanation is wonderful, but I still do not understand. :o :o

    "The likelihood of a hypothesis is defined as the probability of the observed data (reads) given that the hypothesis is true." Maybe this sentence does not say clearly what the hypothesis is?

  • davidben, Boston (Member, Broadie, Dev) ✭✭

    For example, in P(reads | normal is hom ref), the reads are the data and "normal is hom ref" is the hypothesis.

  • manba (Member)

    Thanks a lot. Maybe it is a P(A|B)-like model. Given my poor knowledge of statistics, you do not need to explain more to me. Thanks.
