Normallod and tumorlod in Mutect2
lukeheo
koreaMember ✭
Hello,
What are the normallod and tumorlod in Mutect2?
What is the basis for the default values, 2.2 and 3.0 respectively?
And if the thresholds are lowered or raised, what would happen in terms of mutation calling?
Please explain them using very basic terms (AD, AF, ...).
Many thanks,
Luke
Best Answers

davidben Boston ✭✭
Whoops, I dropped the ball. The lods are log10 likelihood ratios, i.e. a normal lod of 4 means the reads support a homref hypothesis for the normal by a factor of 10^4. The default thresholds are just an empirical compromise between speed, sensitivity, and precision. We are working on a better model that lets the user choose a maximum acceptable false discovery rate.
Answers
@lukeheo
Hi Luke,
I am checking with the developer and will get back to you.
Sheila
Thanks. Can you explain what the homref hypothesis for the normal is? And is 10000 the value to compare with L(H1)/L(H2) to decide whether to accept the homref hypothesis? Thanks a lot.
The likelihood of a hypothesis is defined as the probability of the observed data (reads) given that the hypothesis is true. It's different from the probability of the hypothesis because a hypothesis can fit the data well but be a priori improbable. For example, if it's dark outside, the likelihood of the hypothesis that sun-blocking aliens came to Earth is quite high (assuming that these aliens are always in the habit of blocking the sun) because it explains the data well, despite being outlandish.
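In symbols, the distinction can be sketched with Bayes' rule. The letters H (hypothesis) and D (data) and the prior P(H) are notation introduced here for illustration, and P(D | H) is read as "the probability of D given H":

```latex
\underbrace{P(H \mid D)}_{\text{probability of the hypothesis}}
\;\propto\;
\underbrace{P(D \mid H)}_{\text{likelihood}}
\times
\underbrace{P(H)}_{\text{prior}}
```

For the alien hypothesis, P(D | H) is close to 1 but P(H) is tiny, so the probability of the hypothesis stays tiny even though its likelihood is high.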
Anyway, the NLOD is the log10 of the following likelihood ratio (here "P" means probability and "|" means "given that"):
P(reads | normal is hom ref, i.e. has no mutation) / P(reads | normal is het, i.e. has the mutation)
You could approximately think of these likelihoods as coming from a binomial model, where if we have k alt reads out of n total reads and have a base error rate of e, then
P(reads | hom ref) = Binomial(k | n, e) (alt reads are due to error)
and
P(reads | het) = Binomial(k | n, 1/2) (alt reads are real and diploid)
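As a rough illustration of the binomial approximation above, here is a minimal Python sketch. It is not Mutect2's actual implementation: the function name `approx_normal_lod` and the base error rate of 0.001 are assumptions chosen for the example. The binomial coefficient is the same in both likelihoods, so it cancels in the ratio and the calculation can be done directly in log10 space.

```python
from math import log10


def approx_normal_lod(alt_reads: int, total_reads: int, error_rate: float = 0.001) -> float:
    """Approximate NLOD: log10 of P(reads | hom ref) / P(reads | het).

    Simplified binomial model only; error_rate is an assumed per-base
    error rate, not a value taken from Mutect2.
    """
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")

    ref_reads = total_reads - alt_reads
    # Hom ref: every alt read is an error, every ref read is not.
    log10_hom_ref = alt_reads * log10(error_rate) + ref_reads * log10(1.0 - error_rate)
    # Het: each read is alt or ref with probability 1/2.
    log10_het = total_reads * log10(0.5)
    # The binomial coefficient C(n, k) appears in both terms and cancels.
    return log10_hom_ref - log10_het


print(round(approx_normal_lod(0, 20), 2))   # 6.01: reads favor hom ref by about 10^6
print(round(approx_normal_lod(10, 20), 2))  # -23.98: reads strongly favor het
```

A positive value means the normal reads favor hom ref (no mutation in the normal), and a negative value means they favor a het site in the normal.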
The results of Mutect2 are fairly insensitive to the threshold because, given a modest depth of coverage in the normal, the NLOD is usually overwhelming one way or the other, since diploid het calling is generally easy.
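The insensitivity to the threshold can be checked numerically with the same simplified binomial model. This is a sketch under stated assumptions, not Mutect2's code: `approx_normal_lod`, `sweep_depths`, and the error rate of 0.001 are hypothetical, 2.2 is the default quoted in the question, and the threshold is assumed to be applied as "NLOD above the threshold supports hom ref in the normal".

```python
from math import log10

NORMAL_LOD_THRESHOLD = 2.2  # default value quoted in the question above


def approx_normal_lod(alt_reads: int, total_reads: int, error_rate: float = 0.001) -> float:
    """Simplified binomial NLOD: log10 P(reads | hom ref) - log10 P(reads | het)."""
    ref_reads = total_reads - alt_reads
    log10_hom_ref = alt_reads * log10(error_rate) + ref_reads * log10(1.0 - error_rate)
    log10_het = total_reads * log10(0.5)
    return log10_hom_ref - log10_het


def sweep_depths(depths, error_rate: float = 0.001):
    """For each depth, return (depth, NLOD with no alt reads, NLOD with half alt reads)."""
    return [
        (n, approx_normal_lod(0, n, error_rate), approx_normal_lod(n // 2, n, error_rate))
        for n in depths
    ]


for depth, clean_nlod, het_nlod in sweep_depths([10, 20, 40]):
    print(f"depth={depth:3d}  no alt reads: {clean_nlod:7.2f}  half alt reads: {het_nlod:7.2f}")
```

Under these assumptions the two cases fall on opposite sides of the threshold at every depth in the list, and the gap widens as depth grows, so moving the threshold by a unit or so changes few decisions.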
I think it would be much appreciated if there were a detailed document explaining the thresholds behind the FILTER column of the GATK4 Mutect2 VCF. For example, when a variant was flagged as "cluster_event", I searched the forum for "cluster_event" and found posts such as https://gatkforums.broadinstitute.org/gatk/discussion/9985/gatk4betaclusteredeventsinmutect2filtermutectcalls. I have also read the mathematical notes on Mutect (PDF), but I still could not get a definite answer as to when a variant is flagged.
@xiucz The answer to this and many other questions can be found in the docs we maintain in the GATK repo on github: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf. In particular, you may find Section 8: Mutect Filters and the table therein to be helpful.
Your explanation is wonderful, but I do not understand.
"The likelihood of a hypothesis is defined as the probability of the observed data (reads) given that the hypothesis is true." Maybe this sentence does not say clearly what the hypothesis is?
For example, in P(reads | normal is hom ref), the reads are the data and "normal is hom ref" is the hypothesis.
Thanks a lot. Maybe it is a P(A|B)-like model. Given my poor knowledge of statistics, you do not need to explain more to me. Thanks.