A simple explanation of how MuTect2 (GATK3) works
Hello GATK team,
As you all know, there are many blogs and docs explaining how MuTect2 works, but they go into a lot of technical and statistical detail. People who don't specialize in these domains can't easily follow them. For this reason, I would like to discuss how MuTect2 works using a simple example.
Let's say that we have the following information:
Reference genome sequence in a given region:
The normal sample in the same region having the following reads:
...ATCGTCAGATCATTTACGCCAGTCACTGACTGCACG... (80 reads)
...ATCGTCAGAACATTTACGCCAGTCACTGACTGCACG... (10 reads)
And the tumor sample in the same region having the following reads:
...ATCGTCAGAACATTTACGCCAGTCACTGACTGCACG... (75 reads)
...ATCGTCAGATCATTTACGCCAGTCACTGACTGCACG... (35 reads)
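To make the numbers in this example concrete, here is a minimal Python sketch that locates the differing base and computes the fraction of variant-supporting reads in each sample. It is only arithmetic on the read counts above, not MuTect2's actual statistical model. Since the reference sequence is not shown, the sketch assumes it matches the T-containing read; the names `assumed_ref`, `mismatches` and `alt_fraction` are made up for illustration.

```python
# Reads from the example above; the "..." flanks are dropped.
READ_T = "ATCGTCAGATCATTTACGCCAGTCACTGACTGCACG"
READ_A = "ATCGTCAGAACATTTACGCCAGTCACTGACTGCACG"

# Read counts per sample, taken from the example.
normal = {READ_T: 80, READ_A: 10}
tumor = {READ_A: 75, READ_T: 35}

# ASSUMPTION: the reference matches READ_T (the post does not show it).
assumed_ref = READ_T


def mismatches(ref, read):
    """Return (0-based offset, ref base, read base) for each mismatch."""
    return [(i, r, b) for i, (r, b) in enumerate(zip(ref, read)) if r != b]


def alt_fraction(sample, ref):
    """Fraction of reads in a sample that differ from the assumed reference."""
    total = sum(sample.values())
    alt = sum(n for read, n in sample.items() if read != ref)
    return alt / total


site = mismatches(assumed_ref, READ_A)
normal_af = alt_fraction(normal, assumed_ref)
tumor_af = alt_fraction(tumor, assumed_ref)

print(site)
print(f"normal alt fraction: {normal_af:.3f}")
print(f"tumor alt fraction: {tumor_af:.3f}")
```

Under that assumption, the T>A change is carried by 10 of 90 normal reads (about 11%) and 75 of 110 tumor reads (about 68%). So part of my question is how MuTect2 treats a candidate that has strong support in the tumor but also a non-trivial number of supporting reads in the normal.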
How does MuTect2 handle such a situation?
Could we go over each step, explaining in simple terms what MuTect2 does?
I made up this example by typing a random sequence with a single variant. If there are better scenarios that would illustrate all the decisions MuTect2 makes when comparing reads, I would be happy to hear them.
Let's not forget that there are also the filtering options (dbSNP membership or 1k mills genome) or the hard filters to take into account:
I have another situation in mind. Let's say, for example, that the same variant is found in both the normal and the tumor sample, but differs from the reference genome. What happens in this case?
Thanks in advance.