# What's the difference between MuTect and GATK-Mutect?

YingLiu
ChinaMember ✭

Hi,

I do not understand the difference between the MuTect from http://archive.broadinstitute.org/cancer/cga/mutect and the one in GATK.

The latest version at CGA (http://archive.broadinstitute.org/cancer/cga/mutect_download) is 1.14, but the version in GATK is 2.

Are they the same?

Thank you!

## Answers

@Geraldine, is there any difference in the SNV calling method? Or is it the same?

@YingLiu, M1 is basically a pileup caller that allows for low fraction alleles and M2 uses HaplotypeCaller's graph assembly approach. There is a publication about M1 (Cibulskis et al. 2013). M2 has a whitepaper discussing its algorithms at https://github.com/broadinstitute/gatk/tree/master/docs/mutect. Again, in the `gatk` repo, go to `gatk/docs/mutect/mutect.pdf`.

@shlee thank you!

I have read the paper about M1,

but I cannot understand the meaning of "P(bi|ei,r,m,f)". I guess it is the probability that the called base is real?

The detailed rule is at the following link:

I look forward to your answer.

@YingLiu, I have to consult with one of our developers/mathematicians.

Since your link is broken, I've included the equation you refer to again below, @YingLiu.

Our developer explains: This equation is the likelihood of observing base b given error probability e, reference base r, alt base m, and alt allele fraction f. Since e, r, and m are known, this comes down to the probability of the data (b) given the allele fraction. Mutect 1's LOD score is the log of the ratio of 1) this quantity when we plug in the observed alt allele fraction, i.e. assume a somatic variant, to 2) this quantity when we plug in f = 0, i.e. assume hom ref. In a fully Bayesian framework (which M1 is not) we would use Bayes' rule and a prior to "invert" this likelihood and get the probability of f given the observed data.

We can parse the three parts of this equation as follows. Note that f is the fraction of DNA that is alt, 1 - f is the fraction of DNA that is ref, e is the probability of error, 1 - e is the probability of no error, and (because there are three possible substitution errors) e/3 is the probability of a ref-to-alt or alt-to-ref error.
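The equation image is not shown in this thread. The following is a reconstruction from the description above, written to match the three-case form in Cibulskis et al. 2013; treat it as a sketch and check it against the paper. The subscript i (an assumption of notation here) indexes the reads in the pileup, so b_i is the base observed in read i and e_i is its error probability derived from the base quality.

```latex
P(b_i \mid e_i, r, m, f) =
\begin{cases}
f \cdot \dfrac{e_i}{3} + (1 - f)(1 - e_i) & \text{if } b_i = r \\[6pt]
f \, (1 - e_i) + (1 - f) \cdot \dfrac{e_i}{3} & \text{if } b_i = m \\[6pt]
\dfrac{e_i}{3} & \text{otherwise}
\end{cases}
```

Reading the first case: a ref base is observed either because the molecule was alt and was misread as ref (f times e/3) or because the molecule was ref and was read correctly ((1 - f) times (1 - e)). The second case mirrors this for an observed alt base, and the third covers a base that is neither ref nor alt, which can only arise from an error.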

Note that GATK 4 M2 does something much more sophisticated.

We apologize that the docs for M2 are incomplete. Since GATK4 and many of its tools are in BETA, documentation is also in BETA, and their completion is pending finalization of the tools themselves.

@shlee, thank you very much! Please pass on my thanks to the developer for helping us.

Why does the following rule multiply together the quantity (the probability that the observed read base is seen) for every read?

I guess its goal is to calculate the genotype (ref|alt) probability at this site. If so, why use multiplication?

Any detailed explanation is welcome.

Thank you!

@shlee, any answer would be welcome.

Hi @YingLiu,

Our developer says: The original equation we discussed was the likelihood of a single base in the read pileup given the alt allele fraction f. This equation is the likelihood of the set of all observed bases (that is, all reads in the pileup) given f. Since each read is an independent measurement, the probabilities multiply. This assumes that errors are independent and that the base qualities can be trusted. If they were not independent it would be very hard to make a tractable statistical model. Since, however, there are systematic sequencing artifacts (i.e. errors are not independent because some events in library prep and sequencing may create multiple reads with errors), Mutect needs an additional filtering step that detects systematic artifacts.

We didn't forget you. Sometimes it takes a while for us to respond because we have other higher priority work or folks are on vacation, etc. I hope this information is helpful.
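To make the multiplication concrete, here is a minimal Python sketch of the idea described in this thread: a per-base likelihood, a pileup likelihood formed as a product over reads (computed as a sum of logs), and a LOD comparing the observed alt fraction against f = 0. This is an illustration only, not Mutect's actual code. The function names, the toy pileup, and the use of the raw alt read fraction as the estimate of f are all assumptions made for the example.

```python
import math


def phred_to_error(q):
    """Convert a Phred base quality to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(base, error_prob, ref, alt, f):
    """Likelihood of one observed base given alt allele fraction f.

    Three cases: the base matches ref, matches alt, or is some other base.
    error_prob / 3 is the chance of one specific substitution error.
    """
    if base == ref:
        return f * error_prob / 3 + (1 - f) * (1 - error_prob)
    if base == alt:
        return f * (1 - error_prob) + (1 - f) * error_prob / 3
    return error_prob / 3


def log10_pileup_likelihood(pileup, ref, alt, f):
    """log10 likelihood of all bases in the pileup given f.

    Reads are treated as independent, so per-base likelihoods multiply;
    summing logs is the same product, without numerical underflow.
    Assumes every error probability is strictly positive.
    """
    return sum(
        math.log10(base_likelihood(base, err, ref, alt, f))
        for base, err in pileup
    )


def lod(pileup, ref, alt):
    """log10 ratio of the likelihood at the observed alt fraction vs f = 0."""
    n_alt = sum(1 for base, _ in pileup if base == alt)
    f_hat = n_alt / len(pileup)  # assumption: raw fraction of alt reads
    return (
        log10_pileup_likelihood(pileup, ref, alt, f_hat)
        - log10_pileup_likelihood(pileup, ref, alt, 0.0)
    )


# Hypothetical pileup: 17 ref reads and 3 alt reads, all at base quality 30.
pileup = [("A", phred_to_error(30))] * 17 + [("T", phred_to_error(30))] * 3
print(lod(pileup, "A", "T"))
```

Because the model treats every read as independent, correlated errors (for example an artifact copied into several reads) each count as fresh evidence and inflate the score, which is why a separate filtering step is needed.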

@shlee,

Thank you very much! "Mutect needs an additional filtering step that detects systematic artifacts": which additional filtering step should I use?

@YingLiu, please see FilterMutectCalls and FilterByOrientationBias documentation.