Rationale behind MuTect2 and HaplotypeCaller
Dear GATK team,
I am writing a thesis on somatic mutation discovery and would like to understand MuTect2 and HaplotypeCaller well enough to present the methods. Unfortunately, the only reference I have is the original MuTect paper.
First question: is there a reference that explains in detail how HaplotypeCaller works? (It might already answer my questions.)
I also have some questions about how the original MuTect is integrated with HaplotypeCaller.
As far as I understand, HaplotypeCaller works in four steps (and MuTect2 does too, I assume?):
1. Define active regions
2. Determine haplotypes by assembly of the active region
3. Determine likelihoods of the haplotypes given the read data
4. Assign sample genotypes
As far as I understand, the TLOD from the original MuTect is used to define the active regions (I am not sure whether the NLOD is involved).
My questions are:
- Are the likelihoods calculated by the PairHMM in step 3 linked to the TLOD and NLOD? Are they used to select variants?
- In the end, which parameters decide whether a variant is labelled "PASS"? I know there are TLOD and NLOD thresholds (TLOD > 6.3 in the original MuTect), but how do steps 2, 3 and 4 affect whether a variant is labelled "PASS"?
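To make the TLOD part of the question concrete, here is a minimal Python sketch of the tumor LOD as I read it in the original MuTect paper: a log10 likelihood ratio between a variant model with allele fraction f (taken here as the observed alt fraction) and a reference-only model with f = 0. The function name `tumor_lod` and the toy pileup are my own, made up for illustration. This is not MuTect2's implementation, which is exactly what I am trying to understand.

```python
import math

def tumor_lod(bases, error_probs, ref, alt):
    """Log10 likelihood ratio of the variant model (f = observed alt
    fraction) against the reference model (f = 0) at a single site.

    bases       -- base calls of the reads covering the site
    error_probs -- per-base error probabilities (must be > 0)
    """
    alt_count = sum(1 for b in bases if b == alt)
    f = alt_count / len(bases)  # assumption: f is the observed alt fraction

    def log10_likelihood(frac):
        total = 0.0
        for b, e in zip(bases, error_probs):
            if b == ref:
                p = frac * e / 3 + (1 - frac) * (1 - e)
            elif b == alt:
                p = frac * (1 - e) + (1 - frac) * e / 3
            else:
                p = e / 3  # any other base is treated as a sequencing error
            total += math.log10(p)
        return total

    return log10_likelihood(f) - log10_likelihood(0.0)

# Toy pileup: 20 reference reads and 5 alt reads, all at Q30 (e = 0.001).
lod = tumor_lod("A" * 20 + "T" * 5, [0.001] * 25, ref="A", alt="T")
passes_original_threshold = lod >= 6.3
```

In this sketch a site with no alt reads gets a LOD of exactly 0, and the 6.3 cutoff is applied to the ratio directly. What I do not understand is how the PairHMM haplotype likelihoods replace or feed into this calculation in MuTect2.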
Is a paper describing this method going to be released soon?
Thank you very much in advance! Have a nice day. Kind regards,