Broad Mini Mutation Calling Workspace
This "Mini" Mutation Calling Tutorial includes a subset of tools in our complete Broad Mutation Calling Workflow. It contains ContEst, MuTect, and Oncotator tools. When run on "mini" tumor and cell line BAMs (containing only 100 genes), the expected runtime is roughly 30 minutes.
ContEst estimates contamination levels in next-generation sequencing data. It uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.
MuTect identifies somatic point mutations in next-generation sequencing data of cancer genomes. It takes sequencing data for matched normal and tumor tissue samples as input, and outputs mutation calls and optional coverage results.
In a nutshell, the analysis itself consists of three steps:
1. Pre-process the aligned reads in the tumor and normal sequencing data.
2. Use statistical analysis to identify sites that are likely to carry somatic mutations with high confidence.
3. Post-process the candidate somatic mutations.
For complete details, please see the 2013 publication in Nature Biotechnology.
Oncotator is a tool for annotating information onto genomic point mutations (SNPs/SNVs) and indels. It is primarily used for human genome variant callsets. However, the tool can also be used to annotate any kind of information onto variant callsets from any organism.
Below is an overview of the individual tools within the Broad Mutation Calling Workflow.
What does ContEst do?
ContEst uses a Bayesian approach to calculate the posterior probability of the contamination level and determine the maximum a posteriori probability (MAP) estimate of the contamination level.
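To make the idea of a MAP estimate concrete, here is a minimal sketch in Python. It is not ContEst's actual likelihood model: it assumes a uniform prior, a single fixed base error rate, and a simple binomial model at sites where the sample is homozygous for the alternate allele. The function name `map_contamination` and all of its parameters are hypothetical.

```python
import math

def map_contamination(sites, error_rate=0.001, grid_size=1001):
    """Grid-search MAP estimate of the contamination fraction.

    Assumptions (illustrative only, not ContEst's real model):
      - sites: list of (ref_reads, alt_reads, pop_alt_freq) tuples, taken at
        sites where the sample itself is homozygous alternate
      - uniform prior over the contamination fraction c in [0, 1]
      - a read shows the reference allele either because it came from the
        contaminant (probability c, reference with probability 1 - pop_alt_freq)
        or because of a sequencing error in a sample read
    """
    best_c, best_log_post = 0.0, -math.inf
    for i in range(grid_size):
        c = i / (grid_size - 1)
        log_post = 0.0  # a uniform prior only adds a constant, so it is omitted
        for ref_reads, alt_reads, pop_alt_freq in sites:
            p_ref = c * (1.0 - pop_alt_freq) + (1.0 - c) * error_rate
            p_ref = min(max(p_ref, 1e-12), 1.0 - 1e-12)  # keep log() finite
            log_post += ref_reads * math.log(p_ref)
            log_post += alt_reads * math.log(1.0 - p_ref)
        if log_post > best_log_post:
            best_c, best_log_post = c, log_post
    return best_c
```

With a uniform prior, the MAP estimate is simply the grid value that maximizes the likelihood. A non-uniform prior would add its log density to `log_post` at each grid point.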
ContEst supports an array-free mode, in which it genotypes on the fly from the matched normal and uses the result as its source of homozygous variant calls. It currently calls a site homozygous alternate when more than 80% of the bases are the alternate allele and coverage is at least 50X.
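The array-free genotyping rule described above can be sketched as a small predicate. The function and parameter names are hypothetical. Only the two thresholds (more than 80% alternate bases, at least 50X coverage) come from the text.

```python
def is_hom_alt(alt_bases, total_bases, min_alt_fraction=0.80, min_coverage=50):
    """Return True if a site in the matched normal would be called homozygous alternate.

    alt_bases: number of bases at the site matching the alternate allele
    total_bases: total coverage at the site
    """
    if total_bases < min_coverage:
        return False  # not enough coverage to genotype the site
    return alt_bases / total_bases > min_alt_fraction
```

For example, `is_hom_alt(45, 50)` is true (90% alternate at 50X), while `is_hom_alt(49, 49)` is false because coverage falls below 50X.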
What does MuTect do?
Pre-process the aligned reads in the tumor and normal sequencing data.
In this step, MuTect ignores reads with too many mismatches or very low quality scores, because such reads introduce more noise than signal.
Use statistical analysis to identify sites that are likely to carry somatic mutations with high confidence.
The statistical analysis predicts a somatic mutation by using two Bayesian classifiers – the first aims to detect whether the tumor is non-reference at a given site and, for those sites that are found as non-reference, the second classifier makes sure the normal does not carry the variant allele. In practice the classification is performed by calculating a LOD score (log odds) and comparing it to a cutoff determined by the log ratio of prior probabilities of the considered events. For more information, refer to the MuTect Cancer Genome Analysis page.
Post-processing of candidate somatic mutations
This step aims to eliminate artifacts of next-generation sequencing, short-read alignment, and hybrid capture. For example, sequence context can cause hallucinated alternate alleles, but often only in a single read direction. MuTect therefore tests whether the alternate alleles supporting a mutation are observed in both directions.
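The LOD-score classification and the both-directions check can be illustrated with a simplified sketch. This is not MuTect's implementation: it assumes one fixed error rate for every base instead of per-base quality scores, it takes the observed alternate fraction as the variant model, and the direction check is a plain presence test. All function names and defaults are hypothetical, and the cutoff to compare the score against is left to the caller.

```python
import math

def log10_likelihood(ref_count, alt_count, alt_fraction, error_rate):
    """log10 likelihood of the observed ref/alt read counts, given a true
    alternate allele fraction and a uniform base error rate."""
    p_ref = alt_fraction * (error_rate / 3) + (1 - alt_fraction) * (1 - error_rate)
    p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * (error_rate / 3)
    return ref_count * math.log10(p_ref) + alt_count * math.log10(p_alt)

def tumor_lod(ref_count, alt_count, error_rate=0.001):
    """Log odds of 'variant present at the observed fraction' versus
    'reference only, alternate reads are all errors'."""
    depth = ref_count + alt_count
    if depth == 0:
        return 0.0
    observed_fraction = alt_count / depth
    variant = log10_likelihood(ref_count, alt_count, observed_fraction, error_rate)
    reference = log10_likelihood(ref_count, alt_count, 0.0, error_rate)
    return variant - reference

def seen_in_both_directions(alt_forward, alt_reverse, min_per_direction=1):
    """True if alternate-supporting reads appear in both read directions."""
    return alt_forward >= min_per_direction and alt_reverse >= min_per_direction
```

A site with 10 alternate reads out of 30 scores far higher than a site with 1 alternate read out of 30. A candidate whose alternate reads all come from one direction, such as `seen_in_both_directions(5, 0)`, would be flagged as a likely artifact.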
What does Oncotator do?
Oncotator annotates information onto genomic point mutations (SNPs/SNVs) and indels.
By default, Oncotator uses a simple TSV file (e.g., MAFLITE) as an input and produces a TCGA MAF as an output. Oncotator also supports VCF files as an input and output format.
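As an illustration of what a simple TSV input might look like, the sketch below writes a minimal MAFLITE-style table in Python. The column names (`chr`, `start`, `end`, `ref_allele`, `alt_allele`) are an assumption about the MAFLITE format and should be checked against the Oncotator documentation. The variant row is made-up example data, not output from this workflow.

```python
import csv
import io

# Assumed MAFLITE column names; verify against the Oncotator documentation.
MAFLITE_COLUMNS = ["chr", "start", "end", "ref_allele", "alt_allele"]

def write_maflite(variants, handle):
    """Write variant dictionaries as a tab-separated MAFLITE-style table."""
    writer = csv.DictWriter(
        handle, fieldnames=MAFLITE_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for variant in variants:
        writer.writerow(variant)

# Made-up example variant, for illustration only.
example = [
    {"chr": "1", "start": 1000, "end": 1000, "ref_allele": "C", "alt_allele": "T"}
]
buffer = io.StringIO()
write_maflite(example, buffer)
maflite_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the example self-contained. Passing an open file handle instead would produce a file on disk.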
Oncotator can also be configured to generate HTML reports from the annotated genomic data. In this BasicSomaticMutationCalling workflow, Oncotator writes an HTML report to the workspace Data tab.
Inputs and Outputs
Below are the tool-specific inputs and outputs for this workflow.
- ContEstTask.contaminationFile: output from ContEst, input to MuTect
- MutectTask.MAFLiteFile: output from MuTect, input to Oncotator
How to run this workflow in FireCloud
1. Clone the broad-firecloud-tutorials/MiniMutationCalling_V1_Tutorial workspace to run this workflow.
2. In your cloned workspace, navigate to the Method Configurations tab and click on the method, MiniMutationCalling.
3. Click Launch Analysis.
4. In the Launch Analysis window, toggle to pair and select a pair on which to run this workflow, e.g., HCC1143_pair_100_gene_250bp_pad. You can also run this workflow on a pair set by toggling to pair_set. Note: You must then type this.pairs in the Define Expression field.
5. Finally, click the Launch button. Check back on the Monitor tab after 30 minutes or so to view results from your workflow analysis.
6. When the status displays Done, click on the most recent analysis run to view outputs and results, e.g., HCC1143_pair_100_gene_250bp_pad (pair).
7. Click on Outputs: Show, then select output files to view the results of this analysis.
8. You can also view the Oncotator HTML report as an attribute in the Data tab.