What input files does MuTect accept / require?
Please note that this article refers to the original standalone version of MuTect. A new version is now available within GATK (starting at GATK 3.5) under the name MuTect2. This new version is able to call both SNPs and indels. See the GATK version 3.5 release notes and the MuTect2 tool documentation for further details.
Analyses done with MuTect typically involve several (though not necessarily all) of the following inputs:
- Reference genome sequence
- Sequencing reads for normal tissue and tumor tissue (normal/tumor data)
- Intervals of interest
- COSMIC data
- Panel of normals
Since MuTect is based on GATK, the general format requirements are the same as those described in the GATK documentation on input files.
Below are the input requirements and/or recommendations that are specific to MuTect.
1. Normal/Tumor data
A key component of the MuTect method is comparing evidence for variation in a tumor sample against a matched normal sample from the same individual, in order to distinguish somatic mutations from germline variation. The Best Practice recommendation is therefore to provide MuTect with both normal and tumor data from the same individual. However, it is possible to run MuTect on a tumor sample alone, without a matched normal. If available, a Panel of Normals (PoN) can be used to represent expected germline variation.
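To make the optional nature of these inputs concrete, here is a minimal Python sketch that assembles a MuTect command line from whichever inputs are available. The helper `build_mutect_command` and all file names are hypothetical. The flag names (`--input_file:tumor`, `--input_file:normal`, `--cosmic`, `--normal_panel`, and so on) follow standalone MuTect usage as best we recall it, so please check them against the help output of your MuTect version before relying on them.

```python
from typing import List, Optional


def build_mutect_command(
    jar: str,
    reference: str,
    tumor_bam: str,
    out: str,
    normal_bam: Optional[str] = None,
    intervals: Optional[str] = None,
    cosmic_vcf: Optional[str] = None,
    normal_panel_vcf: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for a standalone MuTect run.

    Hypothetical helper: flag names are assumed from standalone MuTect
    usage and should be verified against your version's help output.
    """
    # Reference and tumor reads are always supplied in this sketch.
    cmd = [
        "java", "-jar", jar,
        "--analysis_type", "MuTect",
        "--reference_sequence", reference,
        "--input_file:tumor", tumor_bam,
    ]
    # The matched normal is recommended but can be omitted (tumor-only run).
    if normal_bam is not None:
        cmd += ["--input_file:normal", normal_bam]
    # Remaining inputs are added only when the caller provides them.
    if intervals is not None:
        cmd += ["--intervals", intervals]
    if cosmic_vcf is not None:
        cmd += ["--cosmic", cosmic_vcf]
    if normal_panel_vcf is not None:
        cmd += ["--normal_panel", normal_panel_vcf]
    cmd += ["--out", out]
    return cmd
```

Calling `build_mutect_command("muTect.jar", "ref.fasta", "tumor.bam", "call_stats.txt", normal_bam="normal.bam")` returns a list that can be passed to `subprocess.run`. Leaving out `normal_bam` gives the tumor-only form.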
2. COSMIC data
COSMIC stands for Catalogue Of Somatic Mutations In Cancer. It is a database of variants that have been found to be implicated in cancer processes, maintained by the Sanger Institute (see project website).
MuTect uses the COSMIC data to whitelist variants that are found in tumor samples, to prevent them from being filtered out if they are also present in dbSNP or a panel of normals.
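The whitelisting idea can be illustrated with a small Python sketch. This is a deliberate simplification and not MuTect's actual implementation: it assumes, for illustration only, that presence in dbSNP or in the panel of normals would on its own exclude a candidate, whereas the real tool weighs additional evidence. The `Variant` type, the function name, and the example sites are all made up.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal key identifying a candidate site."""
    chrom: str
    pos: int
    ref: str
    alt: str


def passes_known_site_filter(
    candidate: Variant,
    cosmic: Set[Variant],
    dbsnp: Set[Variant],
    panel_of_normals: Set[Variant],
) -> bool:
    """Return True if the candidate survives the known-sites check.

    Simplified assumption: COSMIC membership overrides exclusion that
    would otherwise result from dbSNP or panel-of-normals membership.
    """
    # Whitelisted: a COSMIC site is kept even if it is also a known site.
    if candidate in cosmic:
        return True
    # Otherwise, a site seen in dbSNP or the PoN is treated as likely germline.
    return candidate not in dbsnp and candidate not in panel_of_normals


# Made-up example sites, not real variants.
a = Variant("1", 100, "A", "T")
b = Variant("2", 200, "G", "C")
print(passes_known_site_filter(a, cosmic={a}, dbsnp={a}, panel_of_normals=set()))
print(passes_known_site_filter(b, cosmic={a}, dbsnp={b}, panel_of_normals=set()))
```

In this sketch the first candidate is kept because it appears in the COSMIC set despite also being in the dbSNP set, while the second is dropped.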