Oncotator overview and basic usage
Oncotator is a tool for annotating information onto genomic point mutations (SNPs/SNVs) and indels. It is primarily intended to be used on human genome variant callsets and we only provide data sources that are relevant to cancer researchers. However, the tool can technically be used to annotate any kind of information onto variant callsets from any organism, and we provide instructions on how to prepare custom data sources for inclusion in the process.
By default, Oncotator takes a simple TSV (a.k.a. MAFLITE) as input and produces a TCGA MAF as output. See details below.
Oncotator also supports VCF as an input and/or output format.
The input tsv (MAFLITE) file must have the following columns (with column headers):
- build (at this time the build must be hg19 for all variants)
- ref_allele (should be "-" for an insertion)
- alt_allele (should be "-" for a deletion)
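Before running Oncotator, it can help to confirm that the input header contains the required columns. The following is a minimal sketch and not part of Oncotator itself: the function name `missing_maflite_columns` is hypothetical, and the column tuple contains only the names listed above.

```python
import csv
import io

# Column names taken from the list above; extend this tuple if your
# Oncotator version documents further required columns.
REQUIRED_COLUMNS = ("build", "ref_allele", "alt_allele")


def missing_maflite_columns(handle):
    """Return the required column names absent from the TSV header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Hypothetical two-line input: a header row plus one deletion.
example = io.StringIO("build\tref_allele\talt_allele\nhg19\tA\t-\n")
print(missing_maflite_columns(example))
```

An empty list means every required header is present. The check looks only at the header row and does not validate the values in each variant line.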
An example input file is provided with the program files. For SNPs, see test/testdata/maflite/Patient0.snp.maf.txt. For indels, see test/testdata/maflite/Patient0.indel.maf.txt.
Several additional columns are not created by annotations and must be provided by the user (instructions below). If these are missing, UNKNOWN will appear in the output file.
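To find out which columns are affected in a given output file, the UNKNOWN values can be tallied per column. This is a minimal sketch under stated assumptions and not part of Oncotator: it assumes the output is tab-delimited and that lines beginning with `#` are comments rather than data. The function name `count_unknown` and the column names in the example are hypothetical.

```python
import csv
import io
from collections import Counter


def count_unknown(handle, marker="UNKNOWN"):
    """Count, per column, how many rows hold the placeholder value."""
    # Assumption: lines starting with '#' are comments, not data.
    rows = (line for line in handle if not line.startswith("#"))
    counts = Counter()
    for record in csv.DictReader(rows, delimiter="\t"):
        for column, value in record.items():
            if value == marker:
                counts[column] += 1
    return dict(counts)


# Hypothetical output with two made-up columns.
sample = io.StringIO("col_a\tcol_b\nx\tUNKNOWN\nUNKNOWN\tUNKNOWN\n")
print(count_unknown(sample))
```

Columns that never contain the placeholder are omitted from the result, so an empty dictionary means no UNKNOWN values were found.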
If you would like to eliminate the UNKNOWN values, you have four options:
1. Create an annotation override file
This will overwrite (or create) values in all variants for the specified annotations. See the --default_config flag. An example override config file (exampleOverrides.config) is provided in the doc/ dir of the source code. Use this when one value should go into the specified annotations for all input variants.
2. Provide the fields as part of the input tsv file
Do this when the annotations change between variants.
3. Use the override flag on the command line
See the -a flag in the usage information.
4. Specify that your output should be a simple TSV instead of a TCGA MAF
This will put all annotations as column headers and, since no annotations are required, no UNKNOWN values will appear. Use -o SIMPLE_TSV when calling oncotator. Do this when you want a simple dump of all annotations for all variants.
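For option 2, where a value differs between variants, the extra field can be appended to the input TSV before running Oncotator. The following is a minimal sketch and not part of Oncotator: `add_column` is a hypothetical helper, and `example_annotation` is a placeholder for whichever annotation name your output requires.

```python
import csv
import io


def add_column(src, dst, name, values):
    """Copy a TSV from src to dst, appending one column of per-row values."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    rows = list(reader)
    if len(rows) != len(values):
        raise ValueError("need exactly one value per variant row")
    writer.writerow(header + [name])
    for row, value in zip(rows, values):
        writer.writerow(row + [value])


# Hypothetical input with one deletion and one insertion.
src = io.StringIO("build\tref_allele\talt_allele\nhg19\tA\t-\nhg19\t-\tT\n")
dst = io.StringIO()
add_column(src, dst, "example_annotation", ["sample1", "sample2"])
print(dst.getvalue())
```

The length check prevents a value list that is too short or too long from silently misaligning annotations with variants.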
The default output is a TCGA MAF (version 2.4). The specification can be found at: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification
If you would prefer a simple TSV as output, just include the -o SIMPLE_TSV flag when running oncotator.