
Oncotator overview and basic usage

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
Edited December 2014 in Oncotator documentation


Oncotator is a tool for adding annotations to genomic point mutations (SNPs/SNVs) and indels. It is primarily intended for human genome variant callsets, and the data sources we provide are the ones relevant to cancer researchers. However, the tool can technically annotate variant callsets from any organism with any kind of information, and we provide instructions on how to prepare custom data sources for inclusion in the process.


By default, Oncotator takes a simple tsv (a.k.a. MAFLITE) as input and produces a TCGA MAF as output. See details below.

Oncotator also supports VCF as an input and/or output format.


The input tsv (MAFLITE) file must have the following columns (with column headers):

  • build (at this time the build must be hg19 for all variants)
  • chr
  • start
  • end
  • ref_allele (should be "-" for an insertion)
  • alt_allele (should be "-" for a deletion)

Example input files are provided with the program files. For SNPs, see test/testdata/maflite/Patient0.snp.maf.txt. For indels, see test/testdata/maflite/Patient0.indel.maf.txt.
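As a minimal sketch of the required layout, the Python snippet below writes a three-variant MAFLITE table (one SNP, one insertion, one deletion) to an in-memory buffer using the standard csv module. The variants and coordinates are hypothetical, and `write_maflite` is an illustrative helper, not part of Oncotator; check the Patient0 example files for the exact coordinate conventions used for indels.

```python
import csv
import io

# Required MAFLITE columns, in the order listed above.
REQUIRED_COLUMNS = ["build", "chr", "start", "end", "ref_allele", "alt_allele"]

# Hypothetical variants; coordinates are illustrative only.
variants = [
    # SNP: single-base substitution
    {"build": "hg19", "chr": "1", "start": "100000", "end": "100000",
     "ref_allele": "A", "alt_allele": "G"},
    # Insertion: ref_allele is "-"
    {"build": "hg19", "chr": "2", "start": "200000", "end": "200001",
     "ref_allele": "-", "alt_allele": "T"},
    # Deletion: alt_allele is "-"
    {"build": "hg19", "chr": "3", "start": "300000", "end": "300002",
     "ref_allele": "CAG", "alt_allele": "-"},
]

def write_maflite(rows, handle):
    """Write rows as a tab-separated MAFLITE table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # The build must currently be hg19 for every variant.
        if row["build"] != "hg19":
            raise ValueError("build must be hg19 for all variants")
        writer.writerow(row)

buffer = io.StringIO()
write_maflite(variants, buffer)
print(buffer.getvalue())
```

To produce a real input file, pass a handle from `open("variants.maflite.tsv", "w", newline="")` instead of the buffer; the file name is only an example.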

Several additional columns are not generated by any annotation and must be provided by the user (see the options below). If any of these are missing, UNKNOWN will appear in the corresponding fields of the output file.

  • tumor_barcode
  • normal_barcode
  • NCBI_Build
  • Strand
  • Center
  • source
  • status
  • phase
  • sequencer
  • Tumor_Validation_Allele1
  • Tumor_Validation_Allele2
  • Match_Norm_Validation_Allele1
  • Match_Norm_Validation_Allele2
  • Verification_Status
  • Validation_Status
  • Validation_Method
  • Score
  • BAM_file
  • Match_Norm_Seq_Allele1
  • Match_Norm_Seq_Allele2

If you would like to eliminate the UNKNOWN values, you have four options:

1. Create an annotation override file

This will overwrite (or create) values for the specified annotations in all variants. See the --override_config or --default_config flag. An example override config file is provided with the program files (exampleOverrides.config, found in the doc/ directory of the source code). Use this when a single value should go into the specified annotations for all input variants.
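The override behavior described above can be modeled in a few lines: every variant ends up with the same value for each named annotation, whether or not it already had one. This is a sketch of the behavior only, not Oncotator's implementation, and it does not reproduce the config file syntax (see exampleOverrides.config for that); `apply_overrides` and the annotation values are hypothetical.

```python
def apply_overrides(variants, overrides):
    """Return copies of `variants` with every key in `overrides` set to its value.

    Existing values are overwritten and missing ones are created, mirroring
    the override behavior described above.
    """
    # In {**a, **b}, keys from b win, so override values replace existing ones.
    return [{**variant, **overrides} for variant in variants]

# Hypothetical variants: the first already has a Center value, the second does not.
variants = [
    {"chr": "1", "start": "100000", "Center": "old.center.org"},
    {"chr": "2", "start": "200000"},
]
overrides = {"Center": "my.center.org", "NCBI_Build": "37"}

for variant in apply_overrides(variants, overrides):
    print(variant["chr"], variant["Center"], variant["NCBI_Build"])
```

Returning copies keeps the input records unchanged, which makes it easy to compare values before and after the override.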

2. Provide the fields as part of the input tsv file

Do this when the annotation values change between variants.
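When values differ per variant, one approach is to append a column to the MAFLITE file before annotating. The sketch below copies a tab-separated table and adds a `tumor_barcode` column looked up per variant; the lookup table, the barcodes, and the `add_column` helper are hypothetical.

```python
import csv
import io

def add_column(in_handle, out_handle, name, value_for):
    """Copy a tab-separated table, appending column `name` computed per row."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    fieldnames = list(reader.fieldnames) + [name]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[name] = value_for(row)
        writer.writerow(row)

# Hypothetical lookup from (chr, start) to the tumor sample each variant came from.
barcodes = {("1", "100000"): "TUMOR-A", ("2", "200000"): "TUMOR-B"}

source = io.StringIO(
    "build\tchr\tstart\tend\tref_allele\talt_allele\n"
    "hg19\t1\t100000\t100000\tA\tG\n"
    "hg19\t2\t200000\t200000\tC\tT\n"
)
out = io.StringIO()
add_column(source, out, "tumor_barcode",
           lambda row: barcodes[(row["chr"], row["start"])])
print(out.getvalue())
```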

3. Use the override flag on the command line

See the -a flag in the usage information.
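Below is a sketch of assembling such a command line in Python, with one `-a` per annotation. It assumes the `-a name:value` syntax, the `--db-dir` option, and the positional order input file, output file, genome build; confirm all three against `oncotator --help` for your installed version. The helper name, paths, and values are placeholders.

```python
import shlex

def build_oncotator_args(input_path, output_path, db_dir, manual_annotations):
    """Return an oncotator argument list with one '-a name:value' per annotation.

    Assumed usage: oncotator [options] input_file output_file genome_build
    """
    args = ["oncotator", "--db-dir", db_dir]
    for name, value in manual_annotations.items():
        # Assumed -a format: annotation name and value separated by a colon.
        args += ["-a", f"{name}:{value}"]
    args += [input_path, output_path, "hg19"]
    return args

# Placeholder paths and annotation values.
args = build_oncotator_args(
    "variants.maflite.tsv",
    "variants.maf.annotated",
    "/path/to/oncotator_datasources",
    {"Center": "my.center.org", "sequencer": "Illumina HiSeq"},
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would run it; the snippet only prints the command so it can be checked first. Building a list rather than a single string avoids quoting problems with values that contain spaces.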

4. Specify that your output should be a simple TSV instead of a TCGA MAF

This writes every annotation as a column header and, since a simple TSV has no required annotations, no UNKNOWN values will appear. Use -o SIMPLE_TSV when calling oncotator. Do this when you want a simple dump of all annotations for all variants.


The default output is a TCGA MAF (version 2.4). The specification can be found at: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification

If you would prefer a simple tsv as output, just include the -o SIMPLE_TSV flag when running oncotator.
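Both output formats are tab-separated tables, so they can be loaded with the csv module. The sketch below skips lines starting with `#` (TCGA MAF files typically begin with a `#version` line; a SIMPLE_TSV without such lines is read the same way) and then lists the fields of a record that are still UNKNOWN. The column subset, the gene name, and the `read_annotated_table` helper are hypothetical.

```python
import csv
import io

def read_annotated_table(handle):
    """Yield one dict per variant, skipping '#' comment lines before parsing."""
    lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(lines, delimiter="\t")

# Hypothetical output excerpt with a MAF-style version line.
sample = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tCenter\n"
    "GENE1\t1\tUNKNOWN\n"
)
records = list(read_annotated_table(sample))
unknown = [name for name, value in records[0].items() if value == "UNKNOWN"]
print(unknown)  # ['Center']
```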

Post edited by LeeTL1220 on

