
(howto) Install and run Oncotator for the first time

Posted by Geraldine_VdAuwera (Administrator, GATK Developer); edited December 2014 in Oncotator Documentation

1. Download the Oncotator package, the default datasources package, and (recommended) a transcript override list from the Downloads page

Please note: Broadies who wish to run the installed Oncotator on the Broad cluster should follow the instructions here, instead of this page

Oncotator Download

Default Datasource Corpus Download (Dec 11, 2014; 12 GB)

Please note that this corpus should be used with Oncotator 1.4.x.x and above. Uniprot AA Pos annotations will not function properly with Oncotator 1.3.x.x and below.
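If you want to script the version requirement above, the comparison can be sketched as follows. This is only an illustration: corpus_compatible is a hypothetical helper, not part of Oncotator, and the version string is a placeholder that you would replace with the version of your own install (how to look that up is not covered here).

```shell
# Sketch only: corpus_compatible is a hypothetical helper, not part of Oncotator.
# It takes a version string such as 1.4.0.0 and succeeds (exit status 0)
# when the version is 1.4 or newer.
corpus_compatible() (
    # The function body runs in a subshell, so the IFS change stays local.
    IFS=.
    # Split the version on dots into positional parameters.
    set -- $1
    major=${1:-0}
    minor=${2:-0}
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 4 ]; }
)

# Placeholder value: substitute the version of your own Oncotator install.
version=1.4.0.0
if corpus_compatible "$version"; then
    echo "Oncotator $version can use the Dec 11, 2014 corpus"
else
    echo "Oncotator $version predates 1.4; UniProt AA position annotations may not work properly"
fi
```

Only the first two version components are compared, which is all the 1.4.x.x cutoff requires.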

Transcript override lists

We highly recommend that you download and use one of the transcript override lists below, especially for clinical applications of Oncotator. When running Oncotator, provide one of these files with the -c parameter.

  • UniProt Exact Match for GENCODE v19: gives selection priority to transcripts whose protein sequence matches the UniProt protein sequence exactly. This file can also be found in the Oncotator download at test/testdata/tx_exact_uniprot_matches.txt.

  • UniProt Exact Match + Clinical for GENCODE v19: gives priority to known clinical protein changes. This file is a modification of the UniProt Exact Match list above. For more information about how this list was generated, please see the accompanying PowerPoint presentation.
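To show where the -c parameter fits, here is a small sketch that prints (but does not run) a command line combining -c with the arguments used in the test run later on this page. print_oncotator_cmd is a hypothetical helper, the /path/to/... datasource location is a placeholder, and placing -c before the positional arguments is an assumption about a reasonable layout rather than a documented requirement.

```shell
# Sketch only: print_oncotator_cmd is a hypothetical helper, not part of Oncotator.
# It prints a command line for review; it does not execute Oncotator.
# Paths containing spaces would need quoting before the printed line is pasted into a shell.
print_oncotator_cmd() {
    db_dir=$1      # directory holding the unpacked datasource corpus
    override=$2    # transcript override list passed with -c
    input=$3       # input file to annotate
    output=$4      # output file to write
    printf 'oncotator -v --db-dir=%s -c %s %s %s hg19\n' \
        "$db_dir" "$override" "$input" "$output"
}

# Placeholder datasource path; the other paths come from the Oncotator distribution.
print_oncotator_cmd /path/to/oncotator_v1_ds_Dec112014 \
    test/testdata/tx_exact_uniprot_matches.txt \
    test/testdata/maflite/Patient0.snp.maf.txt \
    exampleOutput.tsv
```

Printing the command first makes it easy to confirm that the override list and datasource paths are the ones you intended before starting a long annotation run.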

The Oncotator and default datasource corpus packages are simple tar files that can be expanded using the following commands:

$ tar zxvf oncotator-
$ tar zxvf oncotator_v1_ds_Dec112014.tar.gz

This will produce two directories called oncotator- and oncotator_v1_ds_Dec112014, respectively. Move to the oncotator- directory by doing:

$ cd oncotator-
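Because the datasource archive is large, it can be worth confirming that a download is intact before unpacking it. The following is a minimal sketch of that idea; unpack_checked is a hypothetical helper, not something shipped with Oncotator, and it reads the archive twice (once to list it, once to extract it), which takes extra time on a 12 GB file.

```shell
# Sketch only: unpack_checked is a hypothetical helper, not part of Oncotator.
# It extracts a .tar.gz into the current directory, but only after checking
# that the file exists and can be read as a gzip-compressed tar archive.
unpack_checked() (
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        exit 1
    fi
    # 'tar tzf' lists the contents without extracting; it fails on a
    # truncated or corrupt download.
    if ! tar tzf "$archive" >/dev/null 2>&1; then
        echo "not a valid gzip tar archive: $archive" >&2
        exit 1
    fi
    tar zxf "$archive"
)

# Example call, using the datasource archive name from this page:
#   unpack_checked oncotator_v1_ds_Dec112014.tar.gz
```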

2. Set up your Python environment and install dependencies

See the article on platform requirements for a full list of dependencies. One tutorial shows how to use the virtual environment script we provide to set everything up automagically; a separate tutorial shows how to install the dependencies manually if needed (or preferred).

3. Install Oncotator

Once you have installed all the necessary dependencies described in step 2, simply run the standard Python install script (setup.py), which is included with the Oncotator distribution:

$ python setup.py install

Two executable programs, oncotator and initializeDatasource, will be installed into your Python's bin/ directory. You can check that they were installed by running, for example:

$ oncotator -h

which prints the help / usage instructions. You can also do a test run of Oncotator on the Patient0.snp.maf.txt file provided with the Oncotator distribution (in the test/testdata/maflite/ directory) with the following command:

$ oncotator -v --db-dir=~/sandbox/oncotator/oncotator_v1_ds_Dec112014 test/testdata/maflite/Patient0.snp.maf.txt exampleOutput.tsv hg19

where the --db-dir argument gives the location of the datasources directory you unpacked in step 1. You may need to adapt both this path and the path to the Patient0.snp.maf.txt file depending on where you unpacked the packages and where you run this command from.

This will produce a new file named exampleOutput.tsv with the appropriate annotations, built against the hg19 reference.
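The checks described in this step can be bundled into a short script. The sketch below assumes nothing about Oncotator beyond the two program names and the output file name given above; have_programs and nonempty_file are hypothetical helpers written for illustration.

```shell
# Sketch only: both helpers are hypothetical and not part of Oncotator.

# Succeeds only if every named program can be found on PATH.
have_programs() {
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "not found on PATH: $prog" >&2
            return 1
        fi
    done
}

# Succeeds only if the file exists and is not empty.
nonempty_file() {
    [ -s "$1" ]
}

# Example use after installing and doing the test run:
#   have_programs oncotator initializeDatasource && nonempty_file exampleOutput.tsv
```

A non-empty output file only shows that the run wrote something; it is still worth opening exampleOutput.tsv to confirm that the annotation columns are present.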


Geraldine Van der Auwera, PhD

