

# (howto) Install and run Oncotator for the first time

edited February 23

Please note that this corpus should be used with Oncotator 1.4.x.x and above. Uniprot AA Pos annotations will not function properly with Oncotator 1.3.x.x and below.

#### Transcript override lists

We highly recommend that you download and use one of the transcript override lists below, especially for clinical applications of Oncotator. When running Oncotator, provide one of these files with the -c parameter.

• Download UniProt Exact Match For GENCODE v19. This list gives selection priority to transcripts whose protein sequences match the UniProt protein sequence exactly. The file can also be found in the Oncotator download at test/testdata/tx_exact_uniprot_matches.txt.

• Download UniProt Exact Match + Clinical For GENCODE v19. This list gives priority to known clinical protein changes and is a modification of the UniProt Exact Match list (above). For more information about how this list was generated, please see the PowerPoint presentation here.
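As a rough illustration of where the -c parameter fits, the sketch below assembles an Oncotator command line in Python. The option names (-v, --db-dir, -c) and the argument order (input file, output file, genome build) are taken from this page; the helper name `build_oncotator_command` and the example paths are hypothetical, so adapt them to your own setup.

```python
import shlex


def build_oncotator_command(db_dir, input_file, output_file,
                            genome_build="hg19", override_list=None):
    """Assemble an oncotator argument list (hypothetical helper).

    Only options mentioned on this page are used: -v, --db-dir and -c.
    """
    command = ["oncotator", "-v", "--db-dir", db_dir]
    if override_list is not None:
        # -c points at one of the transcript override lists described above.
        command += ["-c", override_list]
    command += [input_file, output_file, genome_build]
    return command


# Example paths are placeholders; adapt them to your own layout.
example = build_oncotator_command(
    "/path/to/oncotator_v1_ds_Jan262015",
    "test/testdata/maflite/Patient0.snp.maf.txt",
    "exampleOutput.tsv",
    override_list="test/testdata/tx_exact_uniprot_matches.txt",
)
print(shlex.join(example))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` once Oncotator is installed.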

The Oncotator and default datasource corpus packages are simple tar files that can be expanded using the following commands:

$ tar zxvf oncotator-1.5.1.0.tar.gz
$ tar zxvf oncotator_v1_ds_Jan262015.tar.gz


This will produce two directories called oncotator-1.5.1.0 and oncotator_v1_ds_Jan262015, respectively. Move to the oncotator-1.5.1.0 directory by doing:

$ cd oncotator-1.5.1.0

## 2. Set up your Python environment and install dependencies

See the article on platform requirements for a full list of dependencies. This tutorial will show you how to use the virtual environment script we provide to set everything up automagically, and this tutorial will show you how to install dependencies manually if needed (or preferred).

## 3. Install Oncotator

Once you have installed all the necessary dependencies listed above, simply run the standard Python install script, which is included with the Oncotator distribution:

$ python setup.py install


Two binaries (executable program files) named oncotator and initializeDatasource respectively will be installed into your Python's bin/ directory. You can test that they were installed by running e.g.:

$ oncotator -h

to invoke the help / usage instructions. You can also do a test run of Oncotator on the Patient0.snp.maf.txt file provided with the Oncotator distribution (in the test/testdata/maflite/ directory) with the following command:

$ oncotator -v --db-dir /path/to/oncotator_v1_ds_Jan262015 test/testdata/maflite/Patient0.snp.maf.txt exampleOutput.tsv hg19


where you provide the location of the datasources using the --db-dir argument. You may need to adapt the file path for the Patient0.snp.maf.txt file depending on where you run this command from.

This will produce a new file named exampleOutput.tsv with the appropriate annotations, built against the hg19 reference.
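To get a quick look at which annotation columns ended up in the output, something like the following sketch may help. It assumes the output is tab-separated and that any leading comment lines start with `#`; both are assumptions about the file layout, and the function name `read_annotation_columns` is made up for this example.

```python
import csv


def read_annotation_columns(tsv_path):
    """Return the column names from the first non-comment line of a TSV file."""
    with open(tsv_path, newline="") as handle:
        # Assumption: any leading comment lines start with '#'.
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(data_lines, delimiter="\t")
        # An empty list is returned if the file has no header line at all.
        return next(reader, [])
```

For example, `read_annotation_columns("exampleOutput.tsv")` would return the header fields of the test run above as a list of strings.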


Geraldine Van der Auwera, PhD


• Posts: 12 | Member

An error occurred while installing Oncotator.

error: Setup script exited with error: command 'gcc' failed with exit status 1

What do I do?

@pmint, have you looked at the compiler requirements in this document?

Geraldine Van der Auwera, PhD


• xiuque88 | Posts: 2 | Member

Hi Geraldine,

Thank you and your team at Broad for providing a local installation of Oncotator. Could you provide the checksum for the datasource corpus? I want to ensure that the download went perfectly.

Thanks!
Cheers.

• xiuque88 | Posts: 2 | Member
edited July 2014

Good day,

I was running the test set and got a series of warnings:

            014-07-17 03:25:13,066 WARNING [oncotator.output.TcgaMafOutputRenderer:197] Entrez Gene ID was zero, but Hugo Symbol was not Unknown.  Is the HGNC and/or Transcript datasource complete?


Cheers


Geraldine Van der Auwera, PhD

• Posts: 1 | Member

Hi Geraldine,

I have the same issue. The link specified above is broken. Would you please provide a new one for me? I really appreciate it.

Best,
Joon

Geraldine Van der Auwera, PhD

• China | Posts: 1 | Member

I installed Oncotator and got an error:

ERROR: Could not load pysam. Some features will be disabled (e.g. COSMIC annotations) and may cause Oncotator to fail.
No handlers could be found for logger "root".

I used the virtual environment script, and the pysam-0.7.5 package was installed successfully, so I am confused about why I get this error when I test Oncotator. Can you give me some suggestions?

Hi @vguest,

Can you post the output that was generated during the installation process?

Geraldine Van der Auwera, PhD

• Posts: 7 | Member

We love having the local installation of Oncotator because of the improved ease of use and convenience. We do trip ourselves up from time to time when we forget whether the dozens of output files contain canonical or best effect annotations, that is, whether the --tx-mode flag was set to "EFFECT" or not. Can you add, as a future enhancement, a text string in the output file to label the output as either Canonical or Best Effect? The text string does not have to be in the column headers; it can go wherever it is sensible for you to put it.

Hi @chung2000, we've put this in as a feature request. In the meantime, one way to keep track of this is to check the run logs.

Geraldine Van der Auwera, PhD

• Posts: 7 | Member

Thank you @Geraldine_VdAuwera. While the run logs are excellent records of when, where, and how the --tx-mode flag was set, all of that is lost when the Oncotator output files get passed around to other people in our group or to other organizations, because we would rather not send the run logs to those people.

@chung2000 I understand, it is awkward to rely on log files. This will be implemented in an upcoming version -- we'll make an announcement when the next release is ready.

Geraldine Van der Auwera, PhD
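Until such a label exists in the output itself, one possible workaround is to carry the transcript mode in the output file name, so it travels with the file even when the run logs do not. The sketch below is only an illustration: the helper name `labeled_output_path` and the `txmode-` naming scheme are invented here, and the set of accepted values (CANONICAL alongside the EFFECT value quoted above) is an assumption.

```python
import os

# Assumption: these are the two values accepted by --tx-mode.
KNOWN_TX_MODES = ("CANONICAL", "EFFECT")


def labeled_output_path(output_path, tx_mode):
    """Insert the transcript mode into an output file name (hypothetical scheme)."""
    mode = tx_mode.upper()
    if mode not in KNOWN_TX_MODES:
        raise ValueError("unexpected tx-mode: {!r}".format(tx_mode))
    root, ext = os.path.splitext(output_path)
    # The label is placed just before the original file extension.
    return "{}.txmode-{}{}".format(root, mode, ext)
```

The resulting path can then be used as the output file argument when running Oncotator, so the mode stays visible to whoever receives the file.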

• Posts: 7 | Member
edited October 2014

If I want to obtain the latest version of the datasource corpus (more recent than the June 2014 release) where is it located? Thank you.


@chung2000 I don't believe there is a later version than June 2014 as of yet. When there is, it will be made available on this page.

Geraldine Van der Auwera, PhD

• Posts: 6 | Member

Hi, when I run oncotator, I keep getting the error "pkg_resources.DistributionNotFound: natsort".
I am new to the command line, but I have checked that the natsort package is present. Any help would be greatly appreciated. Thanks

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@np3 Try "pip install -U distribute". That has fixed this in the past.

• Posts: 6 | Member

Thanks for the suggestion. Unfortunately, after doing this step and trying again, I received the same error message.

• Posts: 6 | Member

We are using Ubuntu 12.04 LTS. Would that make a difference?

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@np3 Can you post the results of "pip freeze" ?

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@np3 I have been using Ubuntu 12.04 with no issues. Also, what version of python are you using?

• Posts: 6 | Member

Python==2.7.3
Brlapi==0.5.6
Cython==0.21
GnuPGInterface==0.3.2
Jinja2==2.7.1
Mako==0.5.0
MarkupSafe==0.15
MySQL-python==1.2.3
PAM==0.4.2
PIL==1.1.7
Pillow==2.3.0
Pyste==0.9.10
Twisted-Core==11.1.0
Twisted-Names==11.1.0
Twisted-Web==11.1.0
UniConvertor==1.1.4
apt-xapian-index==0.44
apturl==0.5.1ubuntu3
argparse==1.2.1
biopython==1.62
chardet==2.0.1
chimerascan==0.4.5
command-not-found==0.2.44
configglue==1.0
configobj==4.7.2
debtagshw==0.1
decorator==3.3.2
defer==1.0.2
dirspec==3.0.0
duplicity==0.6.18
httplib2==0.7.2
ipython==0.12.1
jockey==0.9.7
keyring==0.9.2
language-selector==0.1
lazr.restfulclient==0.12.0
lazr.uri==1.0.3
lockfile==0.8
louis==2.3.0
lxml==2.3.2
matplotlib==1.1.1rc
natsort==3.5.1
nose==1.1.2
numexpr==1.4.2
numpy==1.6.1
nvidia-common==0.0.0
oauth==1.0.1
onboard==0.97.1
oneconf==0.2.8.1
pandas==0.7.0
pdfshuffler==0.6.0
pexpect==2.3
piston-mini-client==0.7.2
protobuf==2.4.1
pyOpenSSL==0.12
pyPdf==1.13
pycrypto==2.4.1
pycups==1.9.61
pycurl==7.19.0
pyinotify==0.9.2
pyparsing==1.5.2
pysam==0.7.5
pyserial==2.5
pysmbc==1.0.13
pysqlite==1.0.1
python-apt==0.8.3ubuntu7.2
python-dateutil==1.5
python-debian==0.1.21ubuntu1
python-virtkey==0.60.0
pytz==2011k
pyxdg==0.19
pyzmq==2.1.11
reportlab==2.5
rhythmbox-ubuntuone==4.2.0
rpy==1.0.3
scikits.statsmodels==0.3.1
scipy==0.9.0
sessioninstaller==0.0.0
simplegeneric==0.7
simplejson==2.3.2
software-center-aptd-plugins==0.0.0
sympy==0.7.1.rc1
system-service==0.1.6
tables==2.3.1
ubuntuone-couch==0.3.0
ubuntuone-installer==3.0.2
ubuntuone-storage-protocol==3.0.2
ufw==0.31.1-1
unity-lens-video==0.3.5
unity-scope-video-remote==0.3.5
urlgrabber==3.9.1
usb-creator==0.2.23
virtualenv==1.11.6
wsgiref==0.1.2
xdiagnose==2.5.3
xkit==0.0.0
xlrd==0.9.2
zope.interface==3.6.1

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@chung2000 @Geraldine_VdAuwera I have posted the Sept 17 corpus here.

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@np3 Though I have newer packages, that is very similar to my configuration. Usually the issue here is a version of distribute or setuptools that is incompatible with the version of Python. Are you using a virtual environment? In other words, did you run scripts/create_oncotator_venv.sh? Apologies for the basic questions. Also, use @LeeTL1220 to get a faster response.

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@np3 For a different project, I just got the same error as you. I ran pip uninstall distribute and it worked.
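When pip freeze lists a package but the program still reports it as missing, it can help to ask a specific interpreter what it can actually see. The sketch below does that with `importlib.metadata`, which requires Python 3.8 or later; on the Python 2.7 setups discussed in this thread, `pkg_resources.get_distribution` plays a similar role. The function name `report_distributions` is made up for this example, and the script should be run with the same interpreter that runs oncotator.

```python
import sys
from importlib import metadata


def report_distributions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This interpreter has no installed metadata for the distribution.
            versions[name] = None
    return versions


print("interpreter:", sys.executable)
for name, version in report_distributions(["natsort", "distribute", "setuptools"]).items():
    print(name, version if version is not None else "NOT FOUND by this interpreter")
```

If the interpreter path printed here differs from the one pip installs into, the two are looking at different site-packages directories, which would be one possible explanation for the mismatch.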

• Posts: 2 | Member

Hello, I have problems making Oncotator work. The first one is pysam 0.7.5: I can install 0.7.8, but for 0.7.5 I get:

NameError: name 'sys_platform' is not defined

/tmp/pip_build_rbarrant/pysam/distribute-0.6.34-py2.7.egg

Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/tmp/pip_build_rbarrant/pysam/setup.py", line 131, in <module>
    use_setuptools()
  File "distribute_setup.py", line 152, in use_setuptools
    return _do_download(version, download_base, to_dir, download_delay)
    _build_egg(egg, tarball, to_dir)
  File "distribute_setup.py", line 123, in _build_egg
    raise IOError('Could not build the egg.')
IOError: Could not build the egg.

Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_rbarrant/pysam
Storing debug log for failure in /users/r/b/rbarrant/.pip/pip.log
bash-3.2$

Can things work with 0.7.8 or do I REALLY need 0.7.5?

Thanks in advance,
Ramiro

• Posts: 6 | Member

@LeeTL1220 Thanks for the info. I previously used "pip install virtualenv". I have now tried your suggestion for installing the virtual environment. I also uninstalled distribute before and after doing so, and reinstalled it at the end. I could not get Oncotator to work at any of these steps. I should say I am very new to Linux, so any basic advice is appreciated. By the way, I used the command "./oncotator -h" from the "bin" directory. Is this correct? The next step would be to try to use the new Corpus. To do so, do I need to uninstall/remove the old one? Thanks for your help.

• Posts: 7,169 | Administrator, GATK Developer

Geraldine Van der Auwera, PhD

• Posts: 6 | Member

@Geraldine_VdAuwera Sorry for the late reply. Yes, that is the guide I used. Thanks.

• Posts: 7 | Member

@np3, if your installation is like mine, you would use the command "./oncotator -h" from the "oncotator-1.3.0.0" directory. Also, you don't need to remove the old Corpus unless you want to recover the disk space that it is occupying.

• Paris | Posts: 1 | Member

Hello, I get an error when using a VCF file:

ValueError: could not convert string to float:

I have installed the patched version of PyVCF. Regards

• Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

@JeremyS Did you use the create_oncotator_venv.sh script to create a virtual environment and run Oncotator from there?

• Posts: 4 | Member

Having spent a few days getting Oncotator to work, I want to share my solution. I think the main problem stems from the fact that Oncotator uses distribute-0.6.15, which does not understand the wheel format (the new Python packaging). Therefore, if you happen to have some of the required packages installed as dist-info rather than egg-info, you are likely to run into trouble (check your lib/python2.7/site-packages/).

We are running SL 6.4 on our cluster, so there is no Python 2.7 in the distro, and we are using modules.

1/ installed python/2.7.9; created a module for python/2.7.9

2/ installed numpy and scipy with Intel MKL support; after that

~> module load intel-mkl/11.0u1 python/2.7.9
~> pip freeze
Cython==0.21.2
h5py==2.4.0
nose==1.3.4
numpy==1.9.1
scipy==0.15.1
virtualenv==12.0.5
wsgiref==0.1.2

This is a minimal install. Uninstall nose before proceeding further (nose was needed for testing numpy):

pip uninstall nose

After that you could proceed with the virtualenv setup as recommended elsewhere. But we prefer modules.

3/ created an oncotator module containing this

set base /apps/oncotator/1.4.1.0
prepend-path PYTHONPATH $base/lib/python2.7/site-packages
prepend-path PATH $base/bin

4/ downloaded and unpacked oncotator 1.4.1.0

5/ adapted oncotator-1.4.1.0/scripts/create_oncotator_venv.sh to make my own script for installing prerequisites

#!/bin/bash
module load intel-mkl/11.0u1 python/2.7.9 oncotator/1.4.1.0
TARGET=/apps/oncotator/1.4.1.0
#C_PACKAGE_LIST="biopython cython numpy pandas sqlalchemy"
C_PACKAGE_LIST="biopython pandas sqlalchemy"
for C_PACKAGE in $C_PACKAGE_LIST; do
    pip install --install-option="--prefix=$TARGET" --no-use-wheel $C_PACKAGE
done
PACKAGE_LIST="bcbio-gff nose shove python-memcached natsort more-itertools enum34"
for PACKAGE in $PACKAGE_LIST; do
    pip install --install-option="--prefix=$TARGET" -U --no-use-wheel $PACKAGE
done
wget https://github.com/elephanthunter/PyVCF/archive/master.zip
mv master master.zip
unzip master.zip
cd PyVCF-master
python2.7 setup.py install --prefix=$TARGET
cd ..


cython and numpy are removed from C_PACKAGE_LIST because we have them already. There should be no harm if they are not removed.

6/ installed oncotator itself

python2.7 setup.py install --prefix=/apps/oncotator/1.4.1.0


module load intel-mkl/11.0u1 python/2.7.9 oncotator/1.4.1.0


should make oncotator available on the command line.

8/ it should also be fine to do the following

module unload oncotator/1.4.1.0
pip install nose


i.e. install nose back into the new python because nose is such an essential package. Oncotator should still be happy with its nose egg install.
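The check of lib/python2.7/site-packages/ suggested above could be sketched roughly as follows. The function name `classify_metadata_dirs` is invented for this example; it only looks at entries ending in `.dist-info` or `.egg-info` and ignores `.egg` directories, so treat it as a starting point rather than a complete inventory.

```python
import os


def classify_metadata_dirs(site_packages):
    """Group installed packages by the metadata style found in site_packages."""
    styles = {"dist-info": [], "egg-info": []}
    for entry in sorted(os.listdir(site_packages)):
        for suffix in styles:
            if entry.endswith("." + suffix):
                # Strip the ".dist-info" / ".egg-info" suffix, keep name and version.
                styles[suffix].append(entry[: -len(suffix) - 1])
    return styles
```

Calling it on the directory returned by `sysconfig.get_paths()["purelib"]` lists what the current interpreter has installed in each style; anything under "dist-info" was installed from a wheel or by a wheel-aware installer.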

@ink, thanks for reporting your solution.

Geraldine Van der Auwera, PhD