(howto) Install and run Oncotator for the first time

Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer
edited February 23 in Oncotator Documentation

1. Download the Oncotator package, the default datasource corpus, and (recommended) a transcript override list from the Downloads page

Please note: Broadies who wish to run the installed Oncotator on the Broad cluster should follow the instructions here instead of those on this page.

Oncotator Download

Default Datasource Corpus Download (January 26, 2015; 14 GB)

Please note that this corpus should be used with Oncotator 1.4.x.x and above. UniProt AA Pos annotations will not function properly with Oncotator 1.3.x.x and below.

Transcript override lists

We highly recommend that you download and use one of the transcript override lists below, especially for clinical applications of Oncotator. When running Oncotator, provide one of these files with the -c parameter.

  • Download UniProt Exact Match for GENCODE v19: gives selection priority to transcripts whose protein sequences exactly match the UniProt protein sequence. This file is also included in the Oncotator download at test/testdata/tx_exact_uniprot_matches.txt.

  • Download UniProt Exact Match + Clinical for GENCODE v19: gives priority to known clinical protein changes. This file is a modification of the UniProt Exact Match list (above). For more information about how this list was generated, please see the PowerPoint presentation here.
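As a sketch of how an override list plugs into a normal run, the hypothetical helper below (run_oncotator_with_override is not part of Oncotator) passes the file with -c alongside the same -v and --db-dir arguments used in the test run in step 3. It assumes oncotator is already on your PATH; setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with Oncotator): annotate one input file
# using a transcript override list passed with -c.
# Usage: run_oncotator_with_override DB_DIR OVERRIDE_LIST INPUT OUTPUT [BUILD]
# Set DRY_RUN=1 to print the command instead of executing it.
run_oncotator_with_override() {
    local db_dir=$1 override=$2 input=$3 output=$4 build=${5:-hg19}
    # Fail early if the override list path is wrong.
    if [[ ! -f "$override" ]]; then
        echo "transcript override list not found: $override" >&2
        return 1
    fi
    local cmd=(oncotator -v --db-dir "$db_dir" -c "$override" "$input" "$output" "$build")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, from the oncotator-1.5.1.0 directory: run_oncotator_with_override /path/to/oncotator_v1_ds_Jan262015 test/testdata/tx_exact_uniprot_matches.txt test/testdata/maflite/Patient0.snp.maf.txt exampleOutput.tsv hg19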

The Oncotator and default datasource corpus packages are gzip-compressed tar archives that can be extracted using the following commands:

$ tar zxvf oncotator-1.5.1.0.tar.gz
$ tar zxvf oncotator_v1_ds_Jan262015.tar.gz

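This will produce two directories: oncotator-1.5.1.0 and the datasource corpus directory. (For the January 26, 2015 corpus, the extracted directory is named oncotator_v1_ds_Jan262014 because of a typo in the archive; see the comments below.) Move to the oncotator-1.5.1.0 directory by doing: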

$ cd oncotator-1.5.1.0
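Because the directory an archive unpacks to does not always match the archive's file name (see the comments below about oncotator_v1_ds_Jan262014), it can help to check the top-level directory before pointing --db-dir at it. A minimal sketch using only standard tar, sed, head and cut; the function name archive_top_dir is hypothetical:

```shell
#!/usr/bin/env bash
# Print the name of the top-level directory inside a .tar.gz archive,
# e.g. to find out what the datasource corpus unpacks to.
archive_top_dir() {
    # List entries without extracting (stderr silenced because stopping
    # after the first line can make tar/gzip report a broken pipe),
    # drop a leading "./", skip empty names, and keep the first path
    # component of the first entry.
    tar -tzf "$1" 2>/dev/null | sed -e 's|^\./||' -e '/^$/d' | head -n 1 | cut -d/ -f1
}
```

For example: archive_top_dir oncotator_v1_ds_Jan262015.tar.gz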

2. Set up your Python environment and install dependencies

See the article on platform requirements for a full list of dependencies. One tutorial shows you how to use the virtual environment script we provide to set everything up automatically; another shows you how to install the dependencies manually if needed (or preferred).
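Whichever route you take, a quick way to confirm that the Python you will install Oncotator with can actually load its dependencies is to try importing each one. The sketch below does that with the interpreter named in a PYTHON variable (an assumption of this sketch; it defaults to python). The module names in the usage example are an assumption drawn from packages mentioned in the comments on this page, not the authoritative list from the platform requirements article; note that import names can differ from package names (biopython imports as Bio, PyVCF as vcf).

```shell
#!/usr/bin/env bash
# Report which Python modules can be imported by the interpreter named in
# $PYTHON (default: python). Returns non-zero if any module is missing.
check_python_modules() {
    local python=${PYTHON:-python} missing=0 mod
    for mod in "$@"; do
        if "$python" -c "import $mod" >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=$((missing + 1))
        fi
    done
    [[ $missing -eq 0 ]]
}
```

For example: check_python_modules pysam Bio pandas sqlalchemy natsort vcf. A MISSING pysam line here points at the same problem as the "Could not load pysam" error reported in the comments.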


3. Install Oncotator

Once you have installed all the necessary dependencies, run the standard Python install script included with the Oncotator distribution:

$ python setup.py install

Two binaries (executable program files), oncotator and initializeDatasource, will be installed into your Python's bin/ directory. You can test that they were installed by running, e.g.:

$ oncotator -h 

to invoke the help / usage instructions. You can also do a test run of Oncotator on the Patient0.snp.maf.txt file provided with the Oncotator distribution (in the test/testdata/maflite/ directory) with the following command:

$ oncotator -v --db-dir /path/to/oncotator_v1_ds_Jan262015 test/testdata/maflite/Patient0.snp.maf.txt exampleOutput.tsv hg19

where you provide the location of the datasources using the --db-dir argument. You may need to adapt the file path for the Patient0.snp.maf.txt file depending on where you run this command from.

This will produce a new file named exampleOutput.tsv with the appropriate annotations, built against the hg19 reference.
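To script this check (for example after upgrading Oncotator or the corpus), the sketch below confirms that both installed programs are on your PATH and gives a quick summary of the output file. It assumes, as is usual for MAF-style output, that lines beginning with # are comments and that the first remaining line is the column header; the function names check_binaries and summarize_output are hypothetical.

```shell
#!/usr/bin/env bash
# Return non-zero (and name the culprit) if any program is not on PATH.
check_binaries() {
    local bin status=0
    for bin in "$@"; do
        if ! command -v "$bin" >/dev/null 2>&1; then
            echo "not on PATH: $bin" >&2
            status=1
        fi
    done
    return "$status"
}

# Print the column count and number of data rows in a tab-separated
# annotation file, skipping '#' comment lines and the header line.
summarize_output() {
    local file=$1
    if [[ ! -s "$file" ]]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    grep -v '^#' "$file" | head -n 1 | awk -F'\t' '{ print "columns: " NF }'
    echo "records: $(grep -v '^#' "$file" | tail -n +2 | wc -l | tr -d ' ')"
}
```

For example: check_binaries oncotator initializeDatasource && summarize_output exampleOutput.tsv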

Post edited by Alex_Ramos

Geraldine Van der Auwera, PhD

Comments

  • pmint | Posts: 12 | Member

    An error occurred while installing Oncotator:

    error: Setup script exited with error: command 'gcc' failed with exit status 1

    What should I do?

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @pmint, have you looked at the compiler requirements in this document?

  • szhwatch | Posts: 1 | Member

    An error occurred while installing Oncotator.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @szhwatch, we're going to need a little more information than that to help you. What did you try, what happened, ...?

  • xiuque88 | Posts: 2 | Member

    Hi Geraldine,

    Thank you and your team at the Broad for providing a local installation of Oncotator. Could you provide the checksum for the datasource corpus? I want to make sure that the download completed correctly.

    Thanks!
    Cheers.
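Independently of any published checksum, you can confirm that a downloaded .tar.gz is intact and record its checksum so it can be compared against one if it is provided. A minimal sketch, assuming GNU coreutils (md5sum) as found on most Linux systems; verify_download is a hypothetical name, and the optional expected MD5 is whatever value you have been given, not one published on this page:

```shell
#!/usr/bin/env bash
# Check that a gzip-compressed download is intact and print its MD5 sum.
# If an expected MD5 is given as the second argument, compare against it.
verify_download() {
    local archive=$1 expected=${2:-} actual
    # gzip -t decompresses without writing output and fails on
    # truncated or corrupt files.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "gzip integrity check failed: $archive" >&2
        return 1
    fi
    actual=$(md5sum "$archive" | cut -d' ' -f1)
    echo "$actual  $archive"
    if [[ -n "$expected" && "$actual" != "$expected" ]]; then
        echo "checksum mismatch: expected $expected" >&2
        return 1
    fi
}
```

For example: verify_download oncotator_v1_ds_Jan262015.tar.gz. On a 14 GB corpus both steps read the whole file, so expect this to take a while.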

  • xiuque88 | Posts: 2 | Member
    edited July 2014

    Good day,

    I was running the test set and got a series of warnings:

        2014-07-17 03:25:13,066 WARNING [oncotator.output.TcgaMafOutputRenderer:197] Entrez Gene ID was zero, but Hugo Symbol was not Unknown.  Is the HGNC and/or Transcript datasource complete?

    Could you please advise whether this is expected?

    Cheers

  • joonlee3 | Posts: 1 | Member

    Hi Geraldine,

    I have the same issue. The link specified above is broken. Could you please provide a new one? I would really appreciate it.

    Best,
    Joon

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer
  • vguest | China | Posts: 1 | Member

    I installed Oncotator and got an error: ERROR: Could not load pysam. Some features will be disabled (e.g. COSMIC annotations) and may cause Oncotator to fail. No handlers could be found for logger "root". I used the virtual environment script, and the pysam-0.7.5 package was installed successfully. I don't understand why I get this error when I test Oncotator. Can you give me some suggestions?

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    Hi @vguest,

    Can you post the output that was generated during the installation process?

  • chung2000 | Posts: 8 | Member

    We love having the local installation of Oncotator because of its improved ease of use and convenience. We do trip ourselves up from time to time when we forget whether the dozens of output files contain canonical or best-effect annotations, i.e. whether the --tx-mode flag was set to "EFFECT" or not. Could you add, as a future enhancement, a text string in the output file that labels the output as either Canonical or Best Effect? The text string does not have to be in the column headers; it can go wherever it is sensible for you to put it.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    Hi @chung2000, we've put this in as a feature request. In the meantime, one way to keep track of this is to check the run logs.

  • chung2000 | Posts: 8 | Member

    Thank you @Geraldine_VdAuwera. While the run logs are excellent records of when, where, and how the --tx-mode flag was set, that information is lost when the Oncotator output files get passed around to other people in our group or to other organizations, because we would rather not send the run logs to those people.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @chung2000 I understand, it is awkward to rely on log files. This will be implemented in an upcoming version -- we'll make an announcement when the next release is ready.
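Until such a label is written into the output itself, one possible workaround is to carry the transcript mode in the output file name so that it survives when files are shared. The wrapper below is hypothetical (its name and naming scheme are assumptions, not an Oncotator feature); it passes the mode with --tx-mode and builds the file name from it. Set DRY_RUN=1 to print the command only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run Oncotator with an explicit --tx-mode and record
# that mode in the output file name, e.g. Patient0.EFFECT.maf.tsv.
# Usage: annotate_with_mode DB_DIR MODE INPUT [BUILD]
annotate_with_mode() {
    local db_dir=$1 mode=$2 input=$3 build=${4:-hg19}
    local base
    base=$(basename "$input")
    base=${base%%.*}                    # "Patient0.snp.maf.txt" -> "Patient0"
    local output="${base}.${mode}.maf.tsv"
    local cmd=(oncotator -v --db-dir "$db_dir" --tx-mode "$mode" "$input" "$output" "$build")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}" && echo "wrote $output"
    fi
}
```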

  • chung2000 | Posts: 8 | Member
    edited October 2014

    If I want to obtain the latest version of the datasource corpus (more recent than the June 2014 release) where is it located? Thank you.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @chung2000 I don't believe there is a later version than June 2014 as of yet. When there is it will be made available on this page.

  • np3 | Posts: 6 | Member

    Hi, when I run Oncotator, I keep getting the error "pkg_resources.DistributionNotFound: natsort".
    I am new to the command line, but I have checked that the natsort package is present. Any help would be greatly appreciated. Thanks

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @np3 Try "pip install -U distribute" That has fixed this in the past.

  • np3 | Posts: 6 | Member

    Thanks for the suggestion. Unfortunately, after doing this step and trying again, I received the same error message.

  • np3 | Posts: 6 | Member

    We are using Ubuntu 12.04 LTS. Would that make a difference?

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @np3 Can you post the results of "pip freeze" ?

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @np3 I have been using Ubuntu 12.04 with no issues. Also, what version of python are you using?

  • np3 | Posts: 6 | Member

    Python==2.7.3
    Brlapi==0.5.6
    Cython==0.21
    GnuPGInterface==0.3.2
    Jinja2==2.7.1
    Mako==0.5.0
    MarkupSafe==0.15
    MySQL-python==1.2.3
    PAM==0.4.2
    PIL==1.1.7
    Pillow==2.3.0
    Pyste==0.9.10
    Twisted-Core==11.1.0
    Twisted-Names==11.1.0
    Twisted-Web==11.1.0
    UniConvertor==1.1.4
    adium-theme-ubuntu==0.3.2
    apt-xapian-index==0.44
    apturl==0.5.1ubuntu3
    argparse==1.2.1
    biopython==1.62
    chardet==2.0.1
    chimerascan==0.4.5
    classicmenu-indicator==0.09
    command-not-found==0.2.44
    configglue==1.0
    configobj==4.7.2
    debtagshw==0.1
    decorator==3.3.2
    defer==1.0.2
    dirspec==3.0.0
    duplicity==0.6.18
    httplib2==0.7.2
    ipython==0.12.1
    jockey==0.9.7
    keyring==0.9.2
    language-selector==0.1
    launchpadlib==1.9.12
    lazr.restfulclient==0.12.0
    lazr.uri==1.0.3
    lockfile==0.8
    louis==2.3.0
    lxml==2.3.2
    matplotlib==1.1.1rc
    natsort==3.5.1
    nose==1.1.2
    numexpr==1.4.2
    numpy==1.6.1
    nvidia-common==0.0.0
    oauth==1.0.1
    onboard==0.97.1
    oneconf==0.2.8.1
    pandas==0.7.0
    pdfshuffler==0.6.0
    pexpect==2.3
    piston-mini-client==0.7.2
    protobuf==2.4.1
    pyOpenSSL==0.12
    pyPdf==1.13
    pycrypto==2.4.1
    pycups==1.9.61
    pycurl==7.19.0
    pyinotify==0.9.2
    pyparsing==1.5.2
    pysam==0.7.5
    pyserial==2.5
    pysmbc==1.0.13
    pysqlite==1.0.1
    python-apt==0.8.3ubuntu7.2
    python-dateutil==1.5
    python-debian==0.1.21ubuntu1
    python-virtkey==0.60.0
    pytz==2011k
    pyxdg==0.19
    pyzmq==2.1.11
    reportlab==2.5
    rhythmbox-ubuntuone==4.2.0
    rpy==1.0.3
    scikits.statsmodels==0.3.1
    scipy==0.9.0
    sessioninstaller==0.0.0
    simplegeneric==0.7
    simplejson==2.3.2
    software-center-aptd-plugins==0.0.0
    sympy==0.7.1.rc1
    system-service==0.1.6
    tables==2.3.1
    tornado==2.1
    ubuntuone-couch==0.3.0
    ubuntuone-installer==3.0.2
    ubuntuone-storage-protocol==3.0.2
    ufw==0.31.1-1
    unattended-upgrades==0.1
    unity-lens-video==0.3.5
    unity-scope-video-remote==0.3.5
    urlgrabber==3.9.1
    usb-creator==0.2.23
    virtualenv==1.11.6
    wadllib==1.3.0
    wsgiref==0.1.2
    xdiagnose==2.5.3
    xkit==0.0.0
    xlrd==0.9.2
    yum-metadata-parser==1.1.2
    zope.interface==3.6.1

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @chung2000 @Geraldine_VdAuwera I have posted the Sept 17 corpus here.

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @np3 Though my packages are newer, your configuration is very similar to mine. Usually the issue here is a version of distribute or setuptools that is incompatible with the version of Python. Are you using a virtual environment? In other words, did you run scripts/create_oncotator_venv.sh? Apologies for the basic questions. Also, use @LeeTL1220 to get a faster response.

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @np3 For a different project, I just got the same error as you; running pip uninstall distribute fixed it.
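When a package such as natsort is clearly installed but Oncotator still reports pkg_resources.DistributionNotFound, it is worth checking that the oncotator script and pip are using the same Python. Scripts installed by setup.py normally start with a #! line naming the interpreter that installed them, so a minimal diagnostic (script_interpreter is a hypothetical name) is to print that line and compare it with the python and pip found on your PATH:

```shell
#!/usr/bin/env bash
# Print the interpreter named on the #! line of a script found on PATH.
script_interpreter() {
    local path
    path=$(command -v "$1") || { echo "$1 not found on PATH" >&2; return 1; }
    # Keep only the first line and strip the leading "#!" (and any spaces).
    head -n 1 "$path" | sed -n 's/^#![[:space:]]*//p'
}
```

For example, compare the output of script_interpreter oncotator with command -v python pip; if they differ, ask the interpreter printed by the first command directly, e.g. by running it with -c "import natsort; print(natsort.__file__)".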

  • ramiro2k | Posts: 2 | Member

    Hello, I am having problems getting Oncotator to work. The first is pysam 0.7.5: I can install 0.7.8, but for 0.7.5 I get:

    NameError: name 'sys_platform' is not defined

    /tmp/pip_build_rbarrant/pysam/distribute-0.6.34-py2.7.egg

    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip_build_rbarrant/pysam/setup.py", line 131, in <module>
        use_setuptools()
      File "distribute_setup.py", line 152, in use_setuptools
        return _do_download(version, download_base, to_dir, download_delay)
      File "distribute_setup.py", line 132, in _do_download
        _build_egg(egg, tarball, to_dir)
      File "distribute_setup.py", line 123, in _build_egg
        raise IOError('Could not build the egg.')
    IOError: Could not build the egg.

    Cleaning up...
    Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_rbarrant/pysam
    Storing debug log for failure in /users/r/b/rbarrant/.pip/pip.log
    bash-3.2$

    Can things work with 0.7.8 or do I REALLY need 0.7.5?

    Thanks in advance,
    Ramiro

  • np3 | Posts: 6 | Member

    @LeeTL1220 Thanks for the info. I previously used "pip install virtualenv". I have now tried your suggestion for setting up the virtual environment. I also uninstalled distribute before and after doing so, and reinstalled it at the end. I could not get Oncotator to work at any of these steps. I should say I am very new to Linux, so any basic advice is appreciated.

    Btw, I used the command "./oncotator -h" from the "bin" directory. Is this correct?

    The next step would be to try to use the new Corpus. To do so, do I need to uninstall/remove the old one?

    Thanks for your help.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

  • np3 | Posts: 6 | Member

    @Geraldine_VdAuwera Sorry for the late reply. Yes, that is the guide I used. Thanks.

  • chung2000 | Posts: 8 | Member

    @np3, if your installation is like mine, you would use the command "./oncotator -h" from the "oncotator-1.3.0.0" directory.

    Also, you don't need to remove the old Corpus unless you want to recover the disk space that it is occupying.

  • JeremyS | Paris | Posts: 1 | Member

    Hello,
    I get an error when using a VCF file: ValueError: could not convert string to float:
    I have installed the patched version of PyVCF.

    Regards

  • LeeTL1220 | Jamaica Plain, MA | Posts: 18 | Member, Broadie, Cancer Tools Developer

    @JeremyS Did you use the create_oncotator_venv.sh script to create a virtual environment and run Oncotator from there?

  • ink | Posts: 4 | Member

    Having spent a few days getting Oncotator to work, I want to share my solution. I think the main problem stems from the fact that Oncotator uses distribute-0.6.15, which does not understand the wheel format (the newer Python packaging format). Therefore, if some of the required packages are installed as dist-info rather than egg-info, you are likely to run into trouble (check your lib/python2.7/site-packages/).

    We are running SL 6.4 on our cluster, so there is no Python 2.7 in the distro, and we use environment modules.
    1/ installed python/2.7.9; created a module for python/2.7.9
    2/ installed numpy and scipy with Intel MKL support; after that

    ~> module load intel-mkl/11.0u1 python/2.7.9
    ~> pip freeze
    Cython==0.21.2
    h5py==2.4.0
    nose==1.3.4
    numpy==1.9.1
    scipy==0.15.1
    virtualenv==12.0.5
    wsgiref==0.1.2
    

    This is a minimal install. Uninstall nose before proceeding further (nose was needed for testing numpy)

    pip uninstall nose
    

    After that you could proceed with virtualenv setup as recommended elsewhere. But we prefer modules.

    3/ created an oncotator module file containing the following:

    set base /apps/oncotator/1.4.1.0
    prepend-path PYTHONPATH $base/lib/python2.7/site-packages
    prepend-path PATH $base/bin
    

    4/ downloaded and unpacked oncotator 1.4.1.0

    5/ adapted oncotator-1.4.1.0/scripts/create_oncotator_venv.sh to make my own script for installing prerequisites

    #!/bin/bash
    # Install Oncotator's prerequisites under $TARGET instead of a virtualenv
    module load intel-mkl/11.0u1 python/2.7.9 oncotator/1.4.1.0
    TARGET=/apps/oncotator/1.4.1.0
    #C_PACKAGE_LIST="biopython cython numpy pandas sqlalchemy"
    C_PACKAGE_LIST="biopython pandas sqlalchemy"
    # --no-use-wheel avoids dist-info (wheel) installs, which the old distribute cannot read
    for C_PACKAGE in $C_PACKAGE_LIST; do
        pip install --install-option="--prefix=$TARGET" --no-use-wheel $C_PACKAGE
    done
    PACKAGE_LIST="bcbio-gff nose shove python-memcached natsort more-itertools enum34"
    for PACKAGE in $PACKAGE_LIST; do
        pip install --install-option="--prefix=$TARGET" -U --no-use-wheel $PACKAGE
    done
    # Patched PyVCF; -O names the download explicitly, so no rename is needed
    wget -O master.zip https://github.com/elephanthunter/PyVCF/archive/master.zip
    unzip master.zip
    cd PyVCF-master
    python2.7 setup.py install --prefix=$TARGET
    cd ..

    cython and numpy are removed from C_PACKAGE_LIST because we already have them. There should be no harm in leaving them in.

    6/ installed oncotator itself

    python2.7 setup.py install --prefix=/apps/oncotator/1.4.1.0
    

    7/ after that loading

    module load intel-mkl/11.0u1 python/2.7.9 oncotator/1.4.1.0
    

    should make oncotator available on the command line.

    8/ it should be also fine to do the following

    module unload oncotator/1.4.1.0
    pip install nose
    

    i.e. install nose back into the new python because nose is such an essential package. Oncotator should still be happy with its nose egg install.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @ink, thanks for reporting your solution.

  • woodydon | Taipei | Posts: 17 | Member

    I unzipped oncotator_v1_ds_Jan262015.tar.gz and found that the folder name was oncotator_v1_ds_Jan262014 instead of oncotator_v1_ds_Jan262015. Was it simply a typo?

  • chung2000 | Posts: 8 | Member

    @woodydon said:

    I unzipped oncotator_v1_ds_Jan262015.tar.gz and found that the folder name was oncotator_v1_ds_Jan262014 instead of oncotator_v1_ds_Jan262015. Was it simply a typo?

    I found the same thing but was able to run Oncotator with the new corpus to get the upgraded 1000 Genomes and dbSNP data. Most likely the folder name had a typo, as is usually the case at the beginning of any year.

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    Yes that looks like a typo, sorry about that.
