How to make Oncotator run faster (for users)

LeeTL1220, Arlington, MA (Member, Broadie, Dev)
edited June 2014 in Oncotator documentation

Oncotator's annotation speed may not always be fast enough for your needs. Here are several tips for speeding it up:

Only use the datasources that you need

Oncotator incurs overhead for every annotation that it renders, so loading fewer datasources means less work per variant. Oncotator honors symlinks in the db-dir, so you can build a db-dir containing only a subset of the datasources by creating a new directory and adding symlinks to the ones you need.

For example, if your default datasource corpus is located in ${OLD_DB_DIR}:

# Create a new db directory and only populate it with ref_hg and dbNSFP
mkdir -p ${NEW_DB_DIR}
ln -s ${OLD_DB_DIR}/ref_hg  ${NEW_DB_DIR}/ref_hg
ln -s ${OLD_DB_DIR}/dbNSFP  ${NEW_DB_DIR}/dbNSFP
# Running oncotator
oncotator ... --db-dir ${NEW_DB_DIR} ...

Specifying the datasources from the command line is planned, but it has not been implemented yet.
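If you build reduced db-dirs often, the symlink steps above can be wrapped in a small shell function. This is only a sketch: make_subset_db_dir is a hypothetical helper name (it is not part of Oncotator), and it assumes each datasource is a top-level directory inside the original db-dir, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: populate a new db-dir with symlinks to selected datasources.
# make_subset_db_dir is a hypothetical helper, not an Oncotator command.
# Usage: make_subset_db_dir OLD_DB_DIR NEW_DB_DIR DATASOURCE [DATASOURCE ...]
make_subset_db_dir() {
    local old_db_dir new_db_dir name
    # Resolve to an absolute path so the symlinks work from any directory.
    old_db_dir="$(cd "$1" && pwd)" || return 1
    new_db_dir="$2"
    shift 2
    mkdir -p "$new_db_dir" || return 1
    for name in "$@"; do
        if [ ! -d "$old_db_dir/$name" ]; then
            echo "skipping $name: not found in $old_db_dir" >&2
            continue
        fi
        # -f replaces an existing link; -n keeps ln from descending into it.
        ln -sfn "$old_db_dir/$name" "$new_db_dir/$name"
    done
}
```

For example, make_subset_db_dir "${OLD_DB_DIR}" "${NEW_DB_DIR}" ref_hg dbNSFP reproduces the two ln -s commands above; names that do not exist in the original db-dir are reported on stderr and skipped.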

Output as SIMPLE_TSV

If you only need a very simple tab-separated values list, use -o SIMPLE_TSV. This produces output faster than VCF or TCGA MAF.

Use --skip-no-alt for VCF input and non-VCF output

If you have VCF input with a genotype field AND you are not interested in rendering the GT=0/0 variants (usually the case for -o TCGAMAF), use --skip-no-alt. In a VCF with many samples, this often greatly reduces the number of variants that will be rendered.
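To get a rough idea of how much --skip-no-alt might save on a given file, you can count the GT=0/0 calls before running Oncotator. The sketch below is an estimate only: count_hom_ref is a hypothetical helper name, it assumes an uncompressed, tab-delimited VCF, and it assumes that the calls Oncotator skips are the 0/0 (or phased 0|0) genotypes described above.

```shell
#!/usr/bin/env bash
# Sketch: count homozygous-reference genotype calls in an uncompressed VCF.
# count_hom_ref is a hypothetical helper, not an Oncotator command.
count_hom_ref() {
    awk -F'\t' '
        /^#/ { next }                       # skip meta and header lines
        NF >= 10 {
            n = split($9, fmt, ":")         # FORMAT column lists the per-sample keys
            gt = 0
            for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = i
            if (gt == 0) next               # record has no genotype field
            for (s = 10; s <= NF; s++) {    # one column per sample
                split($s, val, ":")
                total++
                if (val[gt] == "0/0" || val[gt] == "0|0") homref++
            }
        }
        END { printf "%d of %d genotype calls are 0/0\n", homref, total }
    ' "$1"
}
```

Run it as count_hom_ref input.vcf. If most calls are 0/0, --skip-no-alt is likely to be worthwhile.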

Use a cache

If your file system is fast enough, consider using -u file://.... This can save time when annotating with a lot of the larger datasources (e.g. dbNSFP, GENCODE). If you have a memcache server available, use -u memcache://...

See oncotator --help for examples.
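As a recap, here is a sketch of how the tips above might be combined in one invocation. It only prints the command rather than running it. Several details are assumptions to check against oncotator --help: the positional order (input file, output file, genome build), the -i VCF input flag, and the helper name print_oncotator_cmd, which is hypothetical. The cache URL is left for you to supply.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) an Oncotator command that combines the tips above.
# print_oncotator_cmd is a hypothetical helper, not part of Oncotator.
# ASSUMPTION: positional order is INPUT OUTPUT GENOME_BUILD and VCF input
# is selected with -i VCF; verify both with `oncotator --help`.
# Usage: print_oncotator_cmd DB_DIR INPUT_VCF OUTPUT_TSV GENOME_BUILD [CACHE_URL]
print_oncotator_cmd() {
    local db_dir="$1" input_vcf="$2" output_tsv="$3" genome_build="$4"
    local cache_url="${5:-}"
    # Reduced db-dir, SIMPLE_TSV output, and no GT=0/0 rendering.
    set -- oncotator -i VCF -o SIMPLE_TSV --skip-no-alt --db-dir "$db_dir"
    if [ -n "$cache_url" ]; then
        set -- "$@" -u "$cache_url"         # optional file:// or memcache:// cache
    fi
    set -- "$@" "$input_vcf" "$output_tsv" "$genome_build"
    printf '%s\n' "$*"
}
```

Review the printed command, then run it yourself once the flags match your installed version.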
