How to make Oncotator run faster (for users)

LeeTL1220 (Arlington, MA; Member, Broadie, Dev)
edited June 2014 in Oncotator documentation

There are cases where the annotation speed of Oncotator may not be fast enough for a user's needs. Here are several tips for speeding up Oncotator:

Only use the datasources that you need

Oncotator incurs overhead for every annotation that it renders, so loading only the datasources you need reduces the work done per variant. Oncotator honors symlinks in the db-dir, so you can build a db dir that holds a subset of the datasources by creating a new directory and adding symlinks to it.

For example, if your default datasource corpus is located in ${OLD_DB_DIR}:

# Create a new db directory and only populate it with ref_hg and dbNSFP
mkdir -p ${NEW_DB_DIR}
ln -s ${OLD_DB_DIR}/ref_hg  ${NEW_DB_DIR}/ref_hg
ln -s ${OLD_DB_DIR}/dbNSFP  ${NEW_DB_DIR}/dbNSFP
# Running oncotator
oncotator ... --db-dir ${NEW_DB_DIR} ...
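
If you build subset db dirs often, the steps above can be wrapped in a small shell function. This is only a sketch: make_subset_db_dir is a hypothetical helper name, not part of Oncotator, and it assumes bash and that each datasource is a top-level entry in the original db dir.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Oncotator): create a db dir that holds
# symlinks to only the named datasources.
# Usage: make_subset_db_dir OLD_DB_DIR NEW_DB_DIR datasource [datasource ...]
make_subset_db_dir() {
    local old_db_dir new_db_dir name
    # Resolve to an absolute path; a relative symlink target would be
    # interpreted relative to the new directory and end up dangling.
    old_db_dir="$(cd "$1" && pwd)" || return 1
    new_db_dir="$2"
    shift 2
    mkdir -p "${new_db_dir}" || return 1
    for name in "$@"; do
        if [ ! -e "${old_db_dir}/${name}" ]; then
            echo "skipping missing datasource: ${name}" >&2
            continue
        fi
        # -f replaces an existing link; -n keeps ln from following it on re-runs
        ln -sfn "${old_db_dir}/${name}" "${new_db_dir}/${name}"
    done
}
```

For the example above, this would be called as make_subset_db_dir "${OLD_DB_DIR}" "${NEW_DB_DIR}" ref_hg dbNSFP, after which ${NEW_DB_DIR} can be passed to --db-dir.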

Selecting datasources directly from the command line is planned, but it has not been implemented yet.

Output as SIMPLE_TSV

If you need a very simple tab-separated values list, use -o SIMPLE_TSV. This will produce output faster than VCF or TCGA MAF.

Use --skip-no-alt for VCF input and non-VCF output

If you have VCF input with a genotype field AND you are not interested in rendering the GT=0/0 variants (usually the case for -o TCGAMAF), use --skip-no-alt. In a VCF with many samples, this often greatly reduces the number of variants that will be rendered.
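
To get a rough idea of how much --skip-no-alt could save on a given file, you can count the sample genotypes that carry no alt allele. The following is a sketch, not an Oncotator feature: count_no_alt_genotypes is a hypothetical helper, it expects an uncompressed VCF, and it assumes that a phased 0|0 call counts the same as 0/0, which the text above does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Oncotator): report how many per-sample
# genotypes in an uncompressed VCF are 0/0.
# Assumption: phased 0|0 calls are counted together with 0/0.
count_no_alt_genotypes() {
    awk -F'\t' '
        /^#/    { next }          # skip meta and header lines
        NF < 10 { next }          # no FORMAT/sample columns, nothing to count
        {
            # Find the position of GT inside the FORMAT column (column 9)
            n = split($9, fmt, ":"); gt = 0
            for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = i
            if (gt == 0) next
            # Sample columns start at column 10
            for (s = 10; s <= NF; s++) {
                split($s, fields, ":")
                total++
                if (fields[gt] == "0/0" || fields[gt] == "0|0") no_alt++
            }
        }
        END { printf "%d of %d sample genotypes have no alt allele\n", no_alt, total }
    ' "$1"
}
```

A high proportion suggests that --skip-no-alt will skip a large share of the rendering work for that input.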

Use a cache

If your file system is fast enough, consider using a file-based cache with -u file://... This can save time when annotating with many of the larger datasources (e.g., dbNSFP, GENCODE). If you have a memcache server available, use -u memcache://... instead.

See oncotator --help for examples.
