How to make Oncotator run faster (for users)

LeeTL1220, Arlington, MA (Member, Broadie, Dev)
Edited June 2014 in Oncotator documentation

In some cases, Oncotator may not annotate fast enough for your needs. Here are several tips for speeding it up:

Only use the datasources that you need

Oncotator incurs overhead for every annotation it renders, so every extra datasource adds run time. Because Oncotator honors symlinks in the db-dir, you can build a db-dir that contains only a subset of the datasources: create a new directory and symlink just the datasources you need into it.

For example, if your default datasource corpus is located in ${OLD_DB_DIR}:

```shell
# Create a new db directory and only populate it with ref_hg and dbNSFP
mkdir -p "${NEW_DB_DIR}"
ln -s "${OLD_DB_DIR}/ref_hg" "${NEW_DB_DIR}/ref_hg"
ln -s "${OLD_DB_DIR}/dbNSFP" "${NEW_DB_DIR}/dbNSFP"
# Run Oncotator against the reduced db directory
oncotator ... --db-dir "${NEW_DB_DIR}" ...
```
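The two ln -s lines generalize to any list of datasources. A minimal bash sketch: make_subset_db_dir is a hypothetical helper (not part of Oncotator) that resolves the old db-dir to an absolute path, so the symlinks stay valid even if it is given as a relative path, and stops with an error if a requested datasource does not exist.

```shell
#!/usr/bin/env bash
# make_subset_db_dir OLD_DB_DIR NEW_DB_DIR DATASOURCE...
# Hypothetical helper (not part of Oncotator): symlinks only the named
# datasources from OLD_DB_DIR into NEW_DB_DIR.
make_subset_db_dir() {
  local old_db_dir new_db_dir="$2" ds
  # Resolve to an absolute path so the symlinks stay valid even when
  # OLD_DB_DIR is given relative to the current directory.
  old_db_dir="$(cd "$1" && pwd)" || return 1
  shift 2                        # the remaining arguments are datasource names
  mkdir -p "$new_db_dir" || return 1
  for ds in "$@"; do
    if [ ! -e "$old_db_dir/$ds" ]; then
      echo "missing datasource: $old_db_dir/$ds" >&2
      return 1
    fi
    # -f replaces an existing link and -n stops ln from descending into an
    # existing link to a directory, so the helper can be rerun safely.
    ln -sfn "$old_db_dir/$ds" "$new_db_dir/$ds"
  done
}
```

Usage: `make_subset_db_dir "${OLD_DB_DIR}" "${NEW_DB_DIR}" ref_hg dbNSFP`, then point Oncotator at the new directory with `--db-dir "${NEW_DB_DIR}"` as before.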

Specifying the datasources directly on the command line is planned for a future release, but it has not been implemented yet.

Output as SIMPLE_TSV

If you only need a simple tab-separated values (TSV) file, use -o SIMPLE_TSV. This produces output faster than the VCF or TCGA MAF output formats.
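A minimal sketch of a SIMPLE_TSV run, assuming a MAFLITE input and Oncotator's positional arguments in the order input file, output file, genome build. The wrapper run_simple_tsv, the ONCOTATOR_DRY_RUN variable, and the default hg19 build are assumptions of this sketch; with ONCOTATOR_DRY_RUN=1 the wrapper prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Oncotator): annotate a MAFLITE file and
# write the faster SIMPLE_TSV output. Assumes the positional order
# input_file output_file genome_build; hg19 is only this sketch's default.
run_simple_tsv() {
  local db_dir="$1" input="$2" output="$3" build="${4:-hg19}"
  local -a cmd
  cmd=(oncotator --db-dir "$db_dir" -i MAFLITE -o SIMPLE_TSV
       "$input" "$output" "$build")
  if [ "${ONCOTATOR_DRY_RUN:-0}" = 1 ]; then
    # ONCOTATOR_DRY_RUN is defined only by this sketch: print, don't run.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `ONCOTATOR_DRY_RUN=1 run_simple_tsv "${NEW_DB_DIR}" variants.maflite variants.tsv` prints the full command (the file names are placeholders); drop the variable to actually run it.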

Use --skip-no-alt for VCF input and non-VCF output

If your VCF input has a genotype (GT) field and you are not interested in rendering GT=0/0 (homozygous reference) variants, which is usually the case for -o TCGAMAF, use --skip-no-alt. For a VCF with many samples, this often greatly reduces the number of variants that are rendered.
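Whether --skip-no-alt pays off depends on how many sample genotypes are GT=0/0. A rough check, sketched in bash and awk: count_hom_ref_genotypes is a hypothetical helper that reads an uncompressed VCF, locates GT in each record's FORMAT column, and prints the number of homozygous-reference entries followed by the total number of sample entries. It treats only diploid 0/0 and 0|0 as homozygous reference, and it is an estimate, not a measurement of what Oncotator itself renders.

```shell
#!/usr/bin/env bash
# count_hom_ref_genotypes VCF
# Hypothetical helper (not part of Oncotator). Prints "HOMREF TOTAL", where
# TOTAL counts per-sample entries on records that carry a GT field and
# HOMREF counts those whose GT is 0/0 or 0|0.
count_hom_ref_genotypes() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    NF >= 10 {
      n = split($9, fmt, ":")          # FORMAT keys, e.g. GT:AD:DP
      gt = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") { gt = i; break }
      if (gt == 0) next                # this record has no GT field
      for (s = 10; s <= NF; s++) {     # sample columns start at column 10
        split($s, vals, ":")
        total++
        if (vals[gt] == "0/0" || vals[gt] == "0|0") homref++
      }
    }
    END { printf "%d %d\n", homref, total }
  ' "$1"
}
```

If the first number is a large share of the second, an invocation along the lines of `oncotator -i VCF -o TCGAMAF --skip-no-alt input.vcf output.maf hg19` (file names and build are placeholders) should render far fewer rows. For a gzipped VCF, pass `<(gzip -dc input.vcf.gz)` instead of the file name.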

Use a cache

If your file system is fast enough, consider using a file cache (-u file://...). This can save time when annotating with many of the larger datasources (e.g., dbNSFP, GENCODE). If you have a memcache server available, use a memcache cache instead (-u memcache://...).

See oncotator --help for examples.
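The -u option takes a URL rather than a plain path. A minimal sketch, assuming the file cache accepts file:// followed by an absolute path (check oncotator --help for the exact form): to_file_url is a hypothetical helper that makes a relative cache path absolute before building the URL.

```shell
#!/usr/bin/env bash
# to_file_url PATH
# Hypothetical helper (not part of Oncotator): prints a file:// URL for a
# local cache path, making the path absolute first. Characters such as
# spaces are NOT percent-encoded, so keep the cache path simple.
to_file_url() {
  local path="$1"
  case "$path" in
    /*) ;;                          # already absolute
    *) path="$PWD/$path" ;;         # relative: anchor at the current directory
  esac
  printf 'file://%s\n' "$path"
}
```

For example, `oncotator -u "$(to_file_url oncotator.cache)" --db-dir "${NEW_DB_DIR}" input.maflite output.maf hg19`, where the cache name, file names, and build are placeholders.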
