Can I create my own datasources for Oncotator?

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
edited May 2014 in Oncotator documentation

Yes, you can create your own datasources.

The simplest approach is to start from tsv (tab-separated values) files in which the annotations are indexed by gene, genomic position, or transcript ID. You can also use VCF files. For transcript data, only GTF files are supported, not tsv files.


Input tsv file format requirements

- Have one or more index columns. For gene or transcript IDs, this is a single column that contains the Hugo symbol or GAF transcript ID. For genomic positions, this is the three columns that correspond to chromosome, start, and end.
- Have column names on the first line.
- Have each value (or set of values) in the index column(s) appear in only one row.
- Have no rows that begin with "#".

Information you will need to provide

- Name of the index column(s) as they appear in the tsv. For example, the gene index column might be called "Symbol" in the tsv file.
- Name of the datasource as it should appear in version strings and annotation headers
- Path to the destination parent directory
- Name of the destination directory
- Version number
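
The file format requirements above can be checked before you try to build a datasource. Below is a minimal sketch in Python that does so. It is not part of Oncotator; the function name `check_datasource_tsv` and the column names used in the usage note are hypothetical, and the checks cover only the header, "#" rows, and index uniqueness.

```python
import csv
from collections import Counter


def check_datasource_tsv(lines, index_columns):
    """Return a list of problems found; an empty list means the checks passed.

    `lines` is any iterable of text lines (an open file works).
    `index_columns` lists the index column names as they appear in the tsv,
    e.g. ["Symbol"] for genes or three names for chromosome, start, end.
    """
    lines = list(lines)
    problems = []

    # Requirement: no rows that begin with "#".
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            problems.append(f"line {number} starts with '#'")

    # Requirement: column names on the first line, including the index column(s).
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [name for name in index_columns if name not in header]
    if missing:
        problems.append(f"index column(s) not in header: {missing}")
        return problems

    # Requirement: each index value (or set of values) appears in only one row.
    keys = Counter(tuple(row[name] for name in index_columns) for row in reader)
    for key, count in keys.items():
        if count > 1:
            problems.append(f"index value {key} appears in {count} rows")
    return problems
```

For a gene-indexed file you might call `check_datasource_tsv(open("my_annotations.tsv"), ["Symbol"])`, and for a position-indexed file pass the three column names instead, for example `["chr", "start", "end"]`. Both the file name and the column names here are placeholders.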

See this tutorial for step-by-step instructions.

Besides this tsv-based option, you can create datasources from Tabix-indexed VCF and TSV files. See the initializeDatasource help/usage documentation for more information.
