Can I create my own datasources for Oncotator?
Yes, you can create your own datasources.
The simplest approach is to start from tsv (tab-separated values) files in which the annotations are keyed on gene, genomic position, or transcript ID. You can also use VCF files. For transcript data, only GTF files are supported, not tsv files.
Input tsv file format requirements
- Have one or more index columns. For gene or transcript IDs, this will be one column that contains the Hugo symbol or GAF transcript ID. For genomic positions, this will be the three columns that correspond to chromosome, start, and end.
- Have column names on the first line
- Each value (or set of values) in the index column(s) can only appear in one row
- No rows beginning with "#"
Information needed to create the datasource
- Name of the index column(s) as appearing in the tsv. For example, the gene index column might be called "Symbol" in the tsv file.
- Name of the datasource as it should appear in version strings and annotation headers
- Path to the destination parent directory
- Name of the destination directory
- Version number
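Before creating a datasource, it can help to check a candidate file against the tsv format requirements listed above. The following is a minimal sketch in Python, not part of Oncotator: the function name `validate_datasource_tsv` and the sample column names are hypothetical, and it only covers the checks stated in this article (header on the first line, no rows starting with "#", index columns present, each index value used in one row only).

```python
import csv
import io


def validate_datasource_tsv(handle, index_columns):
    """Return a list of problems found in a candidate datasource tsv.

    `handle` is any iterable of text lines (an open file works).
    `index_columns` lists the index column names as they appear in the
    header, e.g. ["Symbol"] or ["chr", "start", "end"] (hypothetical names).
    """
    lines = list(handle)
    if not lines:
        return ["file is empty: no header line"]

    problems = []

    # Requirement: no rows beginning with "#".
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            problems.append(f"line {number}: starts with '#'")

    # Requirement: column names on the first line, including the index column(s).
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in index_columns if name not in header]
    if missing:
        problems.append(f"header lacks index column(s): {', '.join(missing)}")
        return problems

    # Requirement: each index value (or set of values) appears in only one row.
    seen = {}
    for row in reader:
        key = tuple(row[name] for name in index_columns)
        if key in seen:
            problems.append(
                f"line {reader.line_num}: index {key} already used on line {seen[key]}"
            )
        else:
            seen[key] = reader.line_num
    return problems


if __name__ == "__main__":
    # Made-up sample data: the gene symbol TP53 is used as an index value twice.
    sample = io.StringIO("Symbol\tscore\nTP53\t0.9\nKRAS\t0.4\nTP53\t0.1\n")
    for problem in validate_datasource_tsv(sample, ["Symbol"]):
        print(problem)
```

An empty result means the file passed these checks. It does not guarantee that Oncotator will accept the file, so the tool's own documentation remains the reference.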
See this tutorial for step-by-step instructions.
Besides the plain tsv option, you can also create datasources from Tabix-indexed VCF and tsv files. See the initializeDatasource help / usage documentation for more information.