Oncotator data sources
For human and cancer-related research applications, we aggregate annotations from the resources listed below. Please note that some of these may not be publicly available. We provide a bundle of the publicly available resources on our Downloads page.
Important note: these datasources now use GENCODE v19 instead of GAF 3.0 for transcript annotations, such as 'gene'. The current version of Oncotator is backward compatible with previous datasources, but we cannot guarantee that future versions will be.
- Gene, transcript, and functional consequence annotations using the GENCODE hg19 reference set. Both basic transcripts and long noncoding RNAs are provided.
- Common SNP annotations from dbSNP (includes data from the 1000 Genomes Project pilot 1, 2, and 3 studies), ESP, and 1000G
- HGVS Nomenclature support for GENCODE v19+/ENSEMBL transcripts.
- Sequence Ontology terms
- Site-specific protein annotations from UniProt
- Druggable target data from DrugBank
- Functional impact predictions from dbNSFP, which includes PolyPhen-2, SIFT, MutationAssessor, LRT, FATHMM, and more.
- Observed cancer mutation frequency annotations from COSMIC
- Cancer gene and mutation annotations from the Cancer Gene Census
- Significant amplification/deletion region annotations from Tumorscape and the TCGA Copy Number Portal
- Overlapping Oncomap mutations from the Cancer Cell Line Encyclopedia
- Significantly mutated gene annotations aggregated from published MutSig analyses
- Cancer gene annotations from the Familial Cancer Database
- Human DNA Repair Gene annotations from Wood et al.