What input files can I annotate with Oncotator?
Input formats supported by Oncotator
VCF -- As specified in version 4.1: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
MAFLITE -- a generic tab-separated values file. The following columns must be present (Oncotator has an aliasing mechanism that automatically recognizes some obvious synonyms):
chr -- contig name
start -- start position. For insertions, this is the base preceding the insertion. For deletions, this is the first base that is removed.
end -- end position. For insertions, this is the base immediately after the insertion, i.e. start + 1.
ref_allele -- the reference allele. For insertions, this should be "-".
alt_allele -- the alternate allele. For deletions, this should be "-".
All other columns in the maflite input will be treated as annotations. Column order does not matter.
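The column rules above can be checked before a run with a small script. This is only a sketch, not part of Oncotator: the function name check_maflite is hypothetical, it expects the exact column names listed above (Oncotator's alias mechanism is not modeled), and the sample rows are made up. The SNP and deletion end positions in the sample are assumptions, since only the insertion rule (end = start + 1) is stated above.

```shell
#!/usr/bin/env bash
# Sketch: verify that a maflite file has the five required columns and that
# insertions (ref_allele == "-") satisfy end == start + 1.
# check_maflite is a hypothetical helper, not an Oncotator command.
check_maflite() {
  awk -F'\t' '
    NR == 1 {
      # Map each header name to its column index (column order does not matter).
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("chr start end ref_allele alt_allele", req, " ")
      for (j = 1; j <= n; j++)
        if (!(req[j] in col)) { print "missing column: " req[j]; bad = 1 }
      if (bad) exit 1
      next
    }
    $col["ref_allele"] == "-" && $col["end"] != $col["start"] + 1 {
      print "line " NR ": insertion must have end = start + 1"; bad = 1
    }
    END { exit (bad + 0) }
  ' "$1"
}

# Made-up sample rows for illustration only.
tmp=$(mktemp)
printf 'chr\tstart\tend\tref_allele\talt_allele\n' > "$tmp"
printf '1\t100\t100\tA\tG\n' >> "$tmp"     # SNP (end == start is an assumption)
printf '1\t200\t201\t-\tTT\n' >> "$tmp"    # insertion: end = start + 1
printf '1\t300\t302\tACG\t-\n' >> "$tmp"   # deletion (end = last removed base is an assumption)
check_maflite "$tmp" && echo "maflite looks well-formed"
rm -f "$tmp"
```

The function exits with a non-zero status and prints one message per problem, so it can be used as a guard in a pipeline before Oncotator is called.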
For TCGA MAF input, use MAFLITE as the input type.
TCGA MAF files created by Oncotator
You can use an annotated TCGA MAF file generated with Oncotator as an input to Oncotator.
However, there is a caveat: if the input file is a MAF generated by Oncotator 0.5.x.x or earlier, the columns may get reordered. Additionally, if the input was generated by any earlier version of Oncotator, columns may change whether they are marked as internal (i.e. the "i_" prefix may be added or removed).
There are several reasons why you may want to use an annotated MAF as input. The most illustrative example is when you have generated a very large annotated MAF file and a new datasource is added. Rather than rerunning Oncotator to regenerate the large annotated MAF file from scratch, you can use the large MAF file as input to a run of Oncotator configured with only the new datasource.
The input format should be MAFLITE and the output format should be
If you wish to reannotate an input TCGA MAF, use -i TCGAMAF. This overwrites old values with new ones and requires Oncotator 1.8.x.x or later.
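As an illustration of the -i TCGAMAF option, the sketch below only prints a candidate command line rather than running Oncotator. Several things here are assumptions and should be checked against oncotator -h for your installed version: the positional order (input file, output file, genome build), the --db-dir option, the choice of -o TCGAMAF as output format, the hg19 build, and all paths and file names. The helper name reannotate_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) an Oncotator reannotation command.
# Only "-i TCGAMAF" comes from the text above; everything else is an assumption.
reannotate_cmd() {
  local db_dir="$1" input_maf="$2" output_maf="$3" build="${4:-hg19}"
  printf 'oncotator --db-dir %s -i TCGAMAF -o TCGAMAF %s %s %s\n' \
    "$db_dir" "$input_maf" "$output_maf" "$build"
}

# Placeholder paths; substitute your own datasource directory and files.
reannotate_cmd /path/to/oncotator_db my_maf_file.maf.annotated my_maf_file.reannotated.maf
```

Printing the command first makes it easy to review the formats and paths before committing to a long annotation run.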
When annotating an input TCGA MAF, if you see a DuplicateAnnotationException...
This exception is raised when the input file and a datasource are trying to write different values for the same annotation.
As of Oncotator 1.8.x.x, you can directly reannotate a TCGA MAF using -i TCGAMAF. This is preferable to the instructions below.
You can use an annotated TCGA MAF file generated with Oncotator as an input to Oncotator, but you will need to preserve the following columns:
Chromosome, Start_position, End_position, ref_allele, alt_allele, Tumor_Sample_Barcode, Matched_Norm_Sample_Barcode, Tumor_Sample_UUID, Matched_Norm_Sample_UUID
The following cut command prints those columns; redirect its output to a file to create the new MAFLITE file:
cut -f 5,6,7,11,13,16,17,33,34 my_maf_file.maf.annotated
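The cut command above depends on fixed column positions, which may not hold if columns have been reordered. A possible alternative is to select the columns by header name. The sketch below assumes a tab-separated file whose first non-comment line is the header, and that comment lines start with "#"; the function name extract_columns is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the named columns of a tab-separated MAF, in the order given,
# skipping lines that start with "#". extract_columns is a hypothetical helper.
extract_columns() {
  local wanted="$1" maf="$2"
  awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
    /^#/ { next }
    !n {
      # First non-comment line is the header: map names to positions.
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = split(wanted, names, ",")
      for (j = 1; j <= n; j++) {
        if (!(names[j] in pos)) {
          print "missing column: " names[j] > "/dev/stderr"
          exit 1
        }
        idx[j] = pos[names[j]]
      }
    }
    {
      # Emit the selected fields for the header and every data row.
      line = $(idx[1])
      for (j = 2; j <= n; j++) line = line OFS $(idx[j])
      print line
    }
  ' "$maf"
}

# Example call (commented out because it needs a real input file):
# extract_columns "Chromosome,Start_position,End_position,ref_allele,alt_allele,Tumor_Sample_Barcode,Matched_Norm_Sample_Barcode,Tumor_Sample_UUID,Matched_Norm_Sample_UUID" my_maf_file.maf.annotated
```

If a requested column is absent, the function reports it on standard error and exits with status 1 instead of silently extracting the wrong field.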
Additionally, if you are running on the Broad cluster, you will want to add the following option to your oncotator call: