What input files can I annotate with Oncotator?

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)
edited July 2015 in Oncotator documentation

Input formats supported by Oncotator

  • VCF -- as defined in the version 4.1 specification: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

  • MAFLITE -- a generic tab-separated values file. The following columns must be present (Oncotator has an aliasing mechanism that automatically recognizes some obvious synonyms for these names):

    chr -- contig name

    start -- start position. For insertions, this is the base preceding the insertion. For deletions, this is the first base that is removed.

    end -- end position. For insertions, this is the base immediately after the insertion; in other words, start + 1.

    ref_allele -- the reference allele. For insertions, this should be "-".

    alt_allele -- the alternate allele. For deletions, this should be "-".

All other columns in the maflite input will be treated as annotations. Column order does not matter.
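As a rough illustration of the layout described above, here is a minimal Python sketch that writes a three-row maflite table. The helper name make_maflite, the "sample" annotation column, and all coordinate values are made up for the example. The SNV row (start equal to end) and the end coordinate of the deletion row follow common MAF conventions and are assumptions, since only the insertion end is spelled out above.

```python
import csv
import io

# The five columns the maflite format requires, as listed above.
REQUIRED_COLUMNS = ["chr", "start", "end", "ref_allele", "alt_allele"]


def make_maflite(rows, extra_columns=()):
    """Return tab-separated maflite text for a list of dict rows.

    Hypothetical helper for illustration; not part of Oncotator.
    """
    fieldnames = REQUIRED_COLUMNS + list(extra_columns)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError("row is missing required columns: %s" % missing)
        writer.writerow(row)
    return buf.getvalue()


rows = [
    # SNV: start == end (assumed convention, not stated in the text above).
    {"chr": "1", "start": 1000, "end": 1000,
     "ref_allele": "A", "alt_allele": "G", "sample": "hypothetical_sample"},
    # Insertion: ref_allele is "-", start is the preceding base, end = start + 1.
    {"chr": "2", "start": 2000, "end": 2001,
     "ref_allele": "-", "alt_allele": "TT", "sample": "hypothetical_sample"},
    # Deletion: alt_allele is "-", start is the first removed base.
    # end as the last removed base is an assumption.
    {"chr": "3", "start": 3000, "end": 3002,
     "ref_allele": "ACG", "alt_allele": "-", "sample": "hypothetical_sample"},
]

# "sample" is an extra column, so it would be treated as an annotation.
maflite_text = make_maflite(rows, extra_columns=["sample"])
print(maflite_text)
```

Because column order does not matter, the order of REQUIRED_COLUMNS here is only a convenience.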

For TCGA MAF input, use MAFLITE as the input type.

TCGA MAF files created by Oncotator

You can use an annotated TCGA MAF file generated with Oncotator as an input to Oncotator.

However, there is a caveat: if the input file is a MAF generated by Oncotator 0.5.x.x or earlier, the columns may get reordered. Additionally, if the input was generated by any earlier version of Oncotator, columns may change in whether they are marked as internal (i.e. the "i_" prefix may be added or removed).

There are several reasons why you may want to do this. The most illustrative example is when you have generated a very large annotated MAF file and a new datasource is added. Rather than rerun Oncotator and re-generate the large annotated MAF file, you can use the large MAF file as input to a run of Oncotator configured with only the new datasource.

The input format should be MAFLITE and the output format should be TCGAMAF.

If you wish to reannotate an input TCGA MAF, use -i TCGAMAF. This will overwrite old values with new ones. This requires Oncotator 1.8.x.x or later.
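To make the two modes above concrete, here is a small Python sketch that assembles the argument list for each kind of run. It only builds and prints the command; it does not execute Oncotator. The argument layout (-i, -o, --db-dir, then input file, output file, genome build) mirrors the invocation quoted in the comments on this page; the function name, the file names, the datasource directory, and the hg19 default are placeholders assumed for illustration.

```python
import shlex


def build_oncotator_command(input_path, output_path, db_dir,
                            input_format="MAFLITE",
                            output_format="TCGAMAF",
                            genome_build="hg19"):
    """Return an argv list for an Oncotator run (hypothetical helper)."""
    return [
        "oncotator",
        "-i", input_format,
        "-o", output_format,
        "--db-dir", db_dir,
        input_path,
        output_path,
        genome_build,
    ]


# Add annotations from a new datasource to an existing annotated MAF.
# All paths below are placeholders.
add_datasource_cmd = build_oncotator_command(
    "my_maf_file.maf.annotated",
    "my_maf_file.maf.annotated.new",
    "/path/to/new_datasource_dir",
)

# Reannotate in place of old values (Oncotator 1.8.x.x and above).
reannotate_cmd = build_oncotator_command(
    "my_maf_file.maf.annotated",
    "my_maf_file.maf.reannotated",
    "/path/to/datasource_dir",
    input_format="TCGAMAF",
)

print(shlex.join(add_datasource_cmd))
print(shlex.join(reannotate_cmd))
```

The resulting list can be passed to subprocess.run if you prefer to launch the run from Python rather than from the shell.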

When annotating an input TCGA MAF, if you see a DuplicateAnnotationException...

This happens when the input file and a datasource are trying to write different values for the same annotation.

As of Oncotator 1.8.x.x, you can directly reannotate a TCGA MAF using -i TCGAMAF. This is preferable to the instructions below.

You can use an annotated TCGA MAF file generated with Oncotator as an input to Oncotator, but you will need to reduce it to the following columns so that no previously written annotations are carried over:

Chromosome, Start_position, End_position, ref_allele, alt_allele, Tumor_Sample_Barcode, Matched_Norm_Sample_Barcode, Tumor_Sample_UUID, Matched_Norm_Sample_UUID

The following cut command will extract those columns into a new MAFLITE file:

cut -f 5,6,7,11,13,16,17,33,34 my_maf_file.maf.annotated
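Since column positions can differ between files (for example when columns have been reordered), selecting the columns by name may be more robust than fixed positions. The Python sketch below does this under a few assumptions: the function name extract_columns is made up, leading comment lines are assumed to start with "#", and the demo rows are invented data.

```python
# Column names to keep, as listed above.
KEEP = [
    "Chromosome", "Start_position", "End_position",
    "ref_allele", "alt_allele",
    "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode",
    "Tumor_Sample_UUID", "Matched_Norm_Sample_UUID",
]


def extract_columns(lines, keep=KEEP):
    """Return tab-separated lines containing only the `keep` columns.

    Hypothetical helper; raises ValueError if a column is missing.
    """
    out = []
    indices = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if indices is None:
            if line.startswith("#"):
                continue  # assumed comment/version line before the header
            header = line.split("\t")
            indices = [header.index(name) for name in keep]
            out.append("\t".join(keep))
            continue
        fields = line.split("\t")
        out.append("\t".join(fields[i] for i in indices))
    return out


# Invented example input with extra columns and a comment line.
demo = [
    "#version 2.4\n",
    "Hugo_Symbol\tChromosome\tStart_position\tEnd_position\t"
    "Variant_Classification\tref_allele\talt_allele\t"
    "Tumor_Sample_Barcode\tMatched_Norm_Sample_Barcode\t"
    "Tumor_Sample_UUID\tMatched_Norm_Sample_UUID\n",
    "GENE1\t1\t1000\t1000\tMissense_Mutation\tA\tG\tT1\tN1\tuuid-t\tuuid-n\n",
]

result = extract_columns(demo)
print("\n".join(result))
```

To process a real file, pass an open file handle as `lines` and write the returned lines to a new file.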

Additionally, if you are running on the Broad cluster, you will want to add the following option to your oncotator call:

Post edited by LeeTL1220 on


  • varsha, Florida (Member)

    Hi @Geraldine_VdAuwera, I am able to run Oncotator with a MAF-format file, but for some reason not with a VCF (output from MuTect). Please let me know if the command I am giving for VCF is right:
    oncotator -i VCF --db-dir /path/to/oncotator_v1_ds_Jan262014 -o VCF oncotest.vcf OncoTestOutput.vcf hg19

    The error I am getting is
    Error: AttributeError: 'str' object has no attribute 'type'
    2015-09-11 11:41:54,807 ERROR [oncotator.output.VcfOutputRenderer:215] Error at mutation 2 ['1', '6253360', '6253360']:

  • Syed, India (Member)

    Were you able to get an answer to this one?
