
What is the difference between tx-mode BEST EFFECT vs. CANONICAL?

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
edited December 2014 in Oncotator documentation

In GENCODE, each transcript has a level of curation. The tx-mode setting (CANONICAL or BEST_EFFECT) controls how Oncotator weighs the level of curation against the variant classification score when deciding which transcript's annotation to emit.

GENCODE CANONICAL (Oncotator v1.4.x.x and above)

  • Choose the transcript that is on the custom list specified by the user (-c option). If no list was specified, treat as if all transcripts were on the list (tie).
  • In case of a tie, choose the transcript with the highest level of curation. Note that a lower level number means better curation (see below).
  • If still a tie, choose the transcript that yields the variant classification highest on the variant classification rank list (see below).
  • If still a tie, choose the transcript with the best appris annotation (see below).
  • If still a tie, choose the transcript with the longest transcript sequence length.
  • If still a tie, choose the first transcript, alphabetically.
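
The CANONICAL steps above amount to a lexicographic comparison, so they can be sketched as a single sort key. This is only an illustration and not Oncotator's actual code: the `Transcript` class, its field names, `pick_canonical`, and the transcript IDs are all hypothetical. The score and APPRIS rank are assumed to have been looked up already, with lower being better for both and the APPRIS list below assumed to be ordered best first.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are illustrative, not Oncotator's.
    transcript_id: str
    level: int        # GENCODE curation level (1 is best)
    vc_score: int     # variant classification score (lower ranks higher)
    appris_rank: int  # position in the APPRIS list (0 assumed best)
    length: int       # transcript sequence length


def pick_canonical(transcripts: Iterable[Transcript],
                   custom_ids: Optional[Set[str]] = None) -> Transcript:
    """Return the transcript chosen by the v1.4.x.x CANONICAL ordering."""
    def key(tx: Transcript):
        # No custom list: every transcript counts as listed, so step 1 ties.
        on_list = custom_ids is None or tx.transcript_id in custom_ids
        return (
            not on_list,       # 1. custom-list members first (False < True)
            tx.level,          # 2. highest curation = lowest level number
            tx.vc_score,       # 3. best variant classification = lowest score
            tx.appris_rank,    # 4. best APPRIS annotation
            -tx.length,        # 5. longest transcript sequence
            tx.transcript_id,  # 6. first alphabetically
        )
    return min(transcripts, key=key)


# Made-up candidates for illustration only.
candidates = [
    Transcript("ENST_A", level=2, vc_score=1, appris_rank=0, length=3000),
    Transcript("ENST_B", level=1, vc_score=5, appris_rank=1, length=2500),
    Transcript("ENST_C", level=1, vc_score=5, appris_rank=1, length=2800),
]
print(pick_canonical(candidates).transcript_id)              # ENST_C
print(pick_canonical(candidates, {"ENST_A"}).transcript_id)  # ENST_A
```

Without a custom list, ENST_B and ENST_C tie on level, score, and APPRIS rank, so the longer one wins. With ENST_A on a custom list, it is chosen despite its lower curation level.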

GENCODE BEST_EFFECT (Oncotator v1.4.x.x and above)

  • Choose the transcript that is on the custom list specified by the user (-c option). If no list was specified, treat as if no transcripts were on the list (tie).
  • In case of a tie, choose the transcript that yields the variant classification highest on the variant classification rank list (see below).
  • If still a tie, choose the transcript with the highest level of curation. Note that a lower level number means better curation (see below).
  • If still a tie, choose the transcript with the best appris annotation (see below).
  • If still a tie, choose the transcript with the longest transcript sequence length.
  • If still a tie, choose the first transcript, alphabetically.
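
In v1.4.x.x the two modes differ only in whether curation level or variant classification score is compared first. A minimal sketch of that swap follows. The `Candidate` tuple, `choose`, and the transcript IDs are assumptions made for illustration, the custom list and the later tie-breakers are left out, and the mode string `EFFECT` mirrors the `--tx-mode EFFECT` value used in the comment thread below.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    # Hypothetical record holding only the two criteria whose order differs.
    transcript_id: str
    level: int  # curation level, lower is better
    score: int  # variant classification score, lower is better


# Comparison order per mode.
KEY_ORDER = {
    "CANONICAL": lambda c: (c.level, c.score),  # level first, then score
    "EFFECT": lambda c: (c.score, c.level),     # score first, then level
}


def choose(candidates: List[Candidate], tx_mode: str) -> str:
    """Return the ID of the transcript preferred under the given mode."""
    if tx_mode not in KEY_ORDER:
        raise ValueError(f"unknown tx-mode: {tx_mode}")
    return min(candidates, key=KEY_ORDER[tx_mode]).transcript_id


candidates = [
    Candidate("ENST_A", level=1, score=5),  # well curated, Silent
    Candidate("ENST_B", level=2, score=1),  # less curated, Missense_Mutation
]
print(choose(candidates, "CANONICAL"))  # ENST_A
print(choose(candidates, "EFFECT"))     # ENST_B
```

The same variant is reported as Silent under CANONICAL and as Missense_Mutation under BEST_EFFECT, which is the practical difference between the modes.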

GENCODE CANONICAL (Oncotator v1.3.x.x and below)

  • Choose the transcript with the highest level of curation. Note that a lower level number means better curation (see below).
  • In case of a tie, choose the transcript on which this variant has the highest effect (see GENCODE BEST_EFFECT description below).
  • If still a tie, choose the first transcript in the results.

GENCODE BEST_EFFECT (Oncotator v1.3.x.x and below)

  • Choose the transcript that yields the variant classification highest on the variant classification rank list (see below).
  • If still a tie, choose the first transcript in the results.

See http://www.gencodegenes.org/gencodeformat.html

Levels of curation

  • 1: verified loci
  • 2: manually annotated loci
  • 3: automatically annotated loci

Variant classifications and score

A lower score means the effect is ranked higher.

Variant classification    Score
De_novo_Start_OutOfFrame  0
Nonsense_Mutation         0
Nonstop_Mutation          0
Missense_Mutation         1
De_novo_Start_InFrame     1
In_Frame_Del              1
In_Frame_Ins              1
Frame_Shift_Del           2
Frame_Shift_Ins           2
Frame_Shift_Sub           2
Start_Codon_SNP           3
Start_Codon_Del           3
Start_Codon_Ins           3
Start_Codon_DNP           3
Start_Codon_TNP           3
Start_Codon_ONP           3
Stop_Codon_SNP            3
Stop_Codon_Del            3
Stop_Codon_Ins            3
Stop_Codon_DNP            3
Stop_Codon_TNP            3
Stop_Codon_ONP            3
Splice_Site               4
Splice_Site_SNP           4
Splice_Site_Del           4
Splice_Site_Ins           4
Splice_Site_DNP           4
Splice_Site_TNP           4
Splice_Site_ONP           4
miRNA                     4
Silent                    5
3UTR                      6
5UTR                      6
Intron                    7
5Flank                    8
3Flank                    8
Non-coding_Transcript     9
IGR                       20
TX-REF-MISMATCH           100
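
The table can be read as a lookup from classification to score, where the lowest score wins. The sketch below is illustrative only: `VC_SCORE` holds a subset of the rows above, and `vc_score` and `best_effect` are hypothetical helper names. The fallback of 25 for unlisted classifications comes from the developer's reply in the comment thread below.

```python
from typing import Iterable

# Subset of the table above; the remaining rows follow the same pattern.
VC_SCORE = {
    "Nonsense_Mutation": 0,
    "Missense_Mutation": 1,
    "Frame_Shift_Del": 2,
    "Splice_Site": 4,
    "Silent": 5,
    "Intron": 7,
    "IGR": 20,
    "TX-REF-MISMATCH": 100,
}

# Score for classifications not in the table (per the comment thread below).
UNLISTED_SCORE = 25


def vc_score(classification: str) -> int:
    """Return the rank score for a classification (lower ranks higher)."""
    return VC_SCORE.get(classification, UNLISTED_SCORE)


def best_effect(classifications: Iterable[str]) -> str:
    """Return the classification that ranks highest, i.e. has the lowest score."""
    return min(classifications, key=vc_score)


print(best_effect(["Intron", "Missense_Mutation", "Silent"]))  # Missense_Mutation
print(vc_score("lincRNA"))                                     # 25
```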

APPRIS ranks

http://appris.bioinfo.cnio.es/

  • appris_principal
  • appris_candidate_highest_score
  • appris_candidate_longest_ccds
  • appris_candidate_ccds
  • appris_candidate_longest_seq
  • appris_candidate_longest
  • appris_candidate
  • no appris tag present

In cases where the *_ccds tag does not exist in the datasource itself (e.g. GENCODE v19), the _ccds suffix is appended if a "CCDS" tag is also present.
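
One way to turn the APPRIS list into a comparable rank is sketched below. It rests on two assumptions that the text above does not state outright: that the list is ordered best first, and that the CCDS note means a tag such as `appris_candidate` is treated as `appris_candidate_ccds` when the transcript also carries a `CCDS` tag. The function name `appris_rank` is hypothetical.

```python
from typing import Iterable

# The list above, assumed to be ordered from best to worst.
APPRIS_RANKS = [
    "appris_principal",
    "appris_candidate_highest_score",
    "appris_candidate_longest_ccds",
    "appris_candidate_ccds",
    "appris_candidate_longest_seq",
    "appris_candidate_longest",
    "appris_candidate",
]

# "no appris tag present" ranks after every real tag.
NO_APPRIS_RANK = len(APPRIS_RANKS)


def appris_rank(tags: Iterable[str]) -> int:
    """Return the best (lowest) APPRIS rank among a transcript's tags."""
    tags = set(tags)
    effective = set(tags)
    if "CCDS" in tags:
        # Assumed reading of the CCDS note: derive the *_ccds variant
        # of any APPRIS tag that has one in the rank list.
        for tag in tags:
            if tag + "_ccds" in APPRIS_RANKS:
                effective.add(tag + "_ccds")
    ranks = [APPRIS_RANKS.index(t) for t in effective if t in APPRIS_RANKS]
    return min(ranks, default=NO_APPRIS_RANK)


print(appris_rank(["appris_candidate"]))          # 6
print(appris_rank(["appris_candidate", "CCDS"]))  # 3
print(appris_rank([]))                            # 7
```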

Post edited by LeeTL1220

Comments

  • chung2000 (Member)

    I realize the list of variant classifications was posted here in May 2014, before the June 2014 corpus of datasources was released. However, we recently had variants annotated with a variant classification of "lincRNA". What score do you assign to that classification?

  • LeeTL1220 (Arlington, MA; Member, Broadie, Dev)
    edited December 2014

    @chung2000 Anything not listed here gets a score of 25. With lincRNA, there are usually no transcripts to contradict, and an IGR result is impossible since the variant overlapped a transcript.

  • corlagon (Germany; Member)

    Hi @Geraldine and @LeeTL1220

    I'm using the new Oncotator version 1.4 together with the new datasource package, and I'm a little confused about the tx-mode flag and the transcript overwrite list. For example, I run Oncotator with the following command:

    oncotator -v -i VCF --db-dir [somePath]/oncotator_v1_ds_Jan262014 -c [somePath]/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt --tx-mode EFFECT [somePath]/myVCF.vcf [somePath]/oncotator.out.tcgamaf hg19 > [somePath]/oncotator.log  2>&1
    

    When I look into the log, I see the following:

    2015-02-02 10:54:13,955 INFO [oncotator.datasources.EnsemblTranscriptDatasource:106] GENCODE v19 is being set up with default tx-mode: CANONICAL.  
    2015-02-02 10:54:13,955 WARNING [oncotator.datasources.EnsemblTranscriptDatasource:123] Attempting to set transcript mode of CANONICAL for ensembl.  This operation is only supported for GENCODE.  Otherwise, will be the same as EFFECT.
    [...]
    2015-02-02 10:54:16,952 INFO [oncotator.utils.RunSpecificationFactory:125] Setting GENCODE v19 to tx-mode of EFFECT...
    2015-02-02 10:54:16,958 INFO [oncotator.utils.RunSpecificationFactory:133] 24001 custom canonical transcripts specified.
    

    Now my questions:
    1) What is the difference between the first tx-mode setup to CANONICAL and the second tx-mode setup to EFFECT?
    2) You're providing some slides on the "(howto) Install and run Oncotator for the first time" page explaining how clinical transcript overwrite lists are generated. On slide 5, you can read that the "Clinical" list is the "UniProt Exact" list (which contains 24000 entries) + 3 additional transcripts. That would give 24003 entries, but the log reports 24001. Although it is a minor difference, where are the missing 2 entries in the list?

    Thanks,
    c

  • LeeTL1220 (Arlington, MA; Member, Broadie, Dev)

    @corlagon 1) Good catch. Under the hood, there is a minor issue: we set the datasource to CANONICAL during the first stage of initialization (thereby triggering the log messages) before setting it to the user's selection. You can just look at the last log message.
    2) I think we need to change the text. I think two of the three "additional transcripts" supersede existing transcripts. @Alex_Ramos Can you confirm?

  • nchambwe (Seattle; Member)

    We are looking at the IGR variant classification and notice that it is sometimes associated with a gene and sometimes not. What is the rule (i.e. the distance to or from a gene) for assigning an intergenic variant to a gene?
