VariantAnnotator and multiple records in resources

Hi,

I'm using VariantAnnotator to add annotations to variants from a bunch of sources. One issue that I have is that for some variants, there are multiple annotations in a supplied resource. In the docs, I read

"Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly."

Can this behaviour be altered? I need to output all annotations for a record, either on a single line, or on multiple.

In the case I'm working on, one line has the annotation "CLNSIG=5" (i.e. a known pathogenic variant) and the other (likely an older record) has "CLNSIG=1", i.e. a variant of unknown significance. I need to output both so that I can filter downstream (using SelectVariants) to select those where "CLNSIG=5".
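To make the downstream filter concrete, here is a minimal Python sketch of the check that would be needed if all values ended up in one INFO field. It assumes a hypothetical combined form such as `clinvar.CLNSIG=1,5`, which VariantAnnotator does not produce according to the docs quoted above; the function name `has_clnsig` and the choice of separators (`,` and `|`) are assumptions for illustration only.

```python
import re


def has_clnsig(info, wanted="5", key="clinvar.CLNSIG"):
    """Return True if any value stored under `key` in a VCF INFO string equals `wanted`.

    Assumption: multiple values are joined with "," or "|" inside one field.
    """
    for field in info.split(";"):
        name, _, value = field.partition("=")
        if name == key:
            # Compare whole tokens, so "15" does not match "5".
            return wanted in re.split(r"[,|]", value)
    return False


# Hypothetical INFO strings; only the single-valued form appears in real output.
single_value = "AB=0.45;clinvar.CLNSIG=1"
combined_values = "AB=0.45;clinvar.CLNSIG=1,5"
```

A plain string comparison against `5` would miss the combined record, which is why the values are split into tokens first.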

cheers

Answers

  • dklevebring Member
    edited February 2015

    After a night of sleep I can further expand on the issue. In the resource VCF (from ClinVar), I have this:

    #13      32890627        rs80359400      A       AT      .       .       ASP;CLNACC=RCV000113041.1;CLNALLE=1;CLNDBN=Bre
    #13      32890627        rs80359393      A       AT      .       .       ASP;CLNACC=RCV000044248.2|RCV000082917.3;CLNAL
    #13      32890627        rs80359399      AT      A       .       .       ASP;CLNACC=RCV000044247.2|RCV000113038.1;CLNAL
    

    and dbSNP:

    13  32890627    rs80359399  AT  A   .   .   ASP;GENEINFO=BRCA2:675;LSD;NSF;OTHERKG;PM;REF;RS=80359399;RSPOS=32890633;SAO=0;SLO;SSR=0;VC=DIV;VP=0x050160001205000002100200;WGT=1;dbSNPBuildID=132
    13  32890627    rs80359393  A   AT  .   .   ASP;GENEINFO=BRCA2:675;LSD;NSF;OM;OTHERKG;PM;REF;RS=80359393;RSPOS=32890633;SAO=0;SLO;SSR=0;VC=DIV;VP=0x050160001205000002110200;WGT=1;dbSNPBuildID=132
    

    And my variants to be annotated are these:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  L12982N_panel_v1
    13  32890627    .   A   AT  1500    .   AB=0.45 GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    13  32890627    .   AT  A   1500    .   AB=0.45 GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    

    I run VariantAnnotator, like so:

    java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R $REF -V $V --resource:clinvar $CLINVAR --expression clinvar.CLNSIG -L $V -E clinvar.CLNACC

    In the results, it's clear that VariantAnnotator did two (kind of weird) things:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  L12982N_panel_v1
    13  32890627    .   A   AT  1500    .   AB=0.45;clinvar.CLNACC=RCV000113041.1;clinvar.CLNSIG=5  GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    13  32890627    .   AT  A   1500    .   AB=0.45;clinvar.CLNACC=RCV000113041.1;clinvar.CLNSIG=5  GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    
    1. VA ignores one of the annotation lines for the insertion A->AT variant (this is according to docs, but still questionable behaviour)
    2. The deletion variant (AT->A) is annotated with the data from the insertion variant in the resource file. See the CLNACC annotation, which for the AT->A should be RCV000044247.2|RCV000113038.1.

    Interestingly though, when I turn on dbSNP annotation with --dbsnp $DBSNP:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  L12982N_panel_v1
    13  32890627    rs80359393  A   AT  1500    .   AB=0.45;DB;clinvar.CLNACC=RCV000113041.1;clinvar.CLNSIG=5   GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    13  32890627    rs80359399  AT  A   1500    .   AB=0.45;DB;clinvar.CLNACC=RCV000113041.1;clinvar.CLNSIG=5   GT:AO:DP:PL:QA:QR:RO    0/1:86:170:100,0,100:2492:2390:78
    

    This adds the dbSNP rsids, and does so correctly for both the insertion and deletion. This behaviour is different from that of the resource annotation (point 2 above).

    I assume that 2) is a bug and 1) is the intended behaviour. I do think, however, that 1) is questionable. Merging records in the resource would be one way around this, but running CombineVariants merges the deletion and insertion variants (all three rows) into a single row, thereby losing the connection between alt allele and rsid. Keeping the insertion and the deletion on separate rows would avoid this.

    cheers

  • Geraldine_VdAuwera Cambridge, MA; Member, Administrator, Broadie admin

    I think your interpretations are largely correct.

    I agree 1) is awkward. I'm not sure that behavior can be improved directly (there are many potential complications), but what would you think of this as a workaround: CombineVariants could be told to merge variant records if and only if their REF and ALT alleles are identical, and otherwise keep them separate. You would also be able to choose which variant-level annotations it keeps when they are in conflict (which I think should be feasible with the existing merge priority machinery).
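As a rough illustration of that proposed merge rule (not an existing CombineVariants option), the following Python sketch groups resource records by CHROM, POS, REF and ALT and only merges records whose alleles are identical. The function name, the tuple layout, and the choice to join conflicting values with a comma are all assumptions; the record values are taken from the ClinVar snippet earlier in the thread.

```python
def merge_identical_alleles(records):
    """Merge records that share (chrom, pos, ref, alt); keep all others separate.

    `records` is an iterable of (chrom, pos, rsid, ref, alt, info_dict).
    Assumption: conflicting INFO values are kept and joined with ",".
    """
    merged = {}  # insertion-ordered in Python 3.7+
    for chrom, pos, rsid, ref, alt, info in records:
        entry = merged.setdefault((chrom, pos, ref, alt), {"ids": [], "info": {}})
        if rsid not in entry["ids"]:
            entry["ids"].append(rsid)
        for key, value in info.items():
            values = entry["info"].setdefault(key, [])
            if value not in values:
                values.append(value)
    return [
        (chrom, pos, ";".join(e["ids"]), ref, alt,
         {k: ",".join(v) for k, v in e["info"].items()})
        for (chrom, pos, ref, alt), e in merged.items()
    ]


clinvar_records = [
    ("13", 32890627, "rs80359400", "A", "AT", {"CLNACC": "RCV000113041.1"}),
    ("13", 32890627, "rs80359393", "A", "AT", {"CLNACC": "RCV000044248.2|RCV000082917.3"}),
    ("13", 32890627, "rs80359399", "AT", "A", {"CLNACC": "RCV000044247.2|RCV000113038.1"}),
]
merged_records = merge_identical_alleles(clinvar_records)
```

With these three records, the two A->AT insertions collapse into one row carrying both rsids, while the AT->A deletion stays on its own row, so the link between alt allele and rsid is preserved.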

    Regarding 2) I believe this is the same behavior as 1) but with even less justification. The problem being that VA doesn't check the alleles, just the position, iirc. It may be possible to put in such a check; would you be able to submit a bug report with file snippets that we could use in a feature enhancement request?
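The difference between a position-only lookup and an allele-aware one can be sketched in a few lines of Python. This is not how VariantAnnotator is implemented; the names `build_index` and `lookup` and the `check_alleles` switch are hypothetical, and the comparison is a plain string match on REF and ALT, so it assumes both files represent indels the same way.

```python
from collections import defaultdict

# Resource records taken from the ClinVar snippet in the thread.
RESOURCE = [
    {"chrom": "13", "pos": 32890627, "ref": "A", "alt": "AT",
     "CLNACC": "RCV000113041.1"},
    {"chrom": "13", "pos": 32890627, "ref": "A", "alt": "AT",
     "CLNACC": "RCV000044248.2|RCV000082917.3"},
    {"chrom": "13", "pos": 32890627, "ref": "AT", "alt": "A",
     "CLNACC": "RCV000044247.2|RCV000113038.1"},
]


def build_index(resource):
    """Index resource records by (chrom, pos)."""
    by_position = defaultdict(list)
    for record in resource:
        by_position[(record["chrom"], record["pos"])].append(record)
    return by_position


def lookup(index, chrom, pos, ref, alt, check_alleles=True):
    """Return the CLNACC values of every resource record matching the variant.

    With check_alleles=False only the position is compared, which mimics the
    behaviour described in point 2.
    """
    hits = index.get((chrom, pos), [])
    if check_alleles:
        hits = [r for r in hits if r["ref"] == ref and r["alt"] == alt]
    return [r["CLNACC"] for r in hits]


index = build_index(RESOURCE)
deletion_hits = lookup(index, "13", 32890627, "AT", "A")
position_only_hits = lookup(index, "13", 32890627, "AT", "A", check_alleles=False)
```

Returning every matching record instead of picking one also covers point 1, since the caller can then decide how to report multiple annotations.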
