Is it possible to make VariantAnnotator check REF and ALT fields?

Hi! We use GATK a lot in our research, and it works amazingly well most of the time, so first of all, thanks for creating it!

We have one problem that we were unable to solve on our own. Say we have a VCF file that contains called variants, and we want to annotate it using an external database, ClinVar for example. We used to use VariantAnnotator for this purpose until we found out (both by reading the documentation and by doing a quick experimental check) that it annotates variants based solely on position, ignoring the actual mutation. Imagine, for example, that variant A → C was called at a specific position, but the data recorded in ClinVar is for A → G. In this case VariantAnnotator will still carry over the INFO fields from ClinVar into our VCF. Ideally we do not want this to happen, because strictly speaking the ClinVar data was recorded for a different mutation and might not be relevant at all in our case.
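To make the distinction concrete, here is a minimal Python sketch of position-only versus allele-aware annotation. It is not GATK code: the record layout (plain dictionaries), the `annotate` helper, and the example data are all hypothetical and only illustrate the behaviour described above.

```python
def annotate(calls, resource, check_alleles):
    """Copy the resource INFO value onto each call (hypothetical helper).

    With check_alleles=False the lookup key is (chrom, pos) only;
    with check_alleles=True the key also includes REF and ALT.
    """
    def key(rec):
        base = (rec["chrom"], rec["pos"])
        return base + (rec["ref"], rec["alt"]) if check_alleles else base

    lookup = {key(r): r["info"] for r in resource}
    # Calls with no matching resource record get info=None.
    return [dict(c, info=lookup.get(key(c))) for c in calls]


# Made-up example data: the call is A>C, the resource records A>G.
calls = [{"chrom": "1", "pos": 1000, "ref": "A", "alt": "C"}]
resource = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "info": "CLNSIG=example"}
]

position_only = annotate(calls, resource, check_alleles=False)
allele_aware = annotate(calls, resource, check_alleles=True)
```

In this sketch `position_only` carries the resource INFO over to the A → C call, while `allele_aware` leaves it unannotated because the ALT alleles differ.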

My question: is there an option for VariantAnnotator to make it check the REF and ALT fields during annotation? (Although I fear it wouldn't be possible, because it uses the RodWalker class to traverse the variants.) Or, alternatively, can this be achieved using a combination of other GATK commands? Or will we have to write a custom walker to accomplish what we want? (The latter is obviously the worst case, but hopefully we can manage that.)

All the best,
Kirill

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Kirill,

    I think we put in the capability to tie incoming annotations to a specific allele in version 3.4 -- but I would have to check the usage, which I can't do at the moment. Have a look at the VA argument docs, it may be in there.

  • edited July 2015

    Hi Geraldine,

    Thank you very much for your reply. I've carefully looked through all of the VA arguments, but unfortunately none of them seems to tie annotations to a specific allele. Besides, the list of VA-specific arguments is identical to that of GATK 3.1-1 (which we used before transitioning to 3.4-0).

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Oh I just realized I had been thinking of something a bit different.

    I would think the RodWalker can be made to check the alleles -- some of the tools do this.

  • kmhernan, Chicago, IL (Member)
    edited September 2016

    @Geraldine_VdAuwera I am having a bit of an issue with the implementation of the -rac flag. Looking at the source and the annotations, it seems to loop over all reference and alt alleles in the VCFs and annotate if any of them match. Unfortunately, that doesn't do what I would expect the function to do. For example, if my input VCF has a TA/T deletion but the resource VCF has a T/TA insertion, I don't believe these should match, yet they are still annotated as matching. Maybe I'm not understanding the implementation.

    GitHub issue 1276, filed by Sheila. State: closed. Closed by: chandrans.
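The TA/T versus T/TA case can be sketched in a few lines of Python. This is only an illustration of the two matching rules discussed in the post, not the GATK source: the function names and the dictionary record layout are hypothetical, and the "any allele" rule is written from the poster's description rather than verified against the actual implementation.

```python
def any_allele_matches(call, res):
    """Match if any allele, REF or ALT, is shared (behaviour as described in the post)."""
    call_alleles = {call["ref"], *call["alts"]}
    res_alleles = {res["ref"], *res["alts"]}
    return bool(call_alleles & res_alleles)


def ref_alt_pair_matches(call, res):
    """Stricter rule: REF must be identical and at least one ALT must be shared."""
    return call["ref"] == res["ref"] and bool(set(call["alts"]) & set(res["alts"]))


# Made-up records for the example in the post.
deletion = {"ref": "TA", "alts": ["T"]}   # input VCF: TA/T deletion
insertion = {"ref": "T", "alts": ["TA"]}  # resource VCF: T/TA insertion

loose = any_allele_matches(deletion, insertion)
strict = ref_alt_pair_matches(deletion, insertion)
```

Under the loose rule the deletion and the insertion share the strings "T" and "TA" and therefore match; under the stricter rule they do not, because the REF alleles differ.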
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hmm, I think you're right -- sounds like the implementer didn't think about this case. Would you be able to generate a small test snippet to use for debugging?

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @kmhernan
    Hi,

    Instructions are here.

    Thanks,
    Sheila

  • kmhernan, Chicago, IL (Member)

    @Sheila Sorry for the delay, I have been out of town. These are protected data that I was trying to annotate with the COSMIC IDs. I'll see if I can dig up some unprotected data to upload...
