If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on October 14, 2019, due to the U.S. holiday. We will return to monitoring the forum on October 15.

Get RODBinding records that overlap endpoint of deletion, etc in GATK 1.6

I have implemented a custom walker that is attempting to intersect variants from a VCF with regions specified in a ROD. In the map function, I iterate across all variant contexts (modeled on various walkers in GATK), incorporating information from features obtained from the other RODs via RefMetaDataTracker getValues method.

I am not surprised that features that only overlap the endpoint of the current (length > 1) variant are not returned. However, I am not sure how to implement the behavior I want within the walker framework. If I am iterating across variants in a VCF, how do I get a set of features in another ROD that overlap with that variant, including those that just overlap with the end of the variant?


  • Mark_DePristoMark_DePristo admin Broad InstituteMember admin

    Have a look at

    By default getValues() will return all overlapping reference metadata. Perhaps you are explicitly using the function that only returns values starting at the current location? It is more complex if you want to get all features that overlap any value (for example, you have an indel of length 10 and want values in the future). If you want that you need to program it in your walker to do the integration.

  • mlindermmlinderm Member

    Thanks Mark! I think it is that latter situation I am asking about...

    As an example I have the following test variants

    chr1    19  .   ATG A
    chr1    25  .   A   T
    chr1    29  .   A   ATG

    that I am intersecting against a test ROD with the following region

    chr1    20  30

    and by intersection I mean I am iterating across the above variants and then using tracker.getValues(regionRODBinding) to get the overlapping region above. The latter two variants work as expected - I get the region feature from the tracker. For the first variant though, I don't get the region, because I presume it is only the endpoint of the deletion that intersects the region and that endpoint is in the "future". Is there an example in GATK somewhere for how to get these "future" records?

    The only approach I have been able to tease out (although I have not yet tried) is to go through getToolkit().getRodDataSources() to get the underlying RODDataSource and then get an iterator from there. Will that work, or will I confuse the iterators somehow? What do you recommend for doing this kind of intersection?

  • Mark_DePristoMark_DePristo admin Broad InstituteMember admin

    Really the only way to do this is to keep track of the objects in the walker, and look at the active objects as new map calls arrive and incorporate future information when it arrives. Basically you are just keeping a local history in the walker.

  • mlindermmlinderm Member

    I take it then I would need to buffer up the variants before writing them out? I assume I can't add attributes to variants already sent to the VCFWriter...

    More generally, what triggers the map? Is there one map call for every starting position in every bound ROD? Could I iterate across the regions instead to accomplish what I need here?

Sign In or Register to comment.