Can we implement GATK/Queue on Google Hadoop?

danielyin Member Posts: 7

Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can we implement GATK or Queue on Google Hadoop?


  • danielyin Member Posts: 7

    It seems that implementing GATK on Hadoop would require a lot of work.

  • Carneiro Administrator, Dev Posts: 274 admin
    edited June 2013

    Yes and no. The GATK wasn't designed with Hadoop in mind, but that is only for historical reasons.

    One could envision a full reimplementation of the engine to handle HDFS and make -nt / -nct work transparently in a Hadoop framework. This is not necessarily "a lot of work", but it is work that requires deep knowledge of the internals of the GATK. Right now we don't have the resources to implement this ourselves, or to provide the level of support that someone else would need in order to do it.

    On the other hand, one could, as Queue does, implement a wrapper around the GATK to instantiate it on a Hadoop cluster. This is not a lot of work at all; in fact, people outside our group are already thinking about this problem. Unfortunately our resources are very limited, but this alternative should require much less understanding of the GATK engine and is probably feasible for a good software engineer to tackle.

  • pagarwal14 Durham, NC Member Posts: 11

    Hello, some of us at Duke University, along with the person who posted the original question, are thinking of writing a wrapper around the GATK so that it can be used on a Hadoop cluster. Before we start, we would like some feedback on the utility and feasibility of creating such a wrapper. Could you please share any thoughts on this, such as the potential performance advantage, the challenges in writing the software, and the amount of background work we would need to do to understand the GATK code base? Thanks for your input!

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,130 admin

    Hi @pagarwal14,

    To be honest this is not something we have given a lot of thought to, and right now we can't spare the resources to look at it with the seriousness needed to fully answer your questions. One important caveat is that our developer-oriented documentation is rather sparse at the moment, so that may be the biggest stumbling block; we aim to deal with that issue progressively over the next few months, but in the meantime we will not be able to offer you much support toward grokking the GATK codebase.

    That being said, I hope this does not deter you from undertaking this project, as there seems to be some demand for this and there should not be any unreasonable technical difficulty involved. Good luck!

    Geraldine Van der Auwera, PhD

  • pagarwal14 Durham, NC Member Posts: 11

    Thank you for your response. Can you point us to the code base and to the developer-oriented documentation as it exists today? I searched the website for the documentation and the closest I could find was at [link]. Is that all of the developer documentation, or is there a more consolidated document? Thanks.

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,130 admin

    Hi there,

    You can get the source code of the full GATK on [link] (which has a restrictive license) or the framework only on [link] (which is MIT-licensed).

    I'm afraid the "Developer Zone" is indeed all we have for dev docs right now, aside from the code javadocs of course.

    Geraldine Van der Auwera, PhD

  • fanliangze China Member Posts: 2

    May I ask how this is progressing? Thanks a lot!

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,130 admin

    We are now looking at technologies other than Hadoop.

    Geraldine Van der Auwera, PhD

  • leichang Member Posts: 1

    What are the new technologies you are looking at? I am curious about the progress. We currently have a project that wants to use Hadoop and GATK.
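
For readers wondering what the "wrapper around the GATK" suggested earlier in this thread might look like, here is a minimal sketch of the scatter step only. It is not anything the GATK team or the Duke group wrote. It assumes a GATK 3 style command line (`-T`, `-R`, `-I`, `-L`, `-o`), picks HaplotypeCaller purely as an example tool, and uses hypothetical file names (`GenomeAnalysisTK.jar`, `ref.fasta`, `sample.bam`). The idea is that a mapper (for instance under Hadoop Streaming) receives one genomic interval per input line and turns it into an independent GATK invocation; staging inputs from HDFS and merging the per-interval VCFs afterwards are left out.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical jar location; adjust to wherever the GATK 3 jar lives on each node.
GATK_JAR = "GenomeAnalysisTK.jar"


def build_gatk_command(interval, reference, bam, out_dir, jar=GATK_JAR):
    """Return the argv list for one GATK run restricted to a single interval.

    HaplotypeCaller is only an example tool; any walker that accepts -L
    could be substituted.
    """
    # Make the interval usable as a file name, e.g. chr1:1-1000 -> chr1_1_1000
    safe_name = interval.replace(":", "_").replace("-", "_")
    out_vcf = str(PurePosixPath(out_dir) / f"{safe_name}.vcf")
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", interval,
        "-o", out_vcf,
    ]


def map_intervals(lines, reference, bam, out_dir):
    """Mapper-style generator: yield (interval, shell command) per input line.

    Blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        interval = line.strip()
        if not interval or interval.startswith("#"):
            continue
        argv = build_gatk_command(interval, reference, bam, out_dir)
        yield interval, " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    demo_intervals = ["chr1:1-1000000", "chr2"]
    for interval, cmd in map_intervals(demo_intervals, "ref.fasta", "sample.bam", "out"):
        print(f"{interval}\t{cmd}")
```

In a real deployment each mapper would execute its command (for example with `subprocess.run`) against data visible on the worker node, and a reduce step would combine the per-interval outputs. Because the GATK engine itself is left untouched, this corresponds to the wrapper approach discussed above, not to a reimplementation for HDFS.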
