Can we implement GATK/Queue on Google Hadoop?

Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can we implement GATK or Queue on Google Hadoop?


  • danielyin · Member

    It seems that implementing GATK on Hadoop would require a great deal of work.

  • Carneiro · Charlestown, MA · Member, admin
    edited June 2013

    Yes and no. The GATK wasn't designed with Hadoop in mind, but only for historical reasons.

    One could envision a full reimplementation of the engine that handles HDFS and makes -nt / -nct work transparently in a Hadoop framework. This is not necessarily "a lot of work", but it is work that requires deep knowledge of the GATK's internals. Right now we don't have the resources to implement this ourselves, or to provide the level of support that someone else would need to do it.

    On the other hand, much as Queue does, one could implement a wrapper around the GATK that runs it on a Hadoop cluster. This is not a lot of work at all; in fact, people outside our group are already thinking about this problem. Our resources are very limited, unfortunately, but this alternative should require much less understanding of the GATK engine and is probably feasible for a good software engineer to tackle.
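    To make the wrapper approach above more concrete, here is a minimal sketch of its "scatter" half, assuming an interval-based split similar to Queue's scatter-gather: the reference is divided into fixed-size intervals and one GATK command is built per interval, so that each command could be run independently as a Hadoop task (for example, one input line per map task with Hadoop Streaming). The contig lengths, chunk size, jar/reference/BAM/output file names, and the choice of UnifiedGenotyper are hypothetical placeholders; the -T / -R / -I / -L / -o arguments follow the GATK 2.x/3.x command-line style. The "gather" step (merging the per-interval VCFs) is not shown.

```python
"""Hypothetical scatter step for a Hadoop-style GATK wrapper (sketch only)."""
from typing import Dict, List


def split_intervals(contig_lengths: Dict[str, int], chunk_size: int) -> List[str]:
    """Split each contig into 1-based, inclusive intervals of at most chunk_size bp."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def gatk_command(interval: str, index: int) -> List[str]:
    """Build one GATK 2.x/3.x-style command restricted to a single interval.

    Jar, reference, BAM and output names are hypothetical placeholders.
    """
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "UnifiedGenotyper",
        "-R", "reference.fasta",
        "-I", "sample.bam",
        "-L", interval,                     # restrict this task to one interval
        "-o", f"scatter_{index:04d}.vcf",   # one output per interval, merged later
    ]


if __name__ == "__main__":
    # Toy contig lengths; a real wrapper would read them from the reference index.
    contigs = {"chr20": 2_500_000, "chr21": 1_000_000}
    for i, interval in enumerate(split_intervals(contigs, 1_000_000)):
        # Each printed line could become the input record of one map task.
        print(" ".join(gatk_command(interval, i)))
```

    Splitting by interval keeps each task independent, which is what lets a generic framework schedule them without knowing anything about the GATK engine; the trade-off is that every task needs access to the full reference and BAM (or at least the relevant slice), which is where HDFS data locality would come into play.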

  • pagarwal14 · Durham, NC · Member

    Hello, some of us at Duke University, along with the person who posted the original question, are considering writing a wrapper around the GATK so that it can run on a Hadoop cluster. Before we start, we wanted to get some feedback on the usefulness and feasibility of such a wrapper. Could you share any thoughts on this, such as the potential performance advantages, the challenges in writing the software, and how much background work we would need to do to understand the GATK codebase? Thanks for your input!

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin

    Hi @pagarwal14,

    To be honest, this is not something we have given a lot of thought to, and right now we can't spare the resources to look at it with the seriousness needed to fully answer your questions. One important caveat is that our developer-oriented documentation is rather sparse at the moment, so that may be the biggest stumbling block. We aim to address that progressively over the next few months, but in the meantime we will not be able to offer you much support toward grokking the GATK codebase.

    That being said, I hope this does not deter you from undertaking this project, as there seems to be some demand for this and there should not be any unreasonable technical difficulty involved. Good luck!

  • pagarwal14 · Durham, NC · Member

    Thank you for your response. Could you point us to the codebase and to the developer-oriented documentation as it exists today? I searched the website for documentation, and the closest I could find was the "Developer Zone". Is that all of the developer documentation, or is there a more consolidated document? Thanks.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin

    Hi there,

    You can get the source code of the full GATK on (which has a restrictive license) or the framework only on (which is MIT-licensed).

    I'm afraid the "Developer Zone" is indeed all we have for dev docs right now, aside from the Javadocs in the code, of course.

  • fanliangze · China · Member

    May I ask what progress has been made? Thanks a lot!

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin

    We are now looking at technologies other than Hadoop.

  • leichang · Member

    What new technologies are you guys looking at? I am curious about the progress. We currently have a project that aims to use Hadoop with the GATK.
