Can we implement GATK/Queue on Google Hadoop?

Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can we implement GATK or Queue on Google Hadoop?

Answers

  • It seems that implementing GATK on Hadoop would require a lot of work.

  • Carneiro (Charlestown, MA), Member
    edited June 2013

    Yes and no. The GATK wasn't implemented with Hadoop in mind, but only for historical reasons.

    One could envision a full reimplementation of the engine to handle HDFS and make -nt / -nct work transparently in a Hadoop framework. This is not "a lot of work", but it is work that requires deep knowledge of the GATK's internals. Right now we don't have the resources to implement this ourselves, or to provide the level of support that would be necessary for someone else to do it.

    On the other hand, as with Queue, one could implement a wrapper around the GATK to instantiate it in a Hadoop cluster. This is not a lot of work at all; in fact, people outside our group are already thinking about this problem. Our resources are unfortunately very limited, but this alternative should require much less understanding of the GATK engine and is probably feasible for a good software engineer to tackle.

  • pagarwal14 (Durham, NC), Member

    Hello, some of us at Duke University, along with the person who posted the original question, are thinking of writing a wrapper around the GATK so that it can be used on a Hadoop cluster. Before we start, we wanted to get some feedback on the utility and feasibility of such a wrapper. Could you please share any thoughts on this, such as the potential performance advantage, the challenges in writing the software, and how much background work we would have to do to understand the GATK codebase? Thanks for your input!

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    Hi @pagarwal14,

    To be honest this is not something we have given a lot of thought to, and right now we can't spare the resources to look at it with the seriousness needed to fully answer your questions. One important caveat is that our developer-oriented documentation is rather sparse at the moment, so that may be the biggest stumbling block; we aim to deal with that issue progressively over the next few months, but in the meantime we will not be able to offer you much support toward grokking the GATK codebase.

    That being said, I hope this does not deter you from undertaking this project, as there seems to be some demand for this and there should not be any unreasonable technical difficulty involved. Good luck!

  • pagarwal14 (Durham, NC), Member

    Thank you for your response. Can you point us to the codebase and to the developer-oriented documentation as it exists today? I searched the website for documentation, and the closest I could find was http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone. Is that all of the developer documentation, or is there a more consolidated document? Thanks.

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    Hi there,

    You can get the source code of the full GATK from https://github.com/broadgsa/gatk-protected (which has a restrictive license) or the framework only from https://github.com/broadgsa/gatk (which is MIT-licensed).

    I'm afraid the "Developer Zone" is indeed all we have for dev docs right now, aside from the code javadocs of course.

  • May I ask what progress has been made? Thanks a lot!

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    We are now looking at technologies other than Hadoop.

  • What are the new technologies you are looking at? I am curious about the progress. We currently have a project that needs to use Hadoop and GATK.
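For anyone weighing the wrapper route Carneiro describes (instantiating the GATK in a Hadoop cluster the way Queue does on a conventional cluster), here is a minimal sketch of what such a wrapper might look like under Hadoop Streaming, assuming GATK 3 command-line syntax (-T, -R, -I, -L, -o). Everything else is hypothetical: the script itself, the placeholder paths GATK_JAR, REFERENCE and INPUT_BAM, the --map switch, and the convention of one interval per input line. Each mapper simply runs an ordinary GATK job restricted to one interval with -L, which is the scatter-by-interval idea Queue uses; staging the BAM and reference to every node and gathering the per-interval VCFs are not shown.

```python
"""Hypothetical Hadoop Streaming mapper that wraps GATK 3 (a sketch, not a Broad tool).

Assumption: each input line is one genomic interval such as "chr1:1-5000000".
The mapper runs one GATK job restricted to that interval via -L.
"""
import subprocess
import sys

# Hypothetical placeholder paths; on a real cluster these files must exist on every node.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "ref.fasta"
INPUT_BAM = "sample.bam"


def split_intervals(contig_lengths, shard_size):
    """Cut each contig into 1-based, inclusive intervals of at most shard_size bp."""
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + shard_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def gatk_command(interval, walker="HaplotypeCaller"):
    """Build a GATK 3 command line that processes only the given interval."""
    # Derive a per-interval output name, e.g. "chr1:1-4" -> "chr1_1_4.vcf".
    out_vcf = interval.replace(":", "_").replace("-", "_") + ".vcf"
    return [
        "java", "-jar", GATK_JAR,
        "-T", walker,
        "-R", REFERENCE,
        "-I", INPUT_BAM,
        "-L", interval,
        "-o", out_vcf,
    ]


def run_mapper(lines, runner=subprocess.run):
    """Run one GATK job per interval line; return 'interval<TAB>exit code' records."""
    records = []
    for line in lines:
        interval = line.strip()
        if not interval:
            continue  # skip blank lines in the input split
        result = runner(gatk_command(interval))
        records.append(f"{interval}\t{result.returncode}")
    return records


# The hypothetical --map switch keeps the module importable without reading stdin.
if __name__ == "__main__" and "--map" in sys.argv:
    # Hadoop Streaming feeds the mapper its input split line by line on stdin.
    for record in run_mapper(sys.stdin):
        print(record)
```

The intervals file could be produced with split_intervals from the reference's contig lengths, and the job launched with something like hadoop jar hadoop-streaming.jar -input intervals.txt -output status -mapper "python3 gatk_mapper.py --map" (the streaming jar location depends on the Hadoop installation). Emitting the exit code per interval makes failed shards easy to find and resubmit, which is roughly the bookkeeping Queue otherwise does for you.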
