The current GATK version is 3.6-0

Overview of Queue

Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)
edited October 2014 in Pipelining with Queue

1. Introduction

GATK-Queue is a command-line scripting framework for defining multi-stage genomic analysis pipelines, combined with an execution manager that runs those pipelines from end to end. Processing genome data often involves several steps to produce outputs; for example, our BAM-to-VCF calling pipeline includes, among other things:

  • Local realignment around indels
  • Emitting raw SNP calls
  • Emitting indels
  • Masking the SNPs at indels
  • Annotating SNPs using chip data
  • Labeling suspicious calls based on filters
  • Creating a summary report with statistics

Running these tools one by one in series can take weeks, and speeding that up would otherwise require custom scripting to make use of parallel resources.

With a Queue script, users can semantically define the multiple steps of the pipeline and then hand off the logistics of running the pipeline to completion. Queue runs independent jobs in parallel, handles transient errors, and uses various techniques, such as running multiple copies of the same program on different portions of the genome, to produce outputs faster.
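As a rough mental model of what "independent jobs run in parallel" means, the sketch below groups jobs into waves according to the files each job reads and writes. The Job type, the JobWaves object, and the file names are hypothetical illustrations and not part of the Queue API; it assumes dependencies follow from file inputs and outputs, and it is not a description of Queue's actual scheduler.

```scala
// Conceptual model only: these types are hypothetical, not part of the Queue API.
final case class Job(name: String, inputs: Set[String], outputs: Set[String])

object JobWaves {
  /** Groups jobs into waves. Every job in a wave depends only on files
    * produced by earlier waves, or on files that no job produces. */
  def waves(jobs: Seq[Job]): Seq[Seq[Job]] = {
    // Files that some job in the pipeline will create.
    val produced: Set[String] = jobs.flatMap(_.outputs).toSet

    @annotation.tailrec
    def loop(remaining: Seq[Job], available: Set[String], acc: Vector[Seq[Job]]): Vector[Seq[Job]] =
      if (remaining.isEmpty) acc
      else {
        // A job is ready when each input is external or has already been written.
        val (ready, blocked) =
          remaining.partition(j => j.inputs.forall(f => !produced(f) || available(f)))
        require(ready.nonEmpty, "cyclic dependency between jobs")
        loop(blocked, available ++ ready.flatMap(_.outputs), acc :+ ready)
      }

    loop(jobs, Set.empty, Vector.empty)
  }
}

object JobWavesDemo {
  def main(args: Array[String]): Unit = {
    val jobs = Seq(
      Job("maskSnps", Set("raw.snps.vcf", "indels.vcf"), Set("masked.vcf")),
      Job("realign", Set("in.bam"), Set("realigned.bam")),
      Job("callSnps", Set("realigned.bam"), Set("raw.snps.vcf")),
      Job("callIndels", Set("realigned.bam"), Set("indels.vcf"))
    )
    // Jobs printed on the same line could run at the same time.
    JobWaves.waves(jobs).foreach(wave => println(wave.map(_.name).mkString(", ")))
  }
}
```

In this toy pipeline the SNP and indel callers both wait for the realigned BAM but not for each other, so they land in the same wave.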

2. Obtaining Queue

You have two options: download the binary distribution (prepackaged, ready to run program) or build it from source.

- Download the binary

This is obviously the easiest way to go. Links are on the Downloads page. Just get the Queue package; no need to get the GATK package separately as GATK is bundled in with Queue.

- Building Queue from source

Briefly, here's what you need to know/do:

Queue is part of the GATK repository. Download the source from the public repository on GitHub. Run the following command:

git clone

IMPORTANT NOTE: These instructions refer to the MIT-licensed version of the GATK+Queue source code. With that version, you will be able to build Queue itself, as well as the public portion of the GATK (the core framework), but that will not include the GATK analysis tools. If you want to use Queue to pipeline the GATK analysis tools, you need to clone the 'protected' repository. Please note however that part of the source code in that repository (the 'protected' module) is under a different license which excludes for-profit use, modification and redistribution.

Move to the git root directory and use Maven to build the source.

mvn clean verify

All dependencies will be managed by Maven as needed.

See this article on how to test your installation of Queue.

3. Running Queue

See this article on running Queue for the first time for full details.

Queue arguments can be listed by running with --help:

java -jar dist/Queue.jar --help

To list the arguments required by a QScript, add the script with -S and run with --help.

java -jar dist/Queue.jar -S script.scala --help

Note that by default Queue runs in a "dry" mode, as explained in the article linked above. After verifying the generated commands, execute the pipeline by adding -run.
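The dry-run-then-run workflow can be sketched as a small helper that assembles both command lines. QueueCommand and its parameter names are hypothetical; only the -S and -run flags and the dist/Queue.jar path come from the text above.

```scala
// Hypothetical helper: assembles the Queue command lines described above.
object QueueCommand {
  /** Builds the argument list. Without `run`, the result is the default dry run. */
  def build(queueJar: String, script: String, run: Boolean = false): Seq[String] = {
    val dryRun = Seq("java", "-jar", queueJar, "-S", script)
    if (run) dryRun :+ "-run" else dryRun
  }
}

object QueueCommandDemo {
  def main(args: Array[String]): Unit = {
    // First pass: dry run, to inspect the commands Queue would generate.
    println(QueueCommand.build("dist/Queue.jar", "script.scala").mkString(" "))
    // Second pass: the same command plus -run, to actually execute the pipeline.
    println(QueueCommand.build("dist/Queue.jar", "script.scala", run = true).mkString(" "))
  }
}
```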

See QFunction and Command Line Options for more info on adjusting Queue options.

4. QScripts

General Information

Queue pipelines, called QScripts, are written as Scala 2.8 files with a bit of syntactic sugar.

Every QScript includes the following steps:

  • New instances of CommandLineFunctions are created
  • Input and output arguments are specified on each function
  • The function is added with add() to Queue for dispatch and monitoring

The basic command line to run a Queue pipeline is:

java -jar Queue.jar -S <script>.scala

See the main article Queue QScripts for more info on QScripts.
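The three steps listed above might look roughly like this in a minimal QScript. This is a sketch, not one of the shipped examples: the SortLines script, its "i" argument, and the use of the Unix sort tool are made up for illustration, and the import assumes the GATK 3.x package layout (org.broadinstitute.gatk.queue). It needs Queue on the classpath and is meant to be passed to Queue with -S rather than compiled on its own.

```scala
import org.broadinstitute.gatk.queue.QScript

// Illustrative QScript: sorts the lines of a text file with the Unix `sort` tool.
class SortLines extends QScript {
  // Script-level argument, supplied on the Queue command line.
  @Input(doc = "Text file whose lines should be sorted", shortName = "i")
  var inputFile: File = _

  // A command-line job; Queue learns what it reads and writes from the annotations.
  class Sort extends CommandLineFunction {
    @Input(doc = "File to read") var in: File = _
    @Output(doc = "File that receives the sorted lines") var out: File = _
    def commandLine = "sort -o %s %s".format(out, in)
  }

  def script(): Unit = {
    val sort = new Sort                                  // 1. create the function
    sort.in = inputFile                                  // 2. specify its input...
    sort.out = new File(inputFile.getPath + ".sorted")   //    ...and its output
    add(sort)                                            // 3. hand it to Queue
  }
}
```

Assuming the file is saved as SortLines.scala, a dry run would be something like java -jar Queue.jar -S SortLines.scala -i my.txt, with -run added once the generated command looks right.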

Supported QScripts

Most QScripts are analysis pipelines that are custom-built for specific projects, and we currently do not offer any QScripts as supported analysis tools. However, we do provide some example scripts that you can use as a basis to write your own QScripts (see below).

Example QScripts

The latest versions of the example files are available in the Sting GitHub repository under public/scala/qscript/examples.

5. Visualization and Queue


Queue automatically generates GATKReport-formatted runtime information about executed jobs. See this presentation for a general introduction to QJobReport.

Note that Queue attempts to generate a standard visualization using an R script in the GATK public/R repository. You must provide a path to this location if you want the script to run automatically. Additionally, the script requires the gsalib package to be installed on the machine, which is typically done by providing its path in your .Rprofile file:

bm8da-dbe ~/Desktop/broadLocal/GATK/unstable % cat ~/.Rprofile

Note that gsalib is available from the CRAN repository so you can install it with the canonical R package install command.


  • The system only provides information about commands that have just run. Resuming from a partially completed job will only show the information for the jobs that just ran, and not for any of the previously completed commands. This is due to a structural limitation in Queue, and will be fixed when the Queue infrastructure improves.

  • This feature only works for the command-line and LSF execution models. SGE should be easy to add for a motivated individual, but we cannot test this capability here at the Broad. Please send us a patch if you do extend Queue to support SGE.

DOT visualization of Pipelines

Queue emits a file to help visualize your commands. You can open this file in programs like DOT, OmniGraffle, etc. to view your pipelines. By default the system will print out your LSF command lines, but this can be too much in a complex pipeline.

To clarify your pipeline, override the dotString() function:

class CountCovariates(bamIn: File, recalDataIn: File, args: String = "") extends GatkFunction {
    @Input(doc="foo") var bam = bamIn
    @Input(doc="foo") var bamIndex = bai(bamIn)
    @Output(doc="foo") var recalData = recalDataIn
    memoryLimit = Some(4)
    // Short label written to the dot file in place of the full command line
    override def dotString = "CountCovariates: %s [args %s]".format(bamIn.getName, args)
    // Full command line that Queue dispatches for this job
    def commandLine = gatkCommandLine("CountCovariates") + args + " -l INFO -D /humgen/gsa-hpprojects/GATK/data/dbsnp_129_hg18.rod -I %s --max_reads_at_locus 20000 -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile %s".format(bam, recalData)
}

Here we only see "CountCovariates: my.bam [args -OQ]", for example, in the dot file instead of the full command line. The base quality score recalibration pipeline, as visualized by DOT, can be viewed here:

6. Further reading


Geraldine Van der Auwera, PhD


  • jgarbe (Posts: 3; Member)

    Almost all of the links on this page just link back to this page; I can't seem to find any additional information about Queue.

  • Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)

    We are planning to overhaul the Queue documentation. In the meantime, have a look at the contents of the Guide section called Developer Zone. Most of the Queue articles live there. You can also do a tag search for articles tagged "queue".


  • raymond301 (Posts: 1; Member)
    edited February 2013

    It would be really helpful to know who else (Institutions) is using this GATK Queue. Is there a list?

  • Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)

    No, we don't keep a list of who uses Queue, sorry.


  • CarlosBorroto (Posts: 46; Member)

    I ran into this issue when building queue. Currently using github code from revision:

    $ java -version
    java version "1.7.0_45"
    Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
    Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
    $ ant -version
    Apache Ant(TM) version 1.9.3 compiled on December 23 201

    [javadoc] Generating Javadoc
    [javadoc] Javadoc execution
    /Users/cborroto/src/Sting/build.xml:612: java.lang.NullPointerException
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(

  • CarlosBorroto (Posts: 46; Member)

    Ok, I found this is actually a bug in ant 1.9.3:

    Upstream already fixed the issue and I guess it should be part of next release. I can confirm 'ant queue' works with ant 1.9.2.

    This statement should probably include version numbers, though:

    "Just make sure you have suitable versions of the JDK and Ant!"


  • Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)

    Thanks for reporting your solution, Carlos. This ant bug has popped up elsewhere as well. We'll revise the docs. FYI we are switching to maven for the next release; more on this soon.


  • andrewo (Posts: 10; Member)

    The link for the presentation about the QJobReport gives this error "Access to this link has been disabled. Please ask the owner of the shared link to send a new link to access the file or the folder."

  • Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)

    @andrewo, I posted a new link, it should work now.


  • andrewo (Posts: 10; Member)

    Thanks for the updated link.

    I noticed in my job reports that all of the exechosts are labeled "unknown" except for the head node. Is there any way to get the name of the actual nodes in the report? I'm running my HC jobs with Queue on our SGE cluster. The node name should be stored in the SGE environment variable $HOSTNAME.

  • Geraldine_VdAuwera (Posts: 10,468; Administrator, Dev admin)

    Queue may not know how to retrieve the node names from SGE (we use and develop for LSF). We can't spend time to add in the capability, but we're happy to look at a patch.


  • JeremyLeipzig (Posts: 6; Member)

    @raymond301 said:
    It would be really helpful to know who else (Institutions) is using this GATK Queue. Is there a list?

    Pretty much the best/only Queue implementation is Piper. It is some really dense code, quite a piece of work:
