Queue pipeline scripts (QScripts)

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited February 2014 in Pipelining with Queue

1. Introduction

Queue pipelines, called QScripts, are Scala 2.8 files with a bit of syntactic sugar.

QScripts are easiest to develop using an Integrated Development Environment. See Queue with IntelliJ IDEA for our recommended settings.

The following is a basic outline of a QScript:

import org.broadinstitute.sting.queue.QScript
// List other imports here

// Define the overall QScript here.
class MyScript extends QScript {
  // List script arguments here.
  @Input(doc="My QScript inputs")
  var scriptInput: File = _

  // Create and add the functions in the script here.
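  def script = {
     // MyCommandLine stands for any CommandLineFunction defined elsewhere.
     val myCL = new MyCommandLine
     myCL.myInput = scriptInput // Example variable input
     myCL.myOutput = new File("/path/to/output") // Example hardcoded output
     // Functions only become part of the pipeline once they are added.
     add(myCL)
  }
}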


2. Imports

Imports can be any Scala or Java imports, written in Scala syntax.

import scala.util.Random
// etc.

3. Classes

  • To add a CommandLineFunction to a pipeline, a class must be defined that extends QScript.

  • The QScript must define a method script.

  • The QScript can define helper methods or variables.

4. Script method

The body of script should create and add Queue CommandLineFunctions.

class MyScript extends org.broadinstitute.sting.queue.QScript {
  def script = add(new CommandLineFunction { def commandLine = "echo hello world" })
}

5. Command Line Arguments

  • A QScript can be set to read command line arguments by defining variables with @Input, @Output, or @Argument annotations.

  • A command line argument can be a primitive scalar, enum, File, or a Scala immutable Array, List, Set, or Option of a primitive, enum, or File.

  • QScript command line arguments can be marked as optional by setting required=false.

class MyScript extends org.broadinstitute.sting.queue.QScript {
  // Read from the command line when the QScript is run.
  @Input(doc="example message to echo")
  var message: String = _
  def script = add(new CommandLineFunction { def commandLine = "echo " + message })
}

6. Using and writing CommandLineFunctions

Adding existing GATK walkers

See Pipelining the GATK using Queue for more information on the automatically generated Queue wrappers for GATK walkers.

After functions are defined they should be added to the QScript pipeline using add().

for (vcf <- vcfs) {
  val ve = new VariantEval
  ve.vcfFile = vcf
  ve.evalFile = swapExt(vcf, "vcf", "eval")
  // Add each function to the pipeline, otherwise it will never run.
  add(ve)
}
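
As a rough illustration of what the swapExt call in the loop is doing, here is a minimal standalone sketch. The helper swapExtension below is hypothetical and is not Queue's implementation; it assumes that the call simply replaces a trailing extension in the file name, which matches the .vcf to .eval naming seen in the examples.

```scala
import java.io.File

object SwapExtSketch {
  // Hypothetical helper (not part of Queue): replaces a trailing
  // extension in the file name and keeps the parent directory, if any.
  def swapExtension(file: File, oldExt: String, newExt: String): File = {
    // stripSuffix leaves the name unchanged when it does not end in oldExt.
    val newName = file.getName.stripSuffix(oldExt) + newExt
    new File(file.getParentFile, newName)
  }
}
```

For example, swapExtension(new File("sample.unfiltered.vcf"), "vcf", "eval") would name the result sample.unfiltered.eval.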

Defining new CommandLineFunctions

  • Queue tracks dependencies between functions via variables annotated with @Input and @Output.

  • Queue will run functions based on the dependencies between them, not based on the order in which they are added in the script! So if the @Input of CommandLineFunction A depends on the @Output of CommandLineFunction B, A will wait for B to finish before it starts running.

  • See the main article Queue CommandLineFunctions for more information.
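
The dependency-driven ordering described above can be sketched without Queue at all. The following is a simplified model, not Queue's scheduler: Job, runOrder and the file names are all made up for illustration, and files are represented as plain strings.

```scala
import scala.annotation.tailrec

// Hypothetical model: a job is a name plus the files it reads and writes.
case class Job(name: String, inputs: Set[String], outputs: Set[String])

object DependencyOrderSketch {
  // Returns job names so that every job comes after the jobs producing
  // its inputs, regardless of the order in which the jobs were added.
  def runOrder(jobs: List[Job]): List[String] = {
    @tailrec
    def loop(pending: List[Job], done: List[String]): List[String] = {
      if (pending.isEmpty) done.reverse
      else {
        // Files that a still-pending job will write are not available yet.
        val notYetProduced = pending.flatMap(_.outputs).toSet
        val (ready, blocked) =
          pending.partition(job => job.inputs.forall(f => !notYetProduced.contains(f)))
        require(ready.nonEmpty, "cyclic dependency between jobs")
        loop(blocked, ready.map(_.name).reverse ::: done)
      }
    }
    loop(jobs, Nil)
  }
}
```

With A added first but reading the file that B writes, runOrder(List(Job("A", Set("b.out"), Set("a.out")), Job("B", Set("in.bam"), Set("b.out")))) places B ahead of A.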

7. Examples

  • The latest versions of the example files are available in the Sting git repository under public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/.

  • To print the list of arguments required by an existing QScript, run it with -help.

  • To check if your script has all of the CommandLineFunction variables set correctly, run without -run.
  • When you are ready to execute the full pipeline, add -run.

Hello World QScript

The following is a "hello world" example that runs a single command line to echo hello world.

import org.broadinstitute.sting.queue.QScript

class HelloWorld extends QScript {
  def script = {
    add(new CommandLineFunction {
      def commandLine = "echo hello world"
    })
  }
}

The above file is checked into the Sting git repository under HelloWorld.scala. After building Queue from source, the QScript can be run with the following command:

java -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/HelloWorld.scala -run

It should produce output similar to:

INFO  16:23:27,825 QScriptManager - Compiling 1 QScript 
INFO  16:23:31,289 QScriptManager - Compilation complete 
INFO  16:23:34,631 HelpFormatter - --------------------------------------------------------- 
INFO  16:23:34,631 HelpFormatter - Program Name: org.broadinstitute.sting.queue.QCommandLine 
INFO  16:23:34,632 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/HelloWorld.scala -run  
INFO  16:23:34,632 HelpFormatter - Date/Time: 2011/01/14 16:23:34 
INFO  16:23:34,632 HelpFormatter - --------------------------------------------------------- 
INFO  16:23:34,632 HelpFormatter - --------------------------------------------------------- 
INFO  16:23:34,634 QCommandLine - Scripting HelloWorld 
INFO  16:23:34,651 QCommandLine - Added 1 functions 
INFO  16:23:34,651 QGraph - Generating graph. 
INFO  16:23:34,660 QGraph - Running jobs. 
INFO  16:23:34,689 ShellJobRunner - Starting: echo hello world 
INFO  16:23:34,689 ShellJobRunner - Output written to /Users/kshakir/src/Sting/Q-43031@bmef8-d8e-1.out 
INFO  16:23:34,771 ShellJobRunner - Done: echo hello world 
INFO  16:23:34,773 QGraph - Deleting intermediate files. 
INFO  16:23:34,773 QCommandLine - Done 


ExampleUnifiedGenotyper QScript

This example uses automatically generated, Queue-compatible wrappers for the GATK. See Pipelining the GATK using Queue for more info on authoring Queue support into walkers and using walkers in Queue.

The ExampleUnifiedGenotyper.scala for running the UnifiedGenotyper followed by VariantFiltration can be found in the examples folder.

To list the command line parameters, including the required parameters, run with -help.

java -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleUnifiedGenotyper.scala -help

The help output should appear similar to this:

INFO  10:26:08,491 QScriptManager - Compiling 1 QScript
INFO  10:26:11,926 QScriptManager - Compilation complete
Program Name: org.broadinstitute.sting.queue.QCommandLine
usage: java -jar Queue.jar -S <script> [-run] [-jobRunner <job_runner>] [-bsub] [-status] [-retry <retry_failed>]
       [-startFromScratch] [-keepIntermediates] [-statusTo <status_email_to>] [-statusFrom <status_email_from>] [-dot
       <dot_graph>] [-expandedDot <expanded_dot_graph>] [-jobPrefix <job_name_prefix>] [-jobProject <job_project>] [-jobQueue
       <job_queue>] [-jobPriority <job_priority>] [-memLimit <default_memory_limit>] [-runDir <run_directory>] [-tempDir
       <temp_directory>] [-jobSGDir <job_scatter_gather_directory>] [-emailHost <emailSmtpHost>] [-emailPort <emailSmtpPort>]
       [-emailTLS] [-emailSSL] [-emailUser <emailUsername>] [-emailPassFile <emailPasswordFile>] [-emailPass <emailPassword>]
       [-l <logging_level>] [-log <log_to_file>] [-quiet] [-debug] [-h] -R <referencefile> -I <bamfile> [-L <intervals>]
       [-filter <filternames>] [-filterExpression <filterexpressions>]

 -S,--script <script>                                                      QScript scala file
 -run,--run_scripts                                                        Run QScripts.  Without this flag set only
                                                                           performs a dry run.
 -jobRunner,--job_runner <job_runner>                                      Use the specified job runner to dispatch
                                                                           command line jobs
 -bsub,--bsub                                                              Equivalent to -jobRunner Lsf706
 -status,--status                                                          Get status of jobs for the qscript
 -retry,--retry_failed <retry_failed>                                      Retry the specified number of times after a
                                                                           command fails.  Defaults to no retries.
 -startFromScratch,--start_from_scratch                                    Runs all command line functions even if the
                                                                           outputs were previously output successfully.
 -keepIntermediates,--keep_intermediate_outputs                            After a successful run keep the outputs of
                                                                           any Function marked as intermediate.
 -statusTo,--status_email_to <status_email_to>                             Email address to send emails to upon
                                                                           completion or on error.
 -statusFrom,--status_email_from <status_email_from>                       Email address to send emails from upon
                                                                           completion or on error.
 -dot,--dot_graph <dot_graph>                                              Outputs the queue graph to a .dot file.  See:
 -expandedDot,--expanded_dot_graph <expanded_dot_graph>                    Outputs the queue graph of scatter gather to
                                                                           a .dot file.  Otherwise overwrites the
 -jobPrefix,--job_name_prefix <job_name_prefix>                            Default name prefix for compute farm jobs.
 -jobProject,--job_project <job_project>                                   Default project for compute farm jobs.
 -jobQueue,--job_queue <job_queue>                                         Default queue for compute farm jobs.
 -jobPriority,--job_priority <job_priority>                                Default priority for jobs.
 -memLimit,--default_memory_limit <default_memory_limit>                   Default memory limit for jobs, in gigabytes.
 -runDir,--run_directory <run_directory>                                   Root directory to run functions from.
 -tempDir,--temp_directory <temp_directory>                                Temp directory to pass to functions.
 -jobSGDir,--job_scatter_gather_directory <job_scatter_gather_directory>   Default directory to place scatter gather
                                                                           output for compute farm jobs.
 -emailHost,--emailSmtpHost <emailSmtpHost>                                Email SMTP host. Defaults to localhost.
 -emailPort,--emailSmtpPort <emailSmtpPort>                                Email SMTP port. Defaults to 465 for ssl,
                                                                           otherwise 25.
 -emailTLS,--emailUseTLS                                                   Email should use TLS. Defaults to false.
 -emailSSL,--emailUseSSL                                                   Email should use SSL. Defaults to false.
 -emailUser,--emailUsername <emailUsername>                                Email SMTP username. Defaults to none.
 -emailPassFile,--emailPasswordFile <emailPasswordFile>                    Email SMTP password file. Defaults to none.
 -emailPass,--emailPassword <emailPassword>                                Email SMTP password. Defaults to none. Not
                                                                           secure! See emailPassFile.
 -l,--logging_level <logging_level>                                        Set the minimum level of logging, i.e.
                                                                           setting INFO get's you INFO up to FATAL,
                                                                           setting ERROR gets you ERROR and FATAL level
 -log,--log_to_file <log_to_file>                                          Set the logging location
 -quiet,--quiet_output_mode                                                Set the logging to quiet mode, no output to
 -debug,--debug_mode                                                       Set the logging file string to include a lot
                                                                           of debugging information (SLOW!)
 -h,--help                                                                 Generate this help message

Arguments for ExampleUnifiedGenotyper:
 -R,--referencefile <referencefile>                          The reference file for the bam files.
 -I,--bamfile <bamfile>                                      Bam file to genotype.
 -L,--intervals <intervals>                                  An optional file with a list of intervals to proccess.
 -filter,--filternames <filternames>                         A optional list of filter names.
 -filterExpression,--filterexpressions <filterexpressions>   An optional list of filter expressions.

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
Argument with name '--bamfile' (-I) is missing.
Argument with name '--referencefile' (-R) is missing.
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(
        at org.broadinstitute.sting.commandline.ParsingEngine.validate(
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:57)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 1.0.5504):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation
##### ERROR Visit our forum to view answers to commonly asked questions
##### ERROR
##### ERROR MESSAGE: Argument with name '--bamfile' (-I) is missing.
##### ERROR Argument with name '--referencefile' (-R) is missing.
##### ERROR ------------------------------------------------------------------------------------------

To dry run the pipeline:

java \
  -jar dist/Queue.jar \
  -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleUnifiedGenotyper.scala \
  -R human_b36_both.fasta \
  -I pilot2_daughters.chr20.10k-11k.bam \
  -L chr20.interval_list \
  -filter StrandBias -filterExpression "SB>=0.10" \
  -filter AlleleBalance -filterExpression "AB>=0.75" \
  -filter QualByDepth -filterExpression "QD<5" \
  -filter HomopolymerRun -filterExpression "HRun>=4"

The dry run output should appear similar to this:

INFO  10:45:00,354 QScriptManager - Compiling 1 QScript
INFO  10:45:04,855 QScriptManager - Compilation complete
INFO  10:45:05,058 HelpFormatter - ---------------------------------------------------------
INFO  10:45:05,059 HelpFormatter - Program Name: org.broadinstitute.sting.queue.QCommandLine
INFO  10:45:05,059 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleUnifiedGenotyper.scala -R human_b36_both.fasta -I pilot2_daughters.chr20.10k-11k.bam -L chr20.interval_list -filter StrandBias -filterExpression SB>=0.10 -filter AlleleBalance -filterExpression AB>=0.75 -filter QualByDepth -filterExpression QD<5 -filter HomopolymerRun -filterExpression HRun>=4 
INFO  10:45:05,059 HelpFormatter - Date/Time: 2011/03/24 10:45:05
INFO  10:45:05,059 HelpFormatter - ---------------------------------------------------------
INFO  10:45:05,059 HelpFormatter - ---------------------------------------------------------
INFO  10:45:05,061 QCommandLine - Scripting ExampleUnifiedGenotyper
INFO  10:45:05,150 QCommandLine - Added 4 functions
INFO  10:45:05,150 QGraph - Generating graph.
INFO  10:45:05,169 QGraph - Generating scatter gather jobs.
INFO  10:45:05,182 QGraph - Removing original jobs.
INFO  10:45:05,183 QGraph - Adding scatter gather jobs.
INFO  10:45:05,231 QGraph - Regenerating graph.
INFO  10:45:05,247 QGraph - -------
INFO  10:45:05,252 QGraph - Pending: IntervalScatterFunction /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-1/scatter.intervals /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-2/scatter.intervals /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-3/scatter.intervals
INFO  10:45:05,253 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/scatter/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,254 QGraph - -------
INFO  10:45:05,279 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T UnifiedGenotyper -I /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.bam -L /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-1/scatter.intervals -R /Users/kshakir/src/Sting/human_b36_both.fasta -o /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-1/pilot2_daughters.chr20.10k-11k.unfiltered.vcf
INFO  10:45:05,279 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-1/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,279 QGraph - -------
INFO  10:45:05,283 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T UnifiedGenotyper -I /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.bam -L /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-2/scatter.intervals -R /Users/kshakir/src/Sting/human_b36_both.fasta -o /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-2/pilot2_daughters.chr20.10k-11k.unfiltered.vcf
INFO  10:45:05,283 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-2/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,283 QGraph - -------
INFO  10:45:05,287 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T UnifiedGenotyper -I /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.bam -L /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-3/scatter.intervals -R /Users/kshakir/src/Sting/human_b36_both.fasta -o /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-3/pilot2_daughters.chr20.10k-11k.unfiltered.vcf
INFO  10:45:05,287 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-3/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,288 QGraph - -------
INFO  10:45:05,288 QGraph - Pending: SimpleTextGatherFunction /Users/kshakir/src/Sting/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,288 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/gather-jobOutputFile/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,289 QGraph - -------
INFO  10:45:05,291 QGraph - Pending: java -Xmx1g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T CombineVariants -L /Users/kshakir/src/Sting/chr20.interval_list -R /Users/kshakir/src/Sting/human_b36_both.fasta -B:input0,VCF /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-1/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -B:input1,VCF /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-2/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -B:input2,VCF /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/temp-3/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -o /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -priority input0,input1,input2 -assumeIdenticalSamples
INFO  10:45:05,291 QGraph - Log: /Users/kshakir/src/Sting/queueScatterGather/Q-60018@bmef8-d8e-1-sg/gather-out/Q-60018@bmef8-d8e-1.out
INFO  10:45:05,292 QGraph - -------
INFO  10:45:05,296 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T VariantEval -L /Users/kshakir/src/Sting/chr20.interval_list -R /Users/kshakir/src/Sting/human_b36_both.fasta -B:eval,VCF /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -o /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.unfiltered.eval
INFO  10:45:05,296 QGraph - Log: /Users/kshakir/src/Sting/Q-60018@bmef8-d8e-2.out
INFO  10:45:05,296 QGraph - -------
INFO  10:45:05,299 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T VariantFiltration -L /Users/kshakir/src/Sting/chr20.interval_list -R /Users/kshakir/src/Sting/human_b36_both.fasta -B:vcf,VCF /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.unfiltered.vcf -o /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.filtered.vcf -filter SB>=0.10 -filter AB>=0.75 -filter QD<5 -filter HRun>=4 -filterName StrandBias -filterName AlleleBalance -filterName QualByDepth -filterName HomopolymerRun
INFO  10:45:05,299 QGraph - Log: /Users/kshakir/src/Sting/Q-60018@bmef8-d8e-3.out
INFO  10:45:05,302 QGraph - -------
INFO  10:45:05,303 QGraph - Pending: java -Xmx2g -cp "/Users/kshakir/src/Sting/dist/Queue.jar" org.broadinstitute.sting.gatk.CommandLineGATK -T VariantEval -L /Users/kshakir/src/Sting/chr20.interval_list -R /Users/kshakir/src/Sting/human_b36_both.fasta -B:eval,VCF /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.filtered.vcf -o /Users/kshakir/src/Sting/pilot2_daughters.chr20.10k-11k.filtered.eval
INFO  10:45:05,303 QGraph - Log: /Users/kshakir/src/Sting/Q-60018@bmef8-d8e-4.out
INFO  10:45:05,304 QGraph - Dry run completed successfully!
INFO  10:45:05,304 QGraph - Re-run with "-run" to execute the functions.
INFO  10:45:05,304 QCommandLine - Done

8. Using traits to pass common values from QScripts to CommandLineFunctions

QScript files often create multiple CommandLineFunctions with similar arguments. Scala features such as inner classes and traits (mixins) can be used to reuse variables.

  • A self type can be useful to distinguish between the different meanings of this. We use qscript as an alias for the QScript's this, to distinguish it from the this inside inner classes or traits.

  • A trait mixin can be used to reuse functionality. The trait below is designed to copy values from the QScript and then is mixed into different instances of the functions.

See the following example:

class MyScript extends org.broadinstitute.sting.queue.QScript {
  // Create an alias 'qscript' for 'MyScript.this'
  qscript =>

  // This is a script argument
  @Argument(doc="message to display")
  var message: String = _

  // This is a script argument
  @Argument(doc="number of times to display")
  var count: Int = _

  trait ReusableArguments extends MyCommandLineFunction {
    // Whenever a function is created 'with' this trait, it will copy the message.
    this.commandLineMessage = qscript.message
  }

  abstract class MyCommandLineFunction extends CommandLineFunction {
     // This is a per command line argument
     @Argument(doc="message to display")
     var commandLineMessage: String = _
  }

  class MyEchoFunction extends MyCommandLineFunction {
     def commandLine = "echo " + commandLineMessage
  }

  class MyAlsoEchoFunction extends MyCommandLineFunction {
     def commandLine = "echo also " + commandLineMessage
  }

  def script = {
    for (i <- 1 to count) {
      val echo = new MyEchoFunction with ReusableArguments
      val alsoEcho = new MyAlsoEchoFunction with ReusableArguments
      add(echo, alsoEcho)
    }
  }
}
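
To try the self-type and mixin pattern outside of Queue, the following standalone sketch may help. It is only an analogy: PipelineSketch, Step, SharedArguments and the other names are invented here and are not Queue classes. It shows the same two ideas, an alias for the outer this and a trait whose body copies a value when it is mixed in.

```scala
// Hypothetical stand-in for a QScript; none of these names come from Queue.
class PipelineSketch(val message: String) {
  // Alias the outer instance so inner traits can refer to it unambiguously.
  pipeline =>

  abstract class Step {
    var text: String = ""
    def commandLine: String
  }

  // The trait body runs after the class constructors, so the copied value
  // is not overwritten by the field initializer in Step.
  trait SharedArguments extends Step {
    this.text = pipeline.message
  }

  class Echo extends Step {
    def commandLine: String = "echo " + text
  }

  class AlsoEcho extends Step {
    def commandLine: String = "echo also " + text
  }

  // Each instance picks up the shared value by being created 'with' the trait.
  def commandLines: List[String] =
    List(new Echo with SharedArguments, new AlsoEcho with SharedArguments).map(_.commandLine)
}
```

Because the copy happens in the trait body, each new function only needs the with clause; no per-instance assignment is required.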

Geraldine Van der Auwera, PhD
