The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.4 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

Queue CommandLineFunctions

Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
edited February 2014 in Pipelining with Queue

1. Basic QScript run rules

  • In the script method, a QScript will add one or more CommandLineFunctions.
  • Queue tracks dependencies between functions via variables annotated with @Input and @Output.
  • Queue will run functions based on the dependencies between them, so if the @Input of CommandLineFunction A depends on the @Output of ComandLineFunction B, A will wait for B to finish before it starts running.

2. Command Line

Each CommandLineFunction must define the actual command line to run as follows.

class MyCommandLine extends CommandLineFunction {
  def commandLine = " hello world"

Constructing a Command Line Manually

If you're writing a one-off CommandLineFunction that is not destined for use
by other QScripts, it's often easiest to construct the command line directly
rather than through the API methods provided in the CommandLineFunction class.

For example:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

Constructing a Command Line using API Methods

If you're writing a CommandLineFunction that will become part of Queue and/or
will be used by other QScripts, however, our best practice recommendation is
to construct your command line only using the methods provided in the
CommandLineFunction class: required(), optional(), conditional(), and repeat()

The reason for this is that these methods automatically escape the values you
give them so that they'll be interpreted literally within the shell scripts
Queue generates to run your command, and they also manage whitespace separation of command-line tokens for you. This prevents (for example) a value like MQ > 10 from being interpreted as an output redirection by the shell, and avoids issues with values containing embedded spaces. The methods also give you the ability to turn escaping and/or whitespace separation off as needed. An example:

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted as an output redirection

The CommandLineFunctions built into Queue, including the CommandLineFunctions
automatically generated for GATK Walkers, are all written using this pattern.
This means that when you configure a GATK Walker or one of the other built-in
CommandLineFunctions in a QScript, you can rely on all of your values being
safely escaped and taken literally when the commands are run, including values
containing characters that would normally be interpreted by the shell such as
MQ > 10.

Below is a brief overview of the API methods available to you in the CommandLineFunction class for safely constructing command lines:

  • required()

Used for command-line arguments that are always present, e.g.:

required("-f", "filename")                              returns: " '-f' 'filename' "
required("-f", "filename", escape=false)                returns: " -f filename "
required("java")                                        returns: " 'java' "
required("INPUT=", "myBam.bam", spaceSeparated=false)   returns: " 'INPUT=myBam.bam' "
  • optional()

Used for command-line arguments that may or may not be present, e.g.:

optional("-f", myVar) behaves like required() if myVar has a value, but returns ""
if myVar is null/Nil/None
  • conditional()

Used for command-line arguments that should only be included if some condition is true, e.g.:

conditional(verbose, "-v") returns " '-v' " if verbose is true, otherwise returns ""
  • repeat()

Used for command-line arguments that are repeated multiple times on the command line, e.g.:

repeat("-f", List("file1", "file2", "file3")) returns: " '-f' 'file1' '-f' 'file2' '-f' 'file3' "

3. Arguments

  • CommandLineFunction arguments use a similar syntax to arguments.

  • CommandLineFunction variables are annotated with @Input, @Output, or @Argument annotations.

Input and Output Files

So that Queue can track the input and output files of a command, CommandLineFunction @Input and @Output must be objects.

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file")
  var inputFile: File = _
  def commandLine = " -fileParam " + inputFile


CommandLineFunction variables can also provide indirect access to inputs and outputs via the FileProvider trait.

class MyCommandLine extends CommandLineFunction {
  @Input(doc="named input file")
  var inputFile: ExampleFileProvider = _
  def commandLine = " " + inputFile

// An example FileProvider that stores a 'name' with a 'file'.
class ExampleFileProvider(var name: String, var file: File) extends org.broadinstitute.sting.queue.function.FileProvider {
  override def toString = " -fileName " + name + " -fileParam " + file

Optional Arguments

Optional files can be specified via required=false, and can use the CommandLineFunction.optional() utility method, as described above:

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file", required=false)
  var inputFile: File = _
  // -fileParam will only be added if the QScript sets inputFile on this instance of MyCommandLine
  def commandLine = required("") + optional("-fileParam", inputFile)

Collections as Arguments

A List or Set of files can use the CommandLineFunction.repeat() utility method, as described above:

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file")
  var inputFile: List[File] = Nil // NOTE: Do not set List or Set variables to null!
  // -fileParam will added as many times as the QScript adds the inputFile on this instance of MyCommandLine
  def commandLine = required("") + repeat("-fileParam", inputFile)

Non-File Arguments

A command line function can define other required arguments via @Argument.

class MyCommandLine extends CommandLineFunction {
  @Argument(doc="message to display")
  var veryImportantMessage: String = _
  // If the QScript does not specify the required veryImportantMessage, the pipeline will not run.
  def commandLine = required("") + required(veryImportantMessage)

4. Example: "samtools index"

class SamToolsIndex extends CommandLineFunction {
  @Input(doc="bam to index") var bamFile: File = _
  @Output(doc="bam index") var baiFile: File = _
  def commandLine = "samtools index %s %s".format(bamFile, baiFile)

Or, using the CommandLineFunction API methods to construct the command line with automatic shell escaping:

class SamToolsIndex extends CommandLineFunction {
  @Input(doc="bam to index") var bamFile: File = _
  @Output(doc="bam index") var baiFile: File = _
  def commandLine = required("samtools") + required("index") + required(bamFile) + required(baiFile)
Post edited by Geraldine_VdAuwera on


  • If I construct a command that utilizes multiple threads, how can I tell CommandLineFunction how many threads to use? Parts of my pipeline can be multi-threaded depending on the job and I want to communicate the thread/core number to my job scheduler.

  • Nevermind, found it in CommandLineFunction.scala

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Thanks for sharing what you found.

  • jellisjellis AustraliaMember

    Hi, this page refers to a FileProvider trait (org.broadinstitute.sting.queue.function.FileProvider); however, I can't find this trait anywhere. Am I missing something obvious, or is this something that has been removed in more recent versions of the GATK (I notice that it still refers to sting). If it has been removed, is it something that I could easily reimplement? Thanks.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @jellis It was probably moved in the big sting renaming operation, but I'm not sure where. @kshakir may be able to illuminate you.

  • BiocybermanBiocyberman Member
    edited March 2015

    @Geraldine_VdAuwera Is there a way to programmatically get names and values of all annotated fields (@Input, @Output @Argument)?
    That is, I want to loop through say list of all @Argument, get their field names and values. I have a custom qscript that extends QScript. My qscipt calls to several other Utility classes, I do not want to pass to many parameters to those classes, but only my qscript instance instead. Then I will look inside qscript instance and get parameters for my Utility classes.

    I have tried with qscript.getClass.getDeclaredFields and could fetch the fields names, but not their values.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @Biocyberman It is feasible but it's not pretty, at least the way I know how to do it, which is how we populate the Tool Docs with default values. To generate those at build time, we basically produce a temporary instance of each object in order to query its field values. In your case I guess since you would be doing this at runtime you'll have the actual instances already so you may not have to do that. I can't tell you what the methods are off the top of my head, but if you look in the help utils code you should find some examples. It's in the feature handling code iirc.

  • Hello,

    You can use the sample commandline script included in the PHPlist distribution archive. This sample script is called "phplist" and is located in the "bin" directory of the archive. You will have to edit this script to fit your system.

    The normal PHPlist access restrictions are bypassed, but the Unix users who are allowed to run the script should be mentioned in the "commandline_users" variable in the config.php file. The identity of a user is determined by the USER environment variable, which may not always match your system.

Sign In or Register to comment.