Bug Bulletin: The recent 3.2 release fixes many issues. If you run into a problem, please try the latest version before posting a bug report, as your problem may already have been solved.

Queue CommandLineFunctions

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,073Administrator, GATK Developer admin
edited February 3 in Queue

1. Basic QScript run rules

  • In the script method, a QScript will add one or more CommandLineFunctions.
  • Queue tracks dependencies between functions via variables annotated with @Input and @Output.
  • Queue will run functions based on the dependencies between them, so if the @Input of CommandLineFunction A depends on the @Output of ComandLineFunction B, A will wait for B to finish before it starts running.

2. Command Line

Each CommandLineFunction must define the actual command line to run as follows.

class MyCommandLine extends CommandLineFunction {
  def commandLine = "myScript.sh hello world"
}

Constructing a Command Line Manually

If you're writing a one-off CommandLineFunction that is not destined for use by other QScripts, it's often easiest to construct the command line directly rather than through the API methods provided in the CommandLineFunction class.

For example:

def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out)

Constructing a Command Line using API Methods

If you're writing a CommandLineFunction that will become part of Queue and/or will be used by other QScripts, however, our best practice recommendation is to construct your command line only using the methods provided in the CommandLineFunction class: required(), optional(), conditional(), and repeat()

The reason for this is that these methods automatically escape the values you give them so that they'll be interpreted literally within the shell scripts Queue generates to run your command, and they also manage whitespace separation of command-line tokens for you. This prevents (for example) a value like MQ > 10 from being interpreted as an output redirection by the shell, and avoids issues with values containing embedded spaces. The methods also give you the ability to turn escaping and/or whitespace separation off as needed. An example:

override def commandLine = super.commandLine +
                           required("eff") +
                           conditional(verbose, "-v") +
                           optional("-c", config) +
                           required("-i", "vcf") +
                           required("-o", "vcf") +
                           required(genomeVersion) +
                           required(inVcf) +
                           required(">", escape=false) +  // This will be shell-interpreted as an output redirection
                           required(outVcf)

The CommandLineFunctions built into Queue, including the CommandLineFunctions automatically generated for GATK Walkers, are all written using this pattern. This means that when you configure a GATK Walker or one of the other built-in CommandLineFunctions in a QScript, you can rely on all of your values being safely escaped and taken literally when the commands are run, including values containing characters that would normally be interpreted by the shell such as MQ > 10.

Below is a brief overview of the API methods available to you in the CommandLineFunction class for safely constructing command lines:

  • required()

Used for command-line arguments that are always present, e.g.:

required("-f", "filename")                              returns: " '-f' 'filename' "
required("-f", "filename", escape=false)                returns: " -f filename "
required("java")                                        returns: " 'java' "
required("INPUT=", "myBam.bam", spaceSeparated=false)   returns: " 'INPUT=myBam.bam' "
  • optional()

Used for command-line arguments that may or may not be present, e.g.:

optional("-f", myVar) behaves like required() if myVar has a value, but returns ""
if myVar is null/Nil/None
  • conditional()

Used for command-line arguments that should only be included if some condition is true, e.g.:

conditional(verbose, "-v") returns " '-v' " if verbose is true, otherwise returns ""
  • repeat()

Used for command-line arguments that are repeated multiple times on the command line, e.g.:

repeat("-f", List("file1", "file2", "file3")) returns: " '-f' 'file1' '-f' 'file2' '-f' 'file3' "

3. Arguments

  • CommandLineFunction arguments use a similar syntax to arguments.

  • CommandLineFunction variables are annotated with @Input, @Output, or @Argument annotations.

Input and Output Files

So that Queue can track the input and output files of a command, CommandLineFunction @Input and @Output must be java.io.File objects.

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file")
  var inputFile: File = _
  def commandLine = "myScript.sh -fileParam " + inputFile
}

FileProvider

CommandLineFunction variables can also provide indirect access to java.io.File inputs and outputs via the FileProvider trait.

class MyCommandLine extends CommandLineFunction {
  @Input(doc="named input file")
  var inputFile: ExampleFileProvider = _
  def commandLine = "myScript.sh " + inputFile
}

// An example FileProvider that stores a 'name' with a 'file'.
class ExampleFileProvider(var name: String, var file: File) extends org.broadinstitute.sting.queue.function.FileProvider {
  override def toString = " -fileName " + name + " -fileParam " + file
}

Optional Arguments

Optional files can be specified via required=false, and can use the CommandLineFunction.optional() utility method, as described above:

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file", required=false)
  var inputFile: File = _
  // -fileParam will only be added if the QScript sets inputFile on this instance of MyCommandLine
  def commandLine = required("myScript.sh") + optional("-fileParam", inputFile)
}

Collections as Arguments

A List or Set of files can use the CommandLineFunction.repeat() utility method, as described above:

class MyCommandLine extends CommandLineFunction {
  @Input(doc="input file")
  var inputFile: List[File] = Nil // NOTE: Do not set List or Set variables to null!
  // -fileParam will added as many times as the QScript adds the inputFile on this instance of MyCommandLine
  def commandLine = required("myScript.sh") + repeat("-fileParam", inputFile)
}

Non-File Arguments

A command line function can define other required arguments via @Argument.

class MyCommandLine extends CommandLineFunction {
  @Argument(doc="message to display")
  var veryImportantMessage: String = _
  // If the QScript does not specify the required veryImportantMessage, the pipeline will not run.
  def commandLine = required("myScript.sh") + required(veryImportantMessage)
}

4. Example: "samtools index"

class SamToolsIndex extends CommandLineFunction {
  @Input(doc="bam to index") var bamFile: File = _
  @Output(doc="bam index") var baiFile: File = _
  def commandLine = "samtools index %s %s".format(bamFile, baiFile)
)

Or, using the CommandLineFunction API methods to construct the command line with automatic shell escaping:

class SamToolsIndex extends CommandLineFunction {
  @Input(doc="bam to index") var bamFile: File = _
  @Output(doc="bam index") var baiFile: File = _
  def commandLine = required("samtools") + required("index") + required(bamFile) + required(baiFile)
)
Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Comments

Sign In or Register to comment.