Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on Github.

WDL Language Specification

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
edited February 2016 in Archive

Global Grammar Rules

Whitespace, Strings, Identifiers, Constants

These are common among many of the following sections.

$ws = (0x20 | 0x9 | 0xD | 0xA)+
$identifier = [a-zA-Z][a-zA-Z0-9_]*
$string = "([^\\\"\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*"
$string = '([^\\\'\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*'
$boolean = 'true' | 'false'
$integer = [1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*
$float = (([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?

$string can accept the following between single or double-quotes:

  • Any character not in set: \\, " (or ' for single-quoted string), \n
  • An escape sequence starting with \\, followed by one of the following characters: \\, ", ', [nrbtfav], ?
  • An escape sequence starting with \\, followed by 1 to 3 digits of value 0 through 7 inclusive. This specifies an octal escape code.
  • An escape sequence starting with \\x, followed by hexadecimal characters 0-9a-fA-F. This specifies a hexadecimal escape code.
  • An escape sequence starting with \\u or \\U followed by either 4 or 8 hexadecimal characters 0-9a-fA-F. This specifies a unicode code point.
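As an informal check of the token rules above, the following Python sketch transcribes $identifier, $integer, $float and the double-quoted form of $string into `re` patterns. The pattern names and the is_token helper are illustrative, not part of the spec, and the identifier pattern assumes that single-character names such as x are valid, since later examples use them.

```python
import re

# Python transcriptions of the token rules; the names are illustrative only.
# Assumption: single-character identifiers such as `x` are valid.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
INTEGER = re.compile(r"[1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*")
FLOAT = re.compile(r"(([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?")
DQ_STRING = re.compile(
    r'"([^\\"\n]'                               # not a backslash, quote, or newline
    r'|\\[\\"\'nrbtfav?]'                       # single-character escapes
    r'|\\[0-7]{1,3}'                            # octal escape
    r'|\\x[0-9a-fA-F]+'                         # hexadecimal escape
    r'|\\[uU][0-9a-fA-F]{4}([0-9a-fA-F]{4})?'   # unicode code point (4 or 8 digits)
    r')*"'
)


def is_token(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None
```

Using fullmatch rather than match matters here: it forces the whole candidate to be consumed, so a string like 08 is rejected as an $integer instead of matching only its leading 0.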

Types

All inputs and outputs must be typed.

$type = ($primitive_type | $array_type | $map_type | $object_type) $type_postfix_quantifier?
$primitive_type = ('Boolean' | 'Int' | 'Float' | 'File' | 'String')
$array_type = 'Array' '[' ($primitive_type | $object_type | $array_type) ']'
$object_type = 'Object'
$map_type = 'Map' '[' $primitive_type ',' ($primitive_type | $array_type | $map_type | $object_type) ']'
$type_postfix_quantifier = '?' | '+'

Some examples of types:

  • File
  • Array[File]
  • Map[String, String]
  • Object

Types can also have a $type_postfix_quantifier (either ? or +):

  • ? means that the value is optional. Any expressions that fail to evaluate because this value is missing will evaluate to the empty string.
  • + can only be applied to Array types, and it signifies that the array is required to have one or more values in it.

For more details on the $type_postfix_quantifier, see the section on Optional Parameters & Type Constraints.

For more information on types and how they are used to construct commands and define outputs of tasks, see the Data Types & Serialization section.

Fully Qualified Names & Namespaced Identifiers

$fully_qualified_name = $identifier ('.' $identifier)*
$namespaced_identifier = $identifier ('.' $identifier)*

A fully qualified name is the unique identifier of any particular call or call input or output. For example:

other.wdl

task foobar {
  File in
  command {
    sh setup.sh ${in}
  }
  output {
    File results = stdout()
  }
}

main.wdl

import "other.wdl" as other

task test {
  String my_var
  command {
    ./script ${my_var}
  }
  output {
    File results = stdout()
  }
}

workflow wf {
  Array[String] arr = ["a", "b", "c"]
  call test
  call test as test2
  call other.foobar
  output {
    test.results
    foobar.results
  }
  scatter(x in arr) {
    call test as scattered_test {
      input: my_var=x
    }
  }
}

The following fully-qualified names would exist within workflow wf in main.wdl:

  • wf - References top-level workflow
  • wf.test - References the first call to task test
  • wf.test2 - References the second call to task test (aliased as test2)
  • wf.test.my_var - References the String input of first call to task test
  • wf.test.results - References the File output of first call to task test
  • wf.test2.my_var - References the String input of second call to task test
  • wf.test2.results - References the File output of second call to task test
  • wf.foobar.results - References the File output of the call to other.foobar
  • wf.foobar.in - References the File input of the call to other.foobar
  • wf.arr - References the Array[String] declaration on the workflow
  • wf.scattered_test - References the scattered version of call test
  • wf.scattered_test.my_var - References an Array[String] for each element used as my_var when running the scattered version of call test.
  • wf.scattered_test.results - References an Array[File] which are the accumulated results from scattering call test
  • wf.scattered_test.1.results - References a File from the second invocation (0-indexed) of call test within the scatter block. This particular invocation used value "b" for my_var

A namespaced identifier has the same syntax as a fully-qualified name. It is interpreted as the left-hand side being the name of a namespace and then the right-hand side being the name of a workflow, task, or namespace within that namespace. Consider this workflow:

import "other.wdl" as ns
workflow wf {
  call ns.ns2.task
}

Here, ns.ns2.task is a namespaced identifier (see the Call Statement section for more details). Namespaced identifiers, like fully-qualified names, are left-associative, which means ns.ns2.task is interpreted as ((ns.ns2).task). Therefore ns.ns2 would have to resolve to a namespace so that .task could be applied. If ns2 were a task definition within ns, then this namespaced identifier would be invalid.
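The left-associative lookup can be sketched in Python as follows. This is only a model of the rule, not how any engine stores namespaces: the Namespace class, the resolve function, and the nested layout are all hypothetical.

```python
class Namespace(dict):
    """A namespace maps names to tasks, workflows, or nested namespaces."""


def resolve(root, dotted_name):
    """Resolve `a.b.c` as ((a.b).c), descending one namespace per segment."""
    current = root
    parts = dotted_name.split(".")
    for i, part in enumerate(parts):
        # Everything to the left of `part` must have resolved to a namespace.
        if not isinstance(current, Namespace):
            prefix = ".".join(parts[:i])
            raise LookupError(f"{prefix} is not a namespace")
        if part not in current:
            raise LookupError(f"{part} not found")
        current = current[part]
    return current


# Hypothetical layout: `ns` contains namespace `ns2`, which defines `task`.
root = Namespace(ns=Namespace(ns2=Namespace(task="<task definition>")))
```

With this layout, resolve(root, "ns.ns2.task") returns the task definition. If ns2 were itself a task rather than a namespace, the same call would raise LookupError, which mirrors the invalid case described above.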

Declarations

$declaration = $type $identifier ('=' $expression)?

Declarations are declared at the top of any scope.

In a task definition, declarations are interpreted as inputs to the task that are not part of the command line itself.

If a declaration does not have an initialization, then the value is expected to be provided by the user before the workflow or task is run.

Some examples of declarations:

  • File x
  • String y = "abc"
  • Float pi = 3 + .14
  • Map[String, String] m

A declaration may also refer to elements that are outputs of tasks. For example:

task test {
  String var
  command {
    ./script ${var}
  }
  output {
    String value = read_string(stdout())
  }
}

task test2 {
  Array[String] array
  command {
    ./script ${write_lines(array)}
  }
  output {
    Int value = read_int(stdout())
  }
}

workflow wf {
  call test as x {input: var="x"}
  call test as y {input: var="y"}
  Array[String] strs = [x.value, y.value]
  call test2 as z {input: array=strs}
}

strs in this case would not be defined until both call test as x and call test as y have successfully completed. Before that's the case, strs is undefined. If either of the two tasks fails, then evaluation of strs should return an error to indicate that the call test2 as z operation should be skipped.

Expressions

$expression = '(' $expression ')'
$expression = $expression '.' $expression
$expression = $expression '[' $expression ']'
$expression = $expression '(' ($expression (',' $expression)*)? ')'
$expression = '!' $expression
$expression = '+' $expression
$expression = '-' $expression
$expression = $expression '*' $expression
$expression = $expression '%' $expression
$expression = $expression '/' $expression
$expression = $expression '+' $expression
$expression = $expression '-' $expression
$expression = $expression '<' $expression
$expression = $expression '<=' $expression
$expression = $expression '>' $expression
$expression = $expression '>=' $expression
$expression = $expression '==' $expression
$expression = $expression '!=' $expression
$expression = $expression '&&' $expression
$expression = $expression '||' $expression
$expression = '{' ($expression ':' $expression (',' $expression ':' $expression)*)? '}'
$expression = '[' ($expression (',' $expression)*)? ']'
$expression = $string | $integer | $float | $boolean | $identifier

Below are the valid results for operators on types. Any combination not in the list will result in an error.

LHS Type Operators RHS Type Result Semantics
Boolean == Boolean Boolean
Boolean != Boolean Boolean
Boolean > Boolean Boolean
Boolean >= Boolean Boolean
Boolean < Boolean Boolean
Boolean <= Boolean Boolean
Boolean || Boolean Boolean
Boolean && Boolean Boolean
File + File File Append file paths
File == File Boolean
File != File Boolean
File + String File
File == String Boolean
File != String Boolean
Float + Float Float
Float - Float Float
Float * Float Float
Float / Float Float
Float % Float Float
Float == Float Boolean
Float != Float Boolean
Float > Float Boolean
Float >= Float Boolean
Float < Float Boolean
Float <= Float Boolean
Float + Int Float
Float - Int Float
Float * Int Float
Float / Int Float
Float % Int Float
Float == Int Boolean
Float != Int Boolean
Float > Int Boolean
Float >= Int Boolean
Float < Int Boolean
Float <= Int Boolean
Float + String String
Int + Float Float
Int - Float Float
Int * Float Float
Int / Float Float
Int % Float Float
Int == Float Boolean
Int != Float Boolean
Int > Float Boolean
Int >= Float Boolean
Int < Float Boolean
Int <= Float Boolean
Int + Int Int
Int - Int Int
Int * Int Int
Int / Int Int Integer division
Int % Int Int Integer division, return remainder
Int == Int Boolean
Int != Int Boolean
Int > Int Boolean
Int >= Int Boolean
Int < Int Boolean
Int <= Int Boolean
Int + String String
String + Float String
String + Int String
String + String String
String == String Boolean
String != String Boolean
String > String Boolean
String >= String Boolean
String < String Boolean
String <= String Boolean
(none) - Float Float
(none) + Float Float
(none) - Int Int
(none) + Int Int
(none) ! Boolean Boolean
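One way an engine might apply this table is as a lookup keyed by the operand types and the operator. The Python sketch below transcribes only a handful of rows for illustration; BINARY_RESULT and result_type are hypothetical names, and a real implementation would carry the full table.

```python
# A subset of the operator table, keyed by (LHS type, operator, RHS type).
# Only a few rows are transcribed here; the full table has many more.
BINARY_RESULT = {
    ("Int", "+", "Int"): "Int",
    ("Int", "/", "Int"): "Int",        # integer division
    ("Int", "+", "Float"): "Float",
    ("Float", "+", "Int"): "Float",
    ("Int", "+", "String"): "String",
    ("String", "+", "Int"): "String",
    ("File", "+", "String"): "File",
    ("Int", "<", "Float"): "Boolean",
    ("Boolean", "&&", "Boolean"): "Boolean",
}


def result_type(lhs, op, rhs):
    """Return the result type, or raise TypeError for an unlisted combination."""
    try:
        return BINARY_RESULT[(lhs, op, rhs)]
    except KeyError:
        raise TypeError(f"{lhs} {op} {rhs} is not a valid operation") from None
```

For example, result_type("Int", "+", "Float") gives "Float", while result_type("Boolean", "+", "Int") raises TypeError because that combination is not in the table.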

Operator Precedence Table

Precedence Operator type Associativity Example
12 Grouping n/a (x)
11 Member Access left-to-right x.y
10 Index left-to-right x[y]
9 Function Call left-to-right x(y,z,...)
8 Logical NOT right-to-left !x
8 Unary Plus right-to-left +x
8 Unary Negation right-to-left -x
7 Multiplication left-to-right x*y
7 Division left-to-right x/y
7 Remainder left-to-right x%y
6 Addition left-to-right x+y
6 Subtraction left-to-right x-y
5 Less Than left-to-right x<y
5 Less Than Or Equal left-to-right x<=y
5 Greater Than left-to-right x>y
5 Greater Than Or Equal left-to-right x>=y
4 Equality left-to-right x==y
4 Inequality left-to-right x!=y
3 Logical AND left-to-right x&&y
2 Logical OR left-to-right x||y
1 Assignment right-to-left x=y

Member Access

The syntax x.y refers to member access. x must be an object or task in a workflow. A Task can be thought of as an object where the attributes are the outputs of the task.

workflow wf {
  Object obj
  Object foo

  # This would cause a syntax error,
  # because foo is defined twice in the same namespace.
  call foo {
    input: var=obj.attr # Object attribute
  }

  call foo as foo2 {
    input: var=foo.out # Task output
  }
}

Map and Array Indexing

The syntax x[y] is for indexing maps and arrays. If x is an array, then y must evaluate to an integer. If x is a map, then y must evaluate to a key in that map.

Function Calls

Function calls, in the form of func(p1, p2, p3, ...), are either standard library functions or engine-defined functions.

In this current iteration of the spec, users cannot define their own functions.

Array Literals

Array values can be specified using Python-like syntax, as follows:

Array[String] a = ["a", "b", "c"]
Array[Int] b = [0,1,2]

Map Literals

Map values can be specified using a similar Python-like syntax:

Map[Int, Int] m1 = {1: 10, 2: 11}
Map[String, Int] m2 = {"a": 1, "b": 2}

Document

$document = ($import | $task | $workflow)+

$document is the root of the parse tree and it consists of one or more import statements, task definitions, or workflow definitions.

Import Statements

A WDL file may contain import statements to include WDL code from other sources.

$import = 'import' $ws+ $string ($ws+ 'as' $ws+ $identifier)?

The import statement specifies a $string that is to be interpreted as a URI pointing to a WDL file. The engine is responsible for resolving the URI and downloading the contents. The contents of the document at each URI must be WDL source code.

If a namespace identifier (via the as $identifier syntax) is specified, then all the tasks and workflows imported will only be accessible through that namespace. If no namespace identifier is specified, then all tasks and workflows from the URI are imported into the current namespace.

import "http://example.com/lib/stdlib"
import "http://example.com/lib/analysis_tasks" as analysis

workflow wf {
  File bam_file

  # file_size is from "http://example.com/lib/stdlib"
  call file_size {
    input: file=bam_file
  }
  call analysis.my_analysis_task {
    input: size=file_size.bytes, file=bam_file
  }
}

Engines should at the very least support the following protocols for import URIs:

  • http:// and https://
  • file://
  • no protocol (which should be interpreted as file://)
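The protocol rule can be sketched with Python's standard urllib.parse module. The function name import_scheme is hypothetical, and rejecting every other protocol is an assumption of this sketch: the spec only lists the minimum an engine should support.

```python
from urllib.parse import urlparse

# The protocols the spec asks every engine to support.
SUPPORTED_SCHEMES = {"http", "https", "file"}


def import_scheme(uri):
    """Return the protocol an engine would use to fetch an import URI."""
    scheme = urlparse(uri).scheme
    if scheme == "":
        return "file"  # no protocol: treat the string as a local file path
    if scheme not in SUPPORTED_SCHEMES:
        # Assumption: this sketch rejects anything beyond the required minimum.
        raise ValueError(f"unsupported import protocol: {scheme}")
    return scheme
```

So import_scheme("other.wdl") yields "file" and import_scheme("http://example.com/lib/stdlib") yields "http". Note that a Windows path with a drive letter would be parsed as having a one-letter scheme, so a real engine would need to handle that case separately.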

Task Definition

A task is a declarative construct with a focus on constructing a command from a template. The command specification is interpreted in an engine specific way, though a typical case is that a command is a UNIX command line which would be run in a Docker image.

Tasks also define their outputs, which is essential for building dependencies between tasks. Any other data specified in the task definition (e.g. runtime information and meta-data) is optional.

$task = 'task' $ws+ $identifier $ws* '{' $ws* $declaration* $task_sections $ws* '}'

For example, task name { ... }. The sections are defined inside the curly braces.

Sections

The task has one or more sections:

$task_sections = ($command | $runtime | $task_output | $parameter_meta | $meta)+

Additional requirement: Exactly one $command section needs to be defined, preferably as the first section.

Command Section

$command = 'command' $ws* '{' (0xA | 0xD)* $command_part+ $ws+ '}'
$command = 'command' $ws* '<<<' (0xA | 0xD)* $command_part+ $ws+ '>>>'

A command is a task section that starts with the keyword 'command', and is enclosed in curly braces or <<< >>>. The body of the command specifies the literal command line to run with placeholders ($command_part_var) for the parts of the command line that need to be filled in.

Command Parts

$command_part = $command_part_string | $command_part_var
$command_part_string = ^'${'+
$command_part_var = '${' $var_option* $expression '}'

The parser should read characters from the command line until it reaches a ${ character sequence. The characters read up to that point are interpreted as a literal string ($command_part_string).

The parser should interpret any variable enclosed in ${...} as a $command_part_var.

The $expression usually references declarations at the task level. For example:

task test {
  String flags
  command {
    ps ${flags}
  }
}

In this case flags within the ${...} is an expression. The $expression can also be more complex, like a function call: write_lines(some_array_value)

NOTE: the $expression in this context can only evaluate to a primitive type (e.g. not Array, Map, or Object). The only exception to this rule is when sep is specified as one of the $var_option fields.

As another example, consider how the parser would parse the following command:

grep '${start}...${end}' ${input}

This command would be parsed as:

  • grep ' - command_part_string
  • ${start} - command_part_var
  • ... - command_part_string
  • ${end} - command_part_var
  • ' - command_part_string
  • ${input} - command_part_var
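The split shown above can be reproduced with a short Python sketch. It is a simplification rather than a full parser: the PLACEHOLDER pattern and split_command are hypothetical names, and the pattern assumes that no } appears inside a placeholder.

```python
import re

# Matches a `${...}` placeholder; this sketch assumes no nested braces inside.
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def split_command(command):
    """Split a command template into (kind, text) parts in source order."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(command):
        if match.start() > pos:
            # Literal text between the previous placeholder and this one.
            parts.append(("command_part_string", command[pos:match.start()]))
        parts.append(("command_part_var", match.group(0)))
        pos = match.end()
    if pos < len(command):
        parts.append(("command_part_string", command[pos:]))
    return parts
```

Calling split_command("grep '${start}...${end}' ${input}") produces the six parts listed above, with the fifth string part holding the closing quote together with the space that follows it.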

Command Part Options

$var_option = $var_option_key $ws* '=' $ws* $var_option_value
$var_option_key = 'sep' | 'true' | 'false' | 'quote' | 'default'
$var_option_value = $expression

The $var_option is a set of key-value pairs for any additional and less-used options that need to be set on a parameter.

sep

'sep' is interpreted as the separator string used to join multiple parameters together. sep is only valid if the expression evaluates to an Array.

For example, if there were a declaration Array[Int] numbers = [1,2,3], the command python script.py ${sep=',' numbers} would yield the command line:

python script.py 1,2,3

Alternatively, if the command were python script.py ${sep=' ' numbers} it would parse to:

python script.py 1 2 3

Additional Requirements:

  1. sep MUST accept only a string as its value

true and false

'true' and 'false' are only used for type Boolean and they specify what the parameter returns when the Boolean is true or false, respectively.

For example, ${true='--enable-foo' false='--disable-foo' yes_or_no} would evaluate to either --enable-foo or --disable-foo based on the value of yes_or_no.

If either value is left out, then it's equivalent to specifying the empty string. If the parameter is ${true='--enable-foo' yes_or_no}, and a value of false is specified for this parameter, then the parameter will evaluate to the empty string.

Additional Requirements:

  1. true and false values MUST be strings.
  2. true and false are only allowed if the type is Boolean

default

This specifies the default value if no other value is specified for this parameter.

task default_test {
  String? s
  command {
    ./my_cmd ${default="foobar" s}
  }
}

This task takes an optional String parameter and if a value is not specified, then the value of foobar will be used instead.

Additional Requirements:

  1. The type of the expression must match the type of the parameter
  2. If 'default' is specified, the $type_postfix_quantifier for the variable's type MUST be ?
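Taken together, the sep, true/false, and default options could be modeled in Python roughly as below. The render_placeholder function and its keyword arguments are hypothetical stand-ins for the $var_option keys, and None stands in for an optional value that was not supplied.

```python
def render_placeholder(value, sep=None, true="", false="", default=None):
    """Render one `${...}` placeholder value to its command-line text."""
    if value is None:
        if default is None:
            return ""  # a missing optional value renders as the empty string
        value = default
    if isinstance(value, bool):
        # `true` and `false` default to the empty string when left out.
        return true if value else false
    if isinstance(value, list):
        if sep is None:
            raise TypeError("an Array value requires the sep option")
        return sep.join(str(item) for item in value)
    return str(value)
```

For instance, render_placeholder([1, 2, 3], sep=",") gives 1,2,3, render_placeholder(False, true="--enable-foo") gives the empty string, and render_placeholder(None, default="foobar") gives foobar.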

Alternative heredoc syntax

Sometimes a command is long enough, or uses { characters often enough, that a different set of delimiters would make it clearer. In this case, enclose the command in <<<...>>>, as follows:

task heredoc {
  File in

  command<<<
  python <<CODE
    with open("${in}") as fp:
      for line in fp:
        if not line.startswith('#'):
          print(line.strip())
  CODE
  >>>
}

Parsing of this command should be the same as the prior section describes.

Stripping Leading Whitespace

Any text inside of the command section, after it is instantiated, should have all common leading whitespace removed. In the task heredoc example in the previous section, if the user specifies a value of /path/to/file as the value for File in, then the command should be:

python <<CODE
  with open("/path/to/file") as fp:
    for line in fp:
      if not line.startswith('#'):
        print(line.strip())
CODE

The two spaces that were common to each line were removed.

If the user mixes tabs and spaces, the behavior is undefined. A warning is suggested, and perhaps a convention of 4 spaces per tab. Other implementations might return an error in this case.
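Python's standard textwrap.dedent implements the same "remove common leading whitespace" idea, so it serves as a convenient sketch of the expected behavior. The strip_common_indent wrapper and the instantiated string are illustrative only.

```python
import textwrap


def strip_common_indent(command):
    """Remove the whitespace prefix shared by every non-blank line."""
    return textwrap.dedent(command)


# The heredoc example after `${in}` has been replaced with /path/to/file.
instantiated = (
    "  python <<CODE\n"
    "    with open(\"/path/to/file\") as fp:\n"
    "      for line in fp:\n"
    "        if not line.startswith('#'):\n"
    "          print(line.strip())\n"
    "  CODE\n"
)

stripped = strip_common_indent(instantiated)
```

textwrap.dedent does not treat tabs and spaces as equal, so with mixed indentation it removes only the literal prefix the lines share. That is one possible way to handle the case the spec leaves undefined, not a required one.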

Outputs Section

The outputs section defines which of the files and values should be exported after a successful run of this tool.

$task_output = 'output' $ws* '{' ($ws* $task_output_kv $ws*)* '}'
$task_output_kv = $type $identifier $ws* '=' $ws* $expression

The outputs section contains typed variable definitions and a binding to the variable that they export.

The left-hand side of the equality defines the type and name of the output.

The right-hand side defines the path to the file that contains that variable definition.

For example, if a task's output section looks like this:

output {
  Int threshold = read_int("threshold.txt")
}

Then the task is expecting a file called "threshold.txt" in the current working directory where the task was executed. Inside of that file must be one line that contains only an integer and whitespace. See the Data Types & Serialization section for more details.
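As a rough model of what read_int is described as doing here, the Python sketch below reads a file and parses a single integer surrounded by optional whitespace. It is an assumption about the behavior, not the engine's implementation, and the temporary directory merely simulates a task's working directory.

```python
import os
import tempfile


def read_int(path):
    """Parse a file holding a single integer surrounded by optional whitespace."""
    with open(path) as fp:
        text = fp.read().strip()
    return int(text)  # raises ValueError if anything else is in the file


# Usage: simulate a task that wrote threshold.txt into its working directory.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "threshold.txt")
    with open(path, "w") as fp:
        fp.write("  42\n")
    threshold = read_int(path)
```

A file containing more than one value, such as "42 43", would make int raise ValueError, which corresponds to the task output failing to evaluate.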

The filename strings may also contain variable definitions themselves (see the String Interpolation section below for more details):

output {
  Array[String] quality_scores = read_lines("${sample_id}.scores.txt")
}

If this is the case, then sample_id is considered an input to the task.

As with inputs, the outputs can reference previous outputs in the same block. The only requirement is that the output being referenced must be specified before the output which uses it.

output {
  String a = "a"
  String ab = a + "b"
}

Globs can be used to define outputs which contain many files. The glob function generates an array of File outputs:

output {
  Array[File] output_bams = glob("*.bam")
}

String Interpolation

Within tasks, any string literal can use string interpolation to access the value of any of the task's inputs. The most obvious example of this is being able to define an output file which is named as a function of the task's inputs. For example:

task example {
  String prefix
  File bam
  command {
    python analysis.py --prefix=${prefix} ${bam}
  }
  output {
    File analyzed = "${prefix}.out"
    File bam_sibling = "${bam}.suffix"
  }
}

Any ${identifier} inside of a string literal must be replaced with the value of the identifier. If prefix were specified as foobar, then "${prefix}.out" would be evaluated to "foobar.out".
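The substitution rule can be sketched in Python with re.sub. The interpolate function and the inputs dictionary are hypothetical, and the sketch handles only plain ${identifier} references, not arbitrary expressions.

```python
import re


def interpolate(template, inputs):
    """Replace each ${identifier} with the string form of the named input."""
    def substitute(match):
        # A name that is not a task input raises KeyError.
        return str(inputs[match.group(1)])

    return re.sub(r"\$\{([a-zA-Z][a-zA-Z0-9_]*)\}", substitute, template)
```

With inputs of {"prefix": "foobar"}, interpolate("${prefix}.out", inputs) evaluates to foobar.out, matching the example above.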

Runtime Section

$runtime = 'runtime' $ws* '{' ($ws* $runtime_kv $ws*)* '}'
$runtime_kv = $identifier $ws* ':' $ws* $expression

The runtime section defines key/value pairs for runtime information needed for this task. Individual backends will define which keys they will inspect so a key/value pair may or may not actually be honored depending on how the task is run.

Values can be any expression and it is up to the engine to reject keys and/or values that do not make sense in that context. For example, consider the following WDL:

task test {
  command {
    python script.py
  }
  runtime {
    docker: ["ubuntu:latest", "broadinstitute/scala-baseimage"]
  }
}

The value for the docker runtime attribute in this case is an array of values. The parser should accept this. Some engines might interpret it as an "either this image or that image" or could reject it outright.

Since values are expressions, they can also reference variables in the task:

task test {
  String ubuntu_version

  command {
    python script.py
  }
  runtime {
    docker: "ubuntu:" + ubuntu_version
  }
}

Most key/value pairs are arbitrary. However, the following keys have recommended conventions:

docker

Location of a Docker image in which this task ought to be run. This can have a format like ubuntu:latest or broadinstitute/scala-baseimage in which case it should be interpreted as an image on DockerHub (i.e. it is valid to use in a docker pull command).

task docker_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    docker: "ubuntu:latest"
  }
}

memory

Memory requirements for this task. Two kinds of values are supported for this attribute:

  • Int - Interpreted as bytes
  • String - This should be a decimal value with a suffix like B, KB, MB, or a binary suffix like KiB, MiB. For example: 6.2 GB, 5MB, 2GiB.

task memory_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    memory: "2GB"
  }
}
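A backend could normalize both kinds of memory value to a byte count along the lines of the Python sketch below. The UNITS table and memory_to_bytes are hypothetical, and the sketch assumes that decimal suffixes are powers of 1000 and binary suffixes are powers of 1024, which the spec does not state explicitly.

```python
import re

# Assumption: decimal suffixes are powers of 1000, binary ones powers of 1024.
UNITS = {
    "B": 1,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}


def memory_to_bytes(value):
    """Convert a memory runtime value (Int or String) to a byte count."""
    if isinstance(value, int):
        return value  # an Int is already a number of bytes
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+)\s*", value)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"cannot parse memory value: {value!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])
```

Under these assumptions, memory_to_bytes("2GB") is 2000000000 and memory_to_bytes("2GiB") is 2147483648. The optional whitespace in the pattern accepts both 6.2 GB and 5MB.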

Parameter Metadata Section

$parameter_meta = 'parameter_meta' $ws* '{' ($ws* $parameter_meta_kv $ws*)* '}'
$parameter_meta_kv = $identifier $ws* ':' $ws* $string

This purely optional section contains key/value pairs where the keys are names of parameters and the values are string descriptions for those parameters.

Additional requirement: Any key in this section MUST correspond to a parameter in the command line.

Metadata Section

$meta = 'meta' $ws* '{' ($ws* $meta_kv $ws*)* '}'
$meta_kv = $identifier $ws* ':' $ws* $string

This purely optional section contains key/value pairs for any additional metadata that should be stored with the task. For example, perhaps author or contact email.

Examples

Example 1: Simplest Task

task hello_world {
  command {echo hello world}
}

Example 2: Inputs/Outputs

task one_and_one {
  String pattern
  File infile

  command {
    grep ${pattern} ${infile}
  }
  output {
    File filtered = stdout()
  }
}

Example 3: Runtime/Metadata

task runtime_meta {
  String memory_mb
  String sample_id
  String param

  command {
    java -Xmx${memory_mb}M -jar task.jar -id ${sample_id} -param ${param} -out ${sample_id}.out
  }
  output {
    File results = "${sample_id}.out"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
  parameter_meta {
    memory_mb: "Amount of memory to allocate to the JVM"
    param: "Some arbitrary parameter"
    sample_id: "The ID of the sample in format foo_bar_baz"
  }
  meta {
    author: "Joe Somebody"
    email: "[email protected]"
  }
}

Example 4: BWA mem

task bwa_mem_tool {
  Int threads
  Int min_seed_length
  Array[Int]+ min_std_max_min
  File reference
  Array[File]+ reads

  command {
    bwa mem -t ${threads} \
            -k ${min_seed_length} \
            -I ${sep=',' min_std_max_min} \
            ${reference} \
            ${sep=' ' reads} > output.sam
  }
  output {
    File sam = "output.sam"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
}

A notable piece of this example is ${sep=',' min_std_max_min}. The variable is declared as Array[Int]+, so it must hold one or more integers (the + after the type indicates that the array must be non-empty). When the array is passed into this parameter, it is flattened by combining the elements with the separator character (sep=',').

This task also defines that it exports one file, called 'sam', which is the stdout of the execution of bwa mem.

The 'docker' portion of this task definition specifies that this task must only be run on the Docker image specified.

Example 5: Word Count

task wc2_tool {
  File file1
  command {
    wc -l < ${file1}
  }
  output {
    Int count = read_int(stdout())
  }
}

workflow count_lines4_wf {
  Array[File] files
  scatter(f in files) {
    call wc2_tool {
      input: file1=f
    }
  }
  output {
    wc2_tool.count
  }
}

In this example, the code is mostly declarative boilerplate, except for some language-like features such as firstline(stdout) and append(list_of_count, wc2-tool.count). Both could be implemented fairly easily if custom function definitions were allowed. Parsing them is no problem, the implementation would be fairly simple, and new functions would not be hard to add. Alternatively, these could be JavaScript or Python snippets that the engine runs.

Example 6: tmap

This task should produce a command line like this:

tmap mapall \
stage1 map1 --min-seq-length 20 \
       map2 --min-seq-length 20 \
stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 \
       map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10

Task definition would look like this:

task tmap_tool {
  Array[String] stages
  File reads

  command {
    tmap mapall ${sep=' ' stages} < ${reads} > output.sam
  }
  output {
    File sam = "output.sam"
  }
}

For this particular case, where the command line is itself a mini DSL, the best option is to allow the user to type in the rest of the command line, which is what ${sep=' ' stages} is for. This allows the user to specify an array of strings as the value for stages, which are then concatenated together with a space character.

Variable Value
reads /path/to/fastq
stages ["stage1 map1 --min-seq-length 20 map2 --min-seq-length 20", "stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10"]

Workflow Definition

$workflow = 'workflow' $ws+ $identifier $ws* '{' $ws* $workflow_element* $ws* '}'
$workflow_element = $call | $loop | $conditional | $declaration | $scatter | $workflow_output

A workflow is defined by the keyword workflow, followed by a name and a body enclosed in curly braces.

An example of a workflow that runs one task (not defined here) would be:

workflow wf {
  Array[File] files
  Int threshold
  Map[String, String] my_map

  call analysis_job {
    input: search_paths=files, threshold=threshold, gender_lookup=my_map
  }
}

Call Statement

$call = 'call' $ws+ $namespaced_identifier ($ws+ 'as' $ws+ $identifier)? $ws* $call_body?
$call_body = '{' $ws* $inputs? $ws* '}'
$inputs = 'input' $ws* ':' $ws* $variable_mappings
$variable_mappings = $variable_mapping_kv (',' $variable_mapping_kv)*
$variable_mapping_kv = $identifier $ws* '=' $ws* $expression

A workflow may call other tasks/workflows via the call keyword. The $namespaced_identifier is the reference to which task to run. Most commonly, it's simply the name of a task (see examples below), but it can also use . as a namespace resolver.

See the section on Fully Qualified Names & Namespaced Identifiers for details about how the $namespaced_identifier ought to be interpreted.

All call statements must be uniquely identifiable. By default, the call's unique identifier is the task name (e.g. call foo would be referenced by name foo). However, if one were to call foo twice in a workflow, each subsequent call statement will need to alias itself to a unique name using the as clause: call foo as bar.

A call statement may reference a workflow too (e.g. call other_workflow). In this case, the $inputs section specifies a subset of the workflow's inputs and must specify fully qualified names.

import "lib.wdl" as lib
workflow wf {
  call my_task
  call my_task as my_task_alias
  call my_task as my_task_alias2 {
    input: threshold=2
  }
  call lib.other_task
}

The $call_body is optional and is meant to specify how to satisfy a subset of the task or workflow's input parameters as well as a way to map task outputs to variables defined in the visible scopes.

A $variable_mapping_kv in the $inputs section maps parameters in the task to expressions. These expressions usually reference outputs of other tasks, but they can be arbitrary expressions.

As an example, here is a workflow in which the second task requires an output from the first task:

task task1 {
  command {
    python do_stuff.py
  }
  output {
    File results = stdout()
  }
}
task task2 {
  File foobar
  command {
    python do_stuff2.py ${foobar}
  }
  output {
    File results = stdout()
  }
}
workflow wf {
  call task1
  call task2 {
    input: foobar=task1.results
  }
}

Scatter

$scatter = 'scatter' $ws* '(' $ws* $scatter_iteration_statement $ws*  ')' $ws* $scatter_body
$scatter_iteration_statement = $identifier $ws* 'in' $ws* $expression
$scatter_body = '{' $ws* $workflow_element* $ws* '}'

A "scatter" clause defines that everything in the body ($scatter_body) can be run in parallel. The clause in parentheses ($scatter_iteration_statement) declares which collection to scatter over and what to call each element.

The $scatter_iteration_statement has two parts: the "item" and the "collection". For example, scatter(x in y) would define x as the item, and y as the collection. The item is always an identifier, while the collection is an expression that MUST evaluate to an Array type. The item will represent each item in that expression. For example, if y evaluated to an Array[String] then x would be a String.

The $scatter_body defines a set of scopes that will execute in the context of this scatter block.

For example, if $expression is an array of integers of size 3, then the body of the scatter clause can be executed 3-times in parallel. $identifier would refer to each integer in the array.

scatter(i in integers) {
  call task1{input: num=i}
  call task2{input: num=task1.output}
}

In this example, task2 depends on task1. Variable i has an implicit index attribute to make sure we can access the right output from task1. Since both task1 and task2 run N times where N is the length of the array integers, any scalar output of these tasks is now an array.
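The gathering behavior can be imitated in Python with a thread pool. This is a loose analogy rather than engine semantics: task1, task2, body, and scatter are stand-in functions, and the arithmetic inside the tasks is invented purely so that the outputs are easy to follow.

```python
from concurrent.futures import ThreadPoolExecutor


def task1(num):
    return num * 2  # stand-in for a task's scalar output


def task2(num):
    return num + 1  # consumes the task1 output from the same iteration


def body(i):
    """One iteration of the scatter body: task2 waits on task1 for the same i."""
    out1 = task1(i)
    out2 = task2(out1)
    return {"task1": out1, "task2": out2}


def scatter(collection, run_body):
    """Run the body once per element, possibly in parallel, keeping input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_body, collection))


integers = [1, 2, 3]
shards = scatter(integers, body)
# Outside the scatter, each scalar output is seen as an array, one entry per element.
task1_outputs = [shard["task1"] for shard in shards]
task2_outputs = [shard["task2"] for shard in shards]
```

Because pool.map returns results in input order, the entry at index 1 of each gathered list belongs to the second element of integers, which is the role the implicit index plays in the description above.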

Loops

TODO: This section is not complete

$loop = 'while' '(' $expression ')' '{' $workflow_element* '}'

Loops are distinct from scatter clauses because the body of a while loop needs to be executed to completion before another iteration is considered. The $expression condition is evaluated only when the iteration count is zero or when all $workflow_elements in the body have completed successfully for the current iteration.

Conditionals

$conditional = 'if' '(' $expression ')' '{' $workflow_element* '}'

Conditionals only execute the body if the expression evaluates to true.

Outputs

$workflow_output = 'output' '{' ($workflow_output_fqn ($workflow_output_fqn)*) '}'
$workflow_output_fqn = $fully_qualified_name '.*'?

Each workflow definition can specify an optional output section. This section lists outputs from individual calls that you also want to expose as outputs to the workflow itself. Replacing call output names with a * acts as a match-all wildcard.

If the output {...} section is omitted, then the workflow includes all outputs from all calls in its final output.

The output names in this section must be qualified with the call which created them, as in the example below.

task task1 {
  command { ./script }
  output { File results = stdout() }
}

task task2 {
  command { ./script2 }
  output {
    File results = stdout()
    String value = read_string("some_file")
  }
}

workflow wf {
  call task1
  call task2 as altname
  output {
    task1.*
    altname.value
  }
}

In this example, the fully-qualified names that would be exposed as workflow outputs are wf.task1.results and wf.altname.value.
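The wildcard expansion can be sketched in Python as follows. The select_outputs function and the call_outputs dictionary are hypothetical; the dictionary simply records which outputs each call in the example defines.

```python
def select_outputs(call_outputs, patterns, workflow="wf"):
    """Expand output patterns such as `task1.*` into fully-qualified names."""
    selected = []
    for pattern in patterns:
        call, _, name = pattern.partition(".")
        for output in call_outputs[call]:
            # `*` matches every output of the call; otherwise match by name.
            if name == "*" or name == output:
                selected.append(f"{workflow}.{call}.{output}")
    return selected


# Outputs defined by each call in the example workflow above.
call_outputs = {"task1": ["results"], "altname": ["results", "value"]}
```

Calling select_outputs(call_outputs, ["task1.*", "altname.value"]) yields wf.task1.results and wf.altname.value, while altname.results is left out because it was not listed.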
