WDL Language Specification

Global Grammar Rules
Whitespace, Strings, Identifiers, Constants
These are common among many of the following sections
```
$ws = (0x20 | 0x9 | 0xD | 0xA)+
$identifier = [a-zA-Z][a-zA-Z0-9_]+
$string = "([^\\\"\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*"
$string = '([^\\\'\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*'
$boolean = 'true' | 'false'
$integer = [1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*
$float = (([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?
```
`$string` can accept the following between single or double quotes:

- Any character not in the set: `\\`, `"` (or `'` for single-quoted strings), `\n`
- An escape sequence starting with `\\`, followed by one of the following characters: `\\`, `"`, `'`, `[nrbtfav]`, `?`
- An escape sequence starting with `\\`, followed by 1 to 3 digits of value 0 through 7 inclusive. This specifies an octal escape code.
- An escape sequence starting with `\\x`, followed by hexadecimal characters `0-9a-fA-F`. This specifies a hexadecimal escape code.
- An escape sequence starting with `\\u` or `\\U`, followed by either 4 or 8 hexadecimal characters `0-9a-fA-F`. This specifies a unicode code point.
Types
All inputs and outputs must be typed.
```
$type = ($primitive_type | $array_type | $map_type | $object_type) $type_postfix_quantifier?
$primitive_type = ('Boolean' | 'Int' | 'Float' | 'File' | 'String')
$array_type = 'Array' '[' ($primitive_type | $object_type | $array_type) ']'
$object_type = 'Object'
$map_type = 'Map' '[' $primitive_type ',' ($primitive_type | $array_type | $map_type | $object_type) ']'
$type_postfix_quantifier = '?' | '+'
```
Some examples of types:

- `File`
- `Array[File]`
- `Map[String, String]`
- `Object`
Types can also have a `$type_postfix_quantifier` (either `?` or `+`):

- `?` means that the value is optional. Any expression that fails to evaluate because this value is missing will evaluate to the empty string.
- `+` can only be applied to `Array` types, and it signifies that the array is required to have one or more values in it (see the example declarations below).
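As a minimal sketch of how the quantifiers appear in practice (the declaration names here are illustrative, not from the spec):

```
String? optional_prefix     # optional; a missing value renders as the empty string
Array[File]+ input_bams     # required to contain at least one file
```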
For more details on the `$type_postfix_quantifier`, see the section on Optional Parameters & Type Constraints.

For more information on types and how they are used to construct commands and define outputs of tasks, see the Data Types & Serialization section.
Fully Qualified Names & Namespaced Identifiers
```
$fully_qualified_name = $identifier ('.' $identifier)*
$namespaced_identifier = $identifier ('.' $identifier)*
```
A fully qualified name is the unique identifier of any particular call
or call input or output. For example:
other.wdl

```
task foobar {
  File in

  command {
    sh setup.sh ${in}
  }
  output {
    File results = stdout()
  }
}
```

main.wdl

```
import "other.wdl" as other

task test {
  String my_var

  command {
    ./script ${my_var}
  }
  output {
    File results = stdout()
  }
}

workflow wf {
  Array[String] arr = ["a", "b", "c"]
  call test
  call test as test2
  call other.foobar
  output {
    test.results
    foobar.results
  }
  scatter(x in arr) {
    call test as scattered_test {
      input: my_var=x
    }
  }
}
```
The following fully-qualified names would exist within workflow `wf` in main.wdl:

- `wf` - References the top-level workflow
- `wf.test` - References the first call to task `test`
- `wf.test2` - References the second call to task `test` (aliased as test2)
- `wf.test.my_var` - References the `String` input of the first call to task `test`
- `wf.test.results` - References the `File` output of the first call to task `test`
- `wf.test2.my_var` - References the `String` input of the second call to task `test`
- `wf.test2.results` - References the `File` output of the second call to task `test`
- `wf.foobar.results` - References the `File` output of the call to `other.foobar`
- `wf.foobar.in` - References the `File` input of the call to `other.foobar`
- `wf.arr` - References the `Array[String]` declaration on the workflow
- `wf.scattered_test` - References the scattered version of `call test`
- `wf.scattered_test.my_var` - References an `Array[String]`, with each element used as `my_var` when running the scattered version of `call test`.
- `wf.scattered_test.results` - References an `Array[File]`, which is the accumulated results from scattering `call test`
- `wf.scattered_test.1.results` - References the `File` from the second invocation (0-indexed) of `call test` within the scatter block. This particular invocation used value "b" for `my_var`.
A namespaced identifier has the same syntax as a fully-qualified name. It is interpreted as the left-hand side being the name of a namespace and then the right-hand side being the name of a workflow, task, or namespace within that namespace. Consider this workflow:
import "other.wdl" as ns workflow wf { call ns.ns2.task }
Here, `ns.ns2.task` is a namespaced identifier (see the Call Statement section for more details). Namespaced identifiers, like fully-qualified names, are left-associative, which means `ns.ns2.task` is interpreted as `((ns.ns2).task)`. That means `ns.ns2` would have to resolve to a namespace so that `.task` could be applied. If `ns2` were a task definition within `ns`, then this namespaced identifier would be invalid.
Declarations
$declaration = $type $identifier ('=' $expression)?
Declarations appear at the top of any scope.
In a task definition, declarations are interpreted as inputs to the task that are not part of the command line itself.
If a declaration does not have an initialization, then the value is expected to be provided by the user before the workflow or task is run.
Some examples of declarations:
- `File x`
- `String y = "abc"`
- `Float pi = 3 + .14`
- `Map[String, String] m`
A declaration may also refer to elements that are outputs of tasks. For example:
```
task test {
  String var

  command {
    ./script ${var}
  }
  output {
    String value = read_string(stdout())
  }
}

task test2 {
  Array[String] array

  command {
    ./script ${write_lines(array)}
  }
  output {
    Int value = read_int(stdout())
  }
}

workflow wf {
  call test as x {input: var="x"}
  call test as y {input: var="y"}
  Array[String] strs = [x.value, y.value]
  call test2 as z {input: array=strs}
}
```
In this case, `strs` would not be defined until both `call test as x` and `call test as y` have successfully completed. Before then, `strs` is undefined. If either of the two tasks fails, then evaluating `strs` should return an error to indicate that the `call test2 as z` operation should be skipped.
Expressions
```
$expression = '(' $expression ')'
$expression = $expression '.' $expression
$expression = $expression '[' $expression ']'
$expression = $expression '(' ($expression (',' $expression)*)? ')'
$expression = '!' $expression
$expression = '+' $expression
$expression = '-' $expression
$expression = $expression '*' $expression
$expression = $expression '%' $expression
$expression = $expression '/' $expression
$expression = $expression '+' $expression
$expression = $expression '-' $expression
$expression = $expression '<' $expression
$expression = $expression '<=' $expression
$expression = $expression '>' $expression
$expression = $expression '>=' $expression
$expression = $expression '==' $expression
$expression = $expression '!=' $expression
$expression = $expression '&&' $expression
$expression = $expression '||' $expression
$expression = '{' ($expression ':' $expression)* '}'
$expression = '[' $expression* ']'
$expression = $string | $integer | $float | $boolean | $identifier
```
Below are the valid results for operators on types. Any combination not in the list will result in an error.
|LHS Type|Operators|RHS Type|Result|Semantics|
|---|---|---|---|---|
|Boolean|==|Boolean|Boolean||
|Boolean|!=|Boolean|Boolean||
|Boolean|>|Boolean|Boolean||
|Boolean|>=|Boolean|Boolean||
|Boolean|<|Boolean|Boolean||
|Boolean|<=|Boolean|Boolean||
|Boolean|\|\||Boolean|Boolean||
|Boolean|&&|Boolean|Boolean||
|File|+|File|File|Append file paths|
|File|==|File|Boolean||
|File|!=|File|Boolean||
|File|+|String|File||
|File|==|String|Boolean||
|File|!=|String|Boolean||
|Float|+|Float|Float||
|Float|-|Float|Float||
|Float|*|Float|Float||
|Float|/|Float|Float||
|Float|%|Float|Float||
|Float|==|Float|Boolean||
|Float|!=|Float|Boolean||
|Float|>|Float|Boolean||
|Float|>=|Float|Boolean||
|Float|<|Float|Boolean||
|Float|<=|Float|Boolean||
|Float|+|Int|Float||
|Float|-|Int|Float||
|Float|*|Int|Float||
|Float|/|Int|Float||
|Float|%|Int|Float||
|Float|==|Int|Boolean||
|Float|!=|Int|Boolean||
|Float|>|Int|Boolean||
|Float|>=|Int|Boolean||
|Float|<|Int|Boolean||
|Float|<=|Int|Boolean||
|Float|+|String|String||
|Int|+|Float|Float||
|Int|-|Float|Float||
|Int|*|Float|Float||
|Int|/|Float|Float||
|Int|%|Float|Float||
|Int|==|Float|Boolean||
|Int|!=|Float|Boolean||
|Int|>|Float|Boolean||
|Int|>=|Float|Boolean||
|Int|<|Float|Boolean||
|Int|<=|Float|Boolean||
|Int|+|Int|Int||
|Int|-|Int|Int||
|Int|*|Int|Int||
|Int|/|Int|Int|Integer division|
|Int|%|Int|Int|Integer division, return remainder|
|Int|==|Int|Boolean||
|Int|!=|Int|Boolean||
|Int|>|Int|Boolean||
|Int|>=|Int|Boolean||
|Int|<|Int|Boolean||
|Int|<=|Int|Boolean||
|Int|+|String|String||
|String|+|Float|String||
|String|+|Int|String||
|String|+|String|String||
|String|==|String|Boolean||
|String|!=|String|Boolean||
|String|>|String|Boolean||
|String|>=|String|Boolean||
|String|<|String|Boolean||
|String|<=|String|Boolean||
||-|Float|Float||
||+|Float|Float||
||-|Int|Int||
||+|Int|Int||
||!|Boolean|Boolean||
Operator Precedence Table
|Precedence|Operator type|Associativity|Example|
|---|---|---|---|
|12|Grouping|n/a|(x)|
|11|Member Access|left-to-right|x.y|
|10|Index|left-to-right|x[y]|
|9|Function Call|left-to-right|x(y,z,...)|
|8|Logical NOT|right-to-left|!x|
||Unary Plus|right-to-left|+x|
||Unary Negation|right-to-left|-x|
|7|Multiplication|left-to-right|x*y|
||Division|left-to-right|x/y|
||Remainder|left-to-right|x%y|
|6|Addition|left-to-right|x+y|
||Subtraction|left-to-right|x-y|
|5|Less Than|left-to-right|x<y|
||Less Than Or Equal|left-to-right|x<=y|
||Greater Than|left-to-right|x>y|
||Greater Than Or Equal|left-to-right|x>=y|
|4|Equality|left-to-right|x==y|
||Inequality|left-to-right|x!=y|
|3|Logical AND|left-to-right|x&&y|
|2|Logical OR|left-to-right|x\|\|y|
|1|Assignment|right-to-left|x=y|
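As an illustrative sketch of how the precedence rules apply (these declarations are not part of the specification text):

```
Int a = 1 + 2 * 3             # multiplication binds tighter than addition, so a == 7
Boolean b = (1 + 2) * 3 == 9  # grouping is applied first, then *, then ==, so b == true
```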
Member Access
The syntax x.y
refers to member access. x
must be an object or task in a workflow. A Task can be thought of as an object where the attributes are the outputs of the task.
```
workflow wf {
  Object obj
  Object foo

  # This would cause a syntax error,
  # because foo is defined twice in the same namespace.
  call foo {
    input: var=obj.attr # Object attribute
  }
  call foo as foo2 {
    input: var=foo.out # Task output
  }
}
```
Map and Array Indexing
The syntax `x[y]` is for indexing maps and arrays. If `x` is an array, then `y` must evaluate to an integer. If `x` is a map, then `y` must evaluate to a key in that map.
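For example, a minimal sketch (the declarations and names here are illustrative, not from the spec):

```
Array[String] strings = ["a", "b", "c"]
String first = strings[0]                        # "a"

Map[String, Int] scores = {"alice": 1, "bob": 2}
Int bobs_score = scores["bob"]                   # 2
```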
Function Calls
Function calls, in the form of `func(p1, p2, p3, ...)`, are either standard library functions or engine-defined functions. In the current iteration of the spec, users cannot define their own functions.
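For example, a sketch of a task output that calls standard library functions (`read_int` and `stdout` are used elsewhere in this spec; the task itself is illustrative):

```
task count_lines {
  File input_file

  command {
    wc -l < ${input_file}
  }
  output {
    # read_int(stdout()) parses the integer printed by wc -l
    Int line_count = read_int(stdout())
  }
}
```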
Array Literals
Array values can be specified using Python-like syntax, as follows:

```
Array[String] a = ["a", "b", "c"]
Array[Int] b = [0,1,2]
```
Map Literals
Map values can be specified using a similar Python-like syntax:

```
Map[Int, Int] = {1: 10, 2: 11}
Map[String, Int] = {"a": 1, "b": 2}
```
Document
$document = ($import | $task | $workflow)+
`$document` is the root of the parse tree, and it consists of one or more import statements, task definitions, or workflow definitions.
Import Statements
A WDL file may contain import statements to include WDL code from other sources.
$import = 'import' $ws+ $string ($ws+ 'as' $ws+ $identifier)?
The import statement specifies a `$string` that is interpreted as a URI pointing to a WDL file. The engine is responsible for resolving the URI and downloading the contents. The contents of the document at each URI must be WDL source code.

If a namespace identifier is specified (via the `as $identifier` syntax), then all the tasks and workflows imported will only be accessible through that namespace. If no namespace identifier is specified, then all tasks and workflows from the URI are imported into the current namespace.
import "http://example.com/lib/stdlib" import "http://example.com/lib/analysis_tasks" as analysis workflow wf { File bam_file # file_size is from "http://example.com/lib/stdlib" call file_size { input: file=bam_file } call analysis.my_analysis_task { input: size=file_size.bytes, file=bam_file } }
Engines should at the very least support the following protocols for import URIs:

- `http://` and `https://`
- `file://`
- no protocol (which should be interpreted as `file://`)
Task Definition
A task is a declarative construct with a focus on constructing a command from a template. The command specification is interpreted in an engine-specific way, though a typical case is that a command is a UNIX command line which would be run in a Docker image.
Tasks also define their outputs, which is essential for building dependencies between tasks. Any other data specified in the task definition (e.g. runtime information and meta-data) is optional.
$task = 'task' $ws+ $identifier $ws* '{' $ws* $declaration* $task_sections $ws* '}'
For example, `task name { ... }`. The sections described below are defined inside the curly braces.
Sections
The task has one or more sections:
$task_sections = ($command | $runtime | $task_output | $parameter_meta | $meta)+
Additional requirement: exactly one `$command` section needs to be defined, preferably as the first section.
Command Section
```
$command = 'command' $ws* '{' (0xA | 0xD)* $command_part+ $ws+ '}'
$command = 'command' $ws* '<<<' (0xA | 0xD)* $command_part+ $ws+ '>>>'
```
A command is a task section that starts with the keyword 'command', and is enclosed in either curly braces or `<<<` ... `>>>`. The body of the command specifies the literal command line to run, with placeholders (`$command_part_var`) for the parts of the command line that need to be filled in.
Command Parts
```
$command_part = $command_part_string | $command_part_var
$command_part_string = ^'${'+
$command_part_var = '${' $var_option* $expression '}'
```
The parser should read characters from the command line until it reaches a `${` character sequence; everything up to that point is interpreted as a literal string (`$command_part_string`). The parser should interpret anything enclosed in `${` ... `}` as a `$command_part_var`.
The `$expression` usually references declarations at the task level. For example:

```
task test {
  String flags

  command {
    ps ${flags}
  }
}
```

In this case, `flags` within the `${` ... `}` is an expression. The `$expression` can also be more complex, like a function call: `write_lines(some_array_value)`.
> NOTE: the `$expression` in this context can only evaluate to a primitive type (e.g. not `Array`, `Map`, or `Object`). The only exception to this rule is when `sep` is specified as one of the `$var_option` fields.
As another example, consider how the parser would parse the following command:
```
grep '${start}...${end}' ${input}
```

This command would be parsed as:

- `grep '` - command_part_string
- `${start}` - command_part_var
- `...` - command_part_string
- `${end}` - command_part_var
- `' ` - command_part_string
- `${input}` - command_part_var
Command Part Options
```
$var_option = $var_option_key $ws* '=' $ws* $var_option_value
$var_option_key = 'sep' | 'true' | 'false' | 'quote' | 'default'
$var_option_value = $expression
```
The `$var_option` is a set of key-value pairs for any additional, less-used options that need to be set on a parameter.
sep
'sep' is interpreted as the separator string used to join multiple parameters together. `sep` is only valid if the expression evaluates to an `Array`.

For example, given a declaration `Array[Int] ints = [1,2,3]`, the command `python script.py ${sep=',' ints}` would yield the command line:

```
python script.py 1,2,3
```

Alternatively, if the command were `python script.py ${sep=' ' ints}`, it would parse to:

```
python script.py 1 2 3
```
Additional Requirements:

> - `sep` MUST accept only a string as its value
true and false
'true' and 'false' are only used for type Boolean and they specify what the parameter returns when the Boolean is true or false, respectively.
For example, `${true='--enable-foo', false='--disable-foo' Boolean yes_or_no}` would evaluate to either `--enable-foo` or `--disable-foo` based on the value of `yes_or_no`, as sketched in the example below.

If either value is left out, then it's equivalent to specifying the empty string. If the parameter is `${true='--enable-foo' Boolean yes_or_no}` and a value of false is specified for this parameter, then the parameter will evaluate to the empty string.
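A minimal sketch of a task using this option (the task, command, and variable names are illustrative, not from the spec):

```
task flag_test {
  Boolean use_foo

  command {
    # renders as "./my_cmd --enable-foo" when use_foo is true,
    # and "./my_cmd --disable-foo" when it is false
    ./my_cmd ${true='--enable-foo', false='--disable-foo' use_foo}
  }
}
```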
Additional Requirement:

> `true` and `false` values MUST be strings. `true` and `false` are only allowed if the type is `Boolean`.
default
This specifies the default value if no other value is specified for this parameter.
```
task default_test {
  String? s

  command {
    ./my_cmd ${default="foobar" s}
  }
}
```

This task takes an optional `String` parameter, and if a value is not specified, then the value `foobar` will be used instead.
Additional Requirements:

> - The type of the expression must match the type of the parameter
> - If 'default' is specified, the `$type_postfix_quantifier` for the variable's type MUST be `?`
Alternative heredoc syntax
Sometimes a command is long enough, or uses enough `{` characters, that a different set of delimiters makes it clearer. In that case, enclose the command in `<<<` ... `>>>`, as follows:
```
task heredoc {
  File in

  command<<<
  python <<CODE
    with open("${in}") as fp:
      for line in fp:
        if not line.startswith('#'):
          print(line.strip())
  CODE
  >>>
}
```
Parsing of this command should be the same as the prior section describes.
Stripping Leading Whitespace
Any text inside of the `command` section, after it is instantiated, should have all common leading whitespace removed. In the `task heredoc` example in the previous section, if the user specifies a value of `/path/to/file` as the value for `File in`, then the command should be:

```
python <<CODE
  with open("/path/to/file") as fp:
    for line in fp:
      if not line.startswith('#'):
        print(line.strip())
CODE
```

The 2 spaces that were common to each line were removed.
If the user mixes tabs and spaces, the behavior is undefined. A warning is suggested, perhaps with a convention of 4 spaces per tab. Some implementations might instead return an error in this case.
Outputs Section
The outputs section defines which of the files and values should be exported after a successful run of this tool.
```
$task_output = 'output' $ws* '{' ($ws* $task_output_kv $ws*)* '}'
$task_output_kv = $type $identifier $ws* '=' $ws* $string
```
The outputs section contains typed variable definitions and a binding to the variable that they export.
The left-hand side of the equality defines the type and name of the output.
The right-hand side defines the path to the file that contains that variable definition.
For example, if a task's output section looks like this:
```
output {
  Int threshold = read_int("threshold.txt")
}
```
Then the task is expecting a file called "threshold.txt" in the current working directory where the task was executed. Inside of that file must be one line that contains only an integer and whitespace. See the Data Types & Serialization section for more details.
The filename strings may also contain variable definitions themselves (see the String Interpolation section below for more details):
```
output {
  Array[String] quality_scores = read_lines("${sample_id}.scores.txt")
}
```

If this is the case, then `sample_id` is considered an input to the task.
As with inputs, the outputs can reference previous outputs in the same block. The only requirement is that the output being referenced must be specified before the output which uses it.
```
output {
  String a = "a"
  String ab = a + "b"
}
```
Globs can be used to define outputs which contain many files. The glob function generates an array of File outputs:
```
output {
  Array[File] output_bams = glob("*.bam")
}
```
String Interpolation
Within tasks, any string literal can use string interpolation to access the value of any of the task's inputs. The most obvious example is being able to define an output file whose name is a function of its input. For example:
```
task example {
  String prefix
  File bam

  command {
    python analysis.py --prefix=${prefix} ${bam}
  }
  output {
    File analyzed = "${prefix}.out"
    File bam_sibling = "${bam}.suffix"
  }
}
```
Any `${identifier}` inside of a string literal must be replaced with the value of the identifier. If `prefix` were specified as `foobar`, then `"${prefix}.out"` would be evaluated to `"foobar.out"`.
Runtime Section
```
$runtime = 'runtime' $ws* '{' ($ws* $runtime_kv $ws*)* '}'
$runtime_kv = $identifier $ws* '=' $ws* $expression
```
The runtime section defines key/value pairs for runtime information needed for this task. Individual backends will define which keys they will inspect so a key/value pair may or may not actually be honored depending on how the task is run.
Values can be any expression and it is up to the engine to reject keys and/or values that do not make sense in that context. For example, consider the following WDL:
```
task test {
  command {
    python script.py
  }
  runtime {
    docker: ["ubuntu:latest", "broadinstitute/scala-baseimage"]
  }
}
```
The value for the `docker` runtime attribute in this case is an array of values. The parser should accept this; some engines might interpret it as "either this image or that image", while others could reject it outright.
Since values are expressions, they can also reference variables in the task:
```
task test {
  String ubuntu_version

  command {
    python script.py
  }
  runtime {
    docker: "ubuntu:" + ubuntu_version
  }
}
```
Most key/value pairs are arbitrary. However, the following keys have recommended conventions:
docker
Location of a Docker image in which this task ought to be run. This can have a format like `ubuntu:latest` or `broadinstitute/scala-baseimage`, in which case it should be interpreted as an image on DockerHub (i.e. it is valid to use in a `docker pull` command).
```
task docker_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
```
memory
Memory requirements for this task. Two kinds of values are supported for this attribute:

- `Int` - Interpreted as bytes
- `String` - A decimal value with a suffix like `B`, `KB`, `MB`, or a binary suffix like `KiB`, `MiB`. For example: `6.2 GB`, `5MB`, `2GiB`.
```
task memory_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    memory: "2GB"
  }
}
```
Parameter Metadata Section
```
$parameter_meta = 'parameter_meta' $ws* '{' ($ws* $parameter_meta_kv $ws*)* '}'
$parameter_meta_kv = $identifier $ws* '=' $ws* $string
```
This purely optional section contains key/value pairs where the keys are names of parameters and the values are string descriptions for those parameters.
Additional requirement: Any key in this section MUST correspond to a parameter in the command line
Metadata Section
```
$meta = 'meta' $ws* '{' ($ws* $meta_kv $ws*)* '}'
$meta_kv = $identifier $ws* '=' $ws* $string
```
This purely optional section contains key/value pairs for any additional metadata that should be stored with the task, for example the author or a contact email.
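A brief sketch of these sections in a task (the names and values are illustrative, and the colon style follows Example 3 below):

```
task annotated {
  String sample_id

  command {
    ./process ${sample_id}
  }

  parameter_meta {
    sample_id: "Identifier of the sample to process"
  }

  meta {
    author: "Jane Doe"
    email: "jane@example.org"
  }
}
```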
Examples
Example 1: Simplest Task
```
task hello_world {
  command {echo hello world}
}
```
Example 2: Inputs/Outputs
```
task one_and_one {
  String pattern
  File infile

  command {
    grep ${pattern} ${infile}
  }
  output {
    File filtered = stdout()
  }
}
```
Example 3: Runtime/Metadata
```
task runtime_meta {
  String memory_mb
  String sample_id
  String param

  command {
    java -Xmx${memory_mb}M -jar task.jar -id ${sample_id} -param ${param} -out ${sample_id}.out
  }
  output {
    File results = "${sample_id}.out"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
  parameter_meta {
    memory_mb: "Amount of memory to allocate to the JVM"
    param: "Some arbitrary parameter"
    sample_id: "The ID of the sample in format foo_bar_baz"
  }
  meta {
    author: "Joe Somebody"
    email: "[email protected]"
  }
}
```
Example 4: BWA mem
```
task bwa_mem_tool {
  Int threads
  Int min_seed_length
  Int min_std_max_min
  File reference
  File reads

  command {
    bwa mem -t ${threads} \
            -k ${min_seed_length} \
            -I ${sep=',' min_std_max_min+} \
            ${reference} \
            ${sep=' ' reads+} > output.sam
  }
  output {
    File sam = "output.sam"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
}
```
A notable piece in this example is `${sep=',' min_std_max_min+}`, which specifies that `min_std_max_min` can be one or more integers (the `+` after the variable name indicates that it can be one or more). If an `Array[Int]` is passed into this parameter, then it is flattened by combining the elements with the separator character (`sep=','`).
This task also defines that it exports one file, called 'sam', which is the stdout of the execution of bwa mem.
The 'docker' portion of this task definition specifies that this task must only be run in the specified Docker image.
Example 5: Word Count
```
task wc2_tool {
  File file1

  command {
    wc ${file1}
  }
  output {
    Int count = read_int(stdout())
  }
}

workflow count_lines4_wf {
  Array[File] files

  scatter(f in files) {
    call wc2_tool {
      input: file1=f
    }
  }
  output {
    wc2_tool.count
  }
}
```
In this example, it's all pretty boilerplate, declarative code, except for some language-like features, such as `firstline(stdout)` and `append(list_of_count, wc2_tool.count)`. These can both be implemented fairly easily if we allow for custom function definitions. Parsing them is no problem; implementation would be fairly simple, and new functions would not be hard to add. Alternatively, this could be something like JavaScript or Python snippets that we run.
Example 6: tmap
This task should produce a command line like this:
```
tmap mapall \
stage1 map1 --min-seq-length 20 \
       map2 --min-seq-length 20 \
stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 \
       map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10
```
Task definition would look like this:
```
task tmap_tool {
  Array[String] stages
  File reads

  command {
    tmap mapall ${sep=' ' stages} < ${reads} > output.sam
  }
  output {
    File sam = "output.sam"
  }
}
```
For this particular case, where the command line is itself a mini DSL, the best option is to allow the user to type in the rest of the command line, which is what `${sep=' ' stages+}` is for. This allows the user to specify an array of strings as the value for `stages`, which are then concatenated together with a space character.
|Variable|Value|
|---|---|
|reads|/path/to/fastq|
|stages|["stage1 map1 --min-seq-length 20 map2 --min-seq-length 20", "stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10"]|
Workflow Definition
```
$workflow = 'workflow' $ws* '{' $ws* $workflow_element* $ws* '}'
$workflow_element = $call | $loop | $conditional | $declaration | $scatter
```
A workflow is defined with the keyword `workflow`, with the body enclosed in curly braces.
An example of a workflow that runs one task (not defined here) would be:
```
workflow wf {
  Array[File] files
  Int threshold
  Map[String, String] my_map

  call analysis_job {
    input: search_paths=files, threshold=threshold, gender_lookup=my_map
  }
}
```
Call Statement
```
$call = 'call' $ws* $namespaced_identifier $ws+ ('as' $identifier)? $ws* $call_body?
$call_body = '{' $ws* $inputs? $ws* '}'
$inputs = 'input' $ws* ':' $ws* $variable_mappings
$variable_mappings = $variable_mapping_kv (',' $variable_mapping_kv)*
$variable_mapping_kv = $identifier $ws* '=' $ws* $expression
```
A workflow may call other tasks/workflows via the `call` keyword. The `$namespaced_identifier` is the reference to the task to run. Most commonly, it's simply the name of a task (see examples below), but it can also use `.` as a namespace resolver.

See the section on Fully Qualified Names & Namespaced Identifiers for details about how the `$namespaced_identifier` ought to be interpreted.
All `call` statements must be uniquely identifiable. By default, the call's unique identifier is the task name (e.g. `call foo` would be referenced by the name `foo`). However, if one were to call `foo` twice in a workflow, each subsequent `call` statement would need to alias itself to a unique name using the `as` clause: `call foo as bar`.
A `call` statement may also reference a workflow (e.g. `call other_workflow`). In this case, the `$inputs` section specifies a subset of the workflow's inputs and must specify fully qualified names.
import "lib.wdl" as lib workflow wf { call my_task call my_task as my_task_alias call my_task as my_task_alias2 { input: threshold=2 } call lib.other_task }
The `$call_body` is optional and is meant to specify how to satisfy a subset of the task or workflow's input parameters, as well as a way to map the task's outputs to variables defined in the visible scopes.

A `$variable_mapping` in the `$inputs` section maps parameters in the task to expressions. These expressions usually reference outputs of other tasks, but they can be arbitrary expressions.
As an example, here is a workflow in which the second task requires an output from the first task:
```
task task1 {
  command {
    python do_stuff.py
  }
  output {
    File results = stdout()
  }
}

task task2 {
  File foobar

  command {
    python do_stuff2.py ${foobar}
  }
  output {
    File results = stdout()
  }
}

workflow wf {
  call task1
  call task2 {
    input: foobar=task1.results
  }
}
```
Scatter
```
$scatter = 'scatter' $ws* '(' $ws* $scatter_iteration_statement $ws* ')' $ws* $scatter_body
$scatter_iteration_statement = $identifier $ws* 'in' $ws* $expression
$scatter_body = '{' $ws* $workflow_element* $ws* '}'
```
A "scatter" clause defines that everything in the body ($scatter_body
) can be run in parallel. The clause in parentheses ($scatter_iteration_statement
) declares which collection to scatter over and what to call each element.
The $scatter_iteration_statement
has two parts: the "item" and the "collection". For example, scatter(x in y)
would define x
as the item, and y
as the collection. The item is always an identifier, while the collection is an expression that MUST evaluate to an Array
type. The item will represent each item in that expression. For example, if y
evaluated to an Array[String]
then x
would be a String
.
The `$scatter_body` defines a set of scopes that will execute in the context of this scatter block.

For example, if `$expression` is an array of integers of size 3, then the body of the scatter clause can be executed 3 times in parallel, with `$identifier` referring to each integer in the array.
```
scatter(i in integers) {
  call task1 {input: num=i}
  call task2 {input: num=task1.output}
}
```
In this example, `task2` depends on `task1`. Variable `i` has an implicit `index` attribute to make sure we can access the right output from `task1`. Since both `task1` and `task2` run N times, where N is the length of the array `integers`, any scalar outputs of these tasks are now arrays (see the sketch below).
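For illustration, a minimal sketch (the task definitions, call names, and output names here are assumed, not part of the spec) of consuming a scattered call's output as an array after the scatter block:

```
workflow gather_example {
  Array[Int] integers

  scatter(i in integers) {
    call task1 {input: num=i}
  }

  # task1's output is scalar inside the task definition, but because task1 ran
  # once per element of 'integers', referencing it here yields an Array.
  call consume_all {input: nums=task1.output}
}
```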
Loops
TODO: This section is not complete
$loop = 'while' '(' $expression ')' '{' $workflow_element* '}'
Loops are distinct from scatter clauses because the body of a while loop needs to be executed to completion before another iteration is considered. The `$expression` condition is evaluated only when the iteration count is zero or when all `$workflow_element`s in the body have completed successfully for the current iteration.
Conditionals
$conditional = 'if' '(' $expression ')' '{' $workflow_element* '}'
Conditionals only execute the body if the expression evaluates to `true`.
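A minimal sketch of the conditional grammar above (the task, inputs, and variable names are illustrative):

```
workflow conditional_example {
  Boolean run_qc
  File bam

  # qc_task only runs when run_qc evaluates to true
  if (run_qc) {
    call qc_task {input: file=bam}
  }
}
```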
Outputs
```
$workflow_output = 'output' '{' ($workflow_output_fqn ($workflow_output_fqn)*)? '}'
$workflow_output_fqn = $fully_qualified_name '.*'?
```
Each `workflow` definition can specify an optional `output` section. This section lists outputs from individual `call`s that you also want to expose as outputs of the `workflow` itself. Replacing a call's output names with a `*` acts as a match-all wildcard.
If the `output {...}` section is omitted, then the workflow includes all outputs from all calls in its final output.
The output names in this section must be qualified with the call which created them, as in the example below.
```
task task1 {
  command {
    ./script
  }
  output {
    File results = stdout()
  }
}

task task2 {
  command {
    ./script2
  }
  output {
    File results = stdout()
    String value = read_string("some_file")
  }
}

workflow wf {
  call task1
  call task2 as altname

  output {
    task1.*
    altname.value
  }
}
```
In this example, the fully-qualified names exposed as workflow outputs would be `wf.task1.results` and `wf.altname.value`.