Computing Inputs

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

Both tasks and workflows have typed inputs that must be satisfied in order to run. The following sections describe how to compute inputs for task and workflow declarations.

Task Inputs

Tasks define all their inputs as declarations at the top of the task definition.

task test {
  String s
  Int i
  Float f

  command {
    ./script.sh -i ${i} -f ${f}
  }
}

In this example, s, i, and f are inputs to this task, even though the command line does not reference ${s}. Implementations of WDL engines may display a warning or report an error in this case, since s isn't used.
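
As a rough illustration of how an engine might spot an unused input such as s, here is a minimal Python sketch. The dictionary model of the task and the unused_inputs helper are hypothetical and not part of any WDL engine. The sketch assumes placeholders are bare identifiers of the form ${name}, whereas real placeholders can hold full expressions.

```python
import re

# Hypothetical in-memory model of the `test` task above:
# its declared inputs and its command template.
task_inputs = {"s": "String", "i": "Int", "f": "Float"}
command = "./script.sh -i ${i} -f ${f}"


def unused_inputs(declared, command_template):
    """Return declared input names that never appear as ${name} in the command."""
    # Assumption: only bare identifiers inside ${...} are recognized.
    referenced = set(re.findall(r"\$\{\s*(\w+)\s*\}", command_template))
    return sorted(name for name in declared if name not in referenced)


# All three declarations are still inputs; only `s` is reported as unused.
print(unused_inputs(task_inputs, command))
```

Whether an unused input produces a warning or an error is left to the implementation, as noted above.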

Workflow Inputs

Workflows have declarations, like tasks, but a workflow must also account for all calls to sub-tasks when determining inputs.

Workflows also return their inputs as fully qualified names. Tasks only return the names of the variables as inputs (as they're guaranteed to be unique within a task). However, since workflows can call the same task twice, names might collide. The general algorithm for computing inputs goes something like this:

  • Take all inputs to all call statements in the workflow
  • Subtract out all inputs that are satisfied through the input: section
  • Add in all declarations which don't have a static value defined

Consider the following workflow:

task t1 {
  String s
  Int x

  command {
    ./script --action=${s} -x${x}
  }
  output {
    Int count = read_int(stdout())
  }
}

task t2 {
  String s
  Int t
  Int x

  command {
    ./script2 --action=${s} -x${x} --other=${t}
  }
  output {
    Int count = read_int(stdout())
  }
}

task t3 {
  Int y
  File ref_file # Do nothing with this

  command {
    python -c "print(${y} + 1)"
  }
  output {
    Int incr = read_int(stdout())
  }
}

workflow wf {
  Int int_val
  Int int_val2 = 10
  Array[Int] my_ints
  File ref_file

  call t1 {
    input: x=int_val
  }
  call t2 {
    input: x=int_val, t=t1.count
  }
  scatter(i in my_ints) {
    call t3 {
      input: y=i, ref_file=ref_file
    }
  }
}

The inputs to wf would be:

  • wf.t1.s as a String
  • wf.t2.s as a String
  • wf.int_val as an Int
  • wf.my_ints as an Array[Int]
  • wf.ref_file as a File
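
The three-step algorithm listed earlier can be sketched in Python against this example. This is only an illustration, not how any particular engine is implemented. The tasks, declarations, and calls dictionaries are a hypothetical hand-written model of the WDL above. The sketch assumes that each call uses the task's own name (no aliases) and that the call to t3 supplies the task's ref_file input, which is what the list of inputs implies.

```python
# Hypothetical model of the example: task name -> {input name: WDL type}.
tasks = {
    "t1": {"s": "String", "x": "Int"},
    "t2": {"s": "String", "t": "Int", "x": "Int"},
    "t3": {"y": "Int", "ref_file": "File"},
}

# Workflow declarations: name -> (WDL type, has a static value?).
declarations = {
    "int_val": ("Int", False),
    "int_val2": ("Int", True),  # Int int_val2 = 10
    "my_ints": ("Array[Int]", False),
    "ref_file": ("File", False),
}

# Calls: task name -> names satisfied through the call's input: section.
calls = {
    "t1": {"x"},
    "t2": {"x", "t"},
    "t3": {"y", "ref_file"},
}


def compute_workflow_inputs(wf_name, tasks, declarations, calls):
    """Return {fully qualified input name: WDL type} for the workflow."""
    inputs = {}
    # Steps 1 and 2: take every input of every called task, minus the
    # ones already satisfied through the input: section of the call.
    for task_name, satisfied in calls.items():
        for name, wdl_type in tasks[task_name].items():
            if name not in satisfied:
                inputs[f"{wf_name}.{task_name}.{name}"] = wdl_type
    # Step 3: add workflow declarations that have no static value.
    for name, (wdl_type, has_static_value) in declarations.items():
        if not has_static_value:
            inputs[f"{wf_name}.{name}"] = wdl_type
    return inputs


for name, wdl_type in compute_workflow_inputs("wf", tasks, declarations, calls).items():
    print(f"{name} as {wdl_type}")
```

Note that int_val2 is left out because it has a static value, and that t1.s and t2.s stay distinct only because the names are fully qualified.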

Specifying Workflow Inputs in JSON

Once workflow inputs are computed (see previous section), the value for each of the fully-qualified names needs to be specified per invocation of the workflow. Workflow inputs are specified in JSON or YAML format. In JSON, the inputs to the workflow in the previous section can be:

{
  "wf.t1.s": "some_string",
  "wf.t2.s": "some_string",
  "wf.int_val": 3,
  "wf.my_ints": [5,6,7,8],
  "wf.ref_file": "/path/to/file.txt"
}

It's important to note that the type in JSON must be coercible to the WDL type. For example, wf.int_val expects an integer, but if we specified it in JSON as "wf.int_val": "3", this coercion from string to integer is not valid and would result in a type error. See the section on Type Coercion for more details.
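
To make the type-error case concrete, here is a small Python sketch that checks a JSON inputs document against the expected WDL types. The matches and type_errors helpers are hypothetical, and the check is deliberately strict: it covers only the types used in this example and does not implement the full rules from the Type Coercion section, so a real engine may accept coercions that this sketch rejects.

```python
import json

# Expected types for the workflow inputs computed in the previous section.
expected = {
    "wf.t1.s": "String",
    "wf.t2.s": "String",
    "wf.int_val": "Int",
    "wf.my_ints": "Array[Int]",
    "wf.ref_file": "File",
}


def matches(value, wdl_type):
    """Strict check (an assumption): does a parsed JSON value fit the WDL type?"""
    if wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        inner = wdl_type[len("Array["):-1]
        return isinstance(value, list) and all(matches(v, inner) for v in value)
    if wdl_type == "Int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if wdl_type in ("String", "File"):
        return isinstance(value, str)
    return False  # types outside this example are not modeled


def type_errors(expected, raw_json):
    """Return the sorted names of inputs that are missing or mistyped."""
    supplied = json.loads(raw_json)
    return sorted(
        name
        for name, wdl_type in expected.items()
        if name not in supplied or not matches(supplied[name], wdl_type)
    )


good = """{
  "wf.t1.s": "some_string",
  "wf.t2.s": "some_string",
  "wf.int_val": 3,
  "wf.my_ints": [5, 6, 7, 8],
  "wf.ref_file": "/path/to/file.txt"
}"""
bad = good.replace('"wf.int_val": 3', '"wf.int_val": "3"')

print(type_errors(expected, good))
print(type_errors(expected, bad))
```

The second call reports wf.int_val, mirroring the string-to-integer case described above.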
