(howto) Generate a JSON file describing inputs

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited July 2017 in Tutorials

The simplest way to specify values for the input variables (such as file names and parameters) to the commands in your WDL scripts is to hard-code them, i.e. write them in the script itself. However, doing so forces you to make a new copy and edit the inputs every time you want to run your script on a new batch of data -- which undermines the advantages of setting up a pipeline script in the first place.

A much better way to proceed is to put all the input values that you want to be able to customize from run to run in a JSON file (a structured text format, similar in purpose to XML but more compact and more readable). Then all you need to do is create a new file of inputs for each new batch of data that you want to run through your pipeline. The execution engine will use that JSON file to fill in the values of inputs to commands in your script where appropriate.

Still, on the face of it you might think that putting together the JSON file of inputs (specifically, structuring it correctly and not forgetting any command's inputs) would be a tedious and/or daunting task, especially in a command-line-only world with no point-and-click GUI.

But fear not! Help is at hand. WDL comes with a utility function (in the wdltool package) that will parse your WDL script and automatically generate a template JSON file containing the appropriate input file and parameter definitions. All you need to do then is to populate a copy of the file with the actual values that you want to use in a given run of the pipeline. When you are ready to run your script on your chosen execution engine, you'll simply provide the inputs file along with the script.


Generating the template JSON

To generate the template of inputs for your WDL script, simply call the wdltool inputs function on your script:

java -jar wdltool.jar inputs myWorkflow.wdl > myWorkflow_inputs.json

This will create a file called myWorkflow_inputs.json that lists all the inputs to all the tasks in your script following the pattern below:

{
    "<workflow name>.<task name>.<variable name>": "<variable type>"
}

This saves you from having to compile a list of all the tasks and their variables manually! Pretty nifty, right?

If you omit the > myWorkflow_inputs.json part of the command, the template content will be output to the terminal instead of being written to file.


Customizing the inputs file for a particular run

Every time you want to run the script on some new data or with some different parameters, you simply open this file (or better, a copy) in a text editor and replace the part on the right of the colon with the value that you want.

In case you're wondering, the "<variable type>" bit in the original template is just there to remind you what type of variable the task will expect to see. In the same spirit, we do recommend giving your tasks and variables names that will be meaningful when you're going through your inputs file filling in filenames and parameter values. Otherwise you'll find yourself having to refer back to the pipeline script itself often, as it's not possible to add comments in a JSON file.

Example

Let's say the myWorkflow.wdl script describes a workflow called myWorkflowName. This workflow includes a task called stepA that takes two inputs, a File called input_file and a String called sample_name. The input template generated by the command above would look like this:

{
    "myWorkflowName.stepA.input_file": "File",
    "myWorkflowName.stepA.sample_name": "String"
}
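
To make the naming pattern concrete, here is a minimal Python sketch (not part of wdltool) that reads a template like the one above and splits each input name into its workflow, task and variable parts. The helper name split_input_name is hypothetical, and the sketch assumes every name has exactly three dot-separated parts, as in this example.

```python
import json

# Template text as in the example above; JSON requires a comma between entries.
TEMPLATE_TEXT = """
{
    "myWorkflowName.stepA.input_file": "File",
    "myWorkflowName.stepA.sample_name": "String"
}
"""


def split_input_name(name):
    """Split '<workflow name>.<task name>.<variable name>' into three parts.

    Assumption: exactly three dot-separated parts; anything else raises
    ValueError from the tuple unpacking.
    """
    workflow, task, variable = name.split(".")
    return workflow, task, variable


template = json.loads(TEMPLATE_TEXT)
for name, expected_type in template.items():
    workflow, task, variable = split_input_name(name)
    print(f"{workflow} / {task} / {variable}: expects {expected_type}")
```

A listing like this can be a quick way to see which values a run needs before you start editing a copy of the template.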

To run this script on a file called input.bam where the sample name of interest is NA12878, you would edit the template to read:

{
    "myWorkflowName.stepA.input_file": "~/path/to/input.bam",
    "myWorkflowName.stepA.sample_name": "NA12878"
}
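
If you prepare inputs for many batches, the same edit can be scripted. The following is a rough Python sketch, not a wdltool feature: fill_template and PLACEHOLDER_TYPES are hypothetical names, and the set of type names treated as "still unfilled" is an assumption for illustration rather than a complete list of WDL types.

```python
import json

# Assumption: an unfilled entry still holds a bare type name from the template.
# Illustrative only; not an exhaustive list of WDL types.
PLACEHOLDER_TYPES = {"File", "String", "Int", "Float", "Boolean"}


def fill_template(template, values):
    """Return a copy of the template with the given values filled in.

    Raises KeyError for names that are not in the template and
    ValueError if any entry still looks like a type placeholder.
    """
    unknown = sorted(set(values) - set(template))
    if unknown:
        raise KeyError(f"not in template: {unknown}")
    filled = {**template, **values}
    unfilled = sorted(
        name for name, value in filled.items()
        if isinstance(value, str) and value in PLACEHOLDER_TYPES
    )
    if unfilled:
        raise ValueError(f"still unfilled: {unfilled}")
    return filled


template = {
    "myWorkflowName.stepA.input_file": "File",
    "myWorkflowName.stepA.sample_name": "String",
}
run_values = {
    "myWorkflowName.stepA.input_file": "~/path/to/input.bam",
    "myWorkflowName.stepA.sample_name": "NA12878",
}

filled = fill_template(template, run_values)
# json.dumps writes the commas between entries for you.
print(json.dumps(filled, indent=4))
```

Writing the file with a JSON library rather than by hand also avoids small syntax slips such as a missing comma between entries. Note that a sample that really is named "String" would trip the placeholder check, so treat it as a convenience rather than a guarantee.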