How to specify an output directory with the Options JSON file?

shlee (Cambridge; Member, Broadie)

Hi @KateN,

How can I tell WDL to copy over some of the outputs I'm interested in into a separate directory? I know this requires (i) an output section at the end of the WORKFLOW and (ii) the OPTIONS JSON file. However, it's not clear to me what needs to go into the OPTIONS JSON file and how to autogenerate it.



  • KateN (Cambridge, MA; Member, Broadie, Moderator, admin)

    There are two ways to specify an output directory, depending on what you are after.

    OPTION 1. Send all job outputs to a destination directory
    This option sends all job outputs from your workflow to the new directory, including files that are not declared in a task's output section. You do this by setting the appropriate config option to the destination directory when you run your workflow. In version 0.19 (the current version at time of writing), it is:

    java -Dbackend.shared-filesystem.root=/set/new/destination/directory -jar cromwell.jar run ...

    In version 0.21 (which is forthcoming), it is:

    java -Dbackend.providers.Local.config.root=/set/new/destination/directory -jar cromwell.jar run ...

    OPTION 2. Copy workflow outputs to a destination directory
    This method copies any outputs that were declared as output variables in a task's output section to the directory you specify. You do this by setting the final_workflow_outputs_dir option in the workflow options. For more information on workflow options, see the Cromwell documentation.

    The OPTIONS JSON you mention is a file much like the inputs JSON. It cannot be auto-generated with wdltool (as you do for the inputs), so you write it by hand, setting key/value pairs for the configuration settings specific to a particular workflow run. This can be especially useful if you need to adjust multiple -D settings like the one in OPTION 1. When running from the command line, the options file is passed in last, after the inputs file:

    java -jar cromwell.jar run sample.wdl sample_inputs.json sample_options.json
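
Since the options file has to be written by hand, one way to create it is from the shell. This is a minimal sketch, not an official recipe: the file name and destination path are placeholders taken from the examples above, and the only option set is final_workflow_outputs_dir.

```shell
# Placeholder names: adjust the file name and the destination directory.
options_file="sample_options.json"
outputs_dir="/set/new/destination/directory"

# Write a one-key JSON object; the unquoted EOF lets $outputs_dir expand.
cat > "$options_file" <<EOF
{
  "final_workflow_outputs_dir": "$outputs_dir"
}
EOF

# The options file is then passed last, after the inputs file, e.g.:
#   java -jar cromwell.jar run sample.wdl sample_inputs.json "$options_file"
```

Keeping the destination in a variable makes it easy to reuse the same snippet for different runs.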
  • shlee (Cambridge; Member, Broadie)

    Thanks @KateN for this.

    I tried the first solution and the results are not what I expected. I'm using Cromwell v0.19 on my laptop. I thought it would be easier to explain what's happening if I attached a screenshot. I've highlighted certain portions of the Terminal session and the Finder window. Note that for WDL purposes I use the absolute paths but for this post I've simplified these paths.

    • In Terminal (in yellow), I'm working out of Desktop/tutorial_8017 and you see that before the $.
    • To the right of the $ is the command I used for solution 1. It was the last command I ran and it ran successfully. I am asking Cromwell to put the outputs on my desktop. This does not happen.
    • Rather, the outputs are under the first highlighted folder in the Finder window under Desktop/tutorial_8017/. Typically, without the -Dbackend.shared-filesystem.root, my outputs go to Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/xxx like the second highlighted folder, where Tutorial_8017 is the workflow name and xxx is the run tag.

    What I would like is for certain outputs I choose to go to the folder I specify, e.g. Desktop or Desktop/tutorial_8017, without layers of directory structures. Is this possible or can you help me get near to this simplification?



  • KateN (Cambridge, MA; Member, Broadie, Moderator, admin)

    Using the first option told Cromwell to send all outputs of the job to the directory you specified. This means that all outputs, not just the task's outputs, were sent there. (Here, "job" refers to a specific run of the workflow; it is recorded as the hash-like ID you see in the first highlighted folder.)

    It looks like you want just the task outputs sent to your desktop, which I believe you can accomplish with an OPTIONS JSON and Option 2 above. In your case, the options file should look like this:

        { "final_workflow_outputs_dir" : "/Users/shlee/Desktop" }

    Then, when running the workflow again, specify that options JSON file last, as I mentioned above.

  • shlee (Cambridge; Member, Broadie)

    I don't see any outputs on my desktop, only in the normal place under /Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/xxx. I think I'm missing some component.

    I specify the output directory in the OPTIONS JSON as you show. The command I use to run the script is:

     java -jar $cromwell run tutorial_8017_outputsspecified.wdl tutorial_8017.json tutorial_8017.options.json

    And I have the following section at the end of my WORKFLOW:

    # Outputs that will be retained when execution is complete
    output {

    My run completes successfully with an interesting message at the end that appears to regurgitate the files I'm interested in copying over.

    [2016-08-09 16:18:43,356] [info] WorkflowActor [70714fe2]: transitioning from Running to Succeeded.
      "Tutorial_8017.SortFixTagsAndIndex.snaut_bai": ["/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-0/altalt_snaut.bai", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-1/paalt_snaut.bai", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-2/papa_snaut.bai"],
      "Tutorial_8017.CallCohortVariants.cohort_vcf": "/Users/shlee/Desktop/tutorial_8017/multisample.vcf",
      "Tutorial_8017.SortFixTagsAndIndex.snaut_bam": ["/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-0/altalt_snaut.bam", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-1/paalt_snaut.bam", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-SortFixTagsAndIndex/shard-2/papa_snaut.bam"],
      "Tutorial_8017.AlignFastqWithBwaMem.aligned_sam": ["/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-AlignFastqWithBwaMem/shard-0/altalt.sam", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-AlignFastqWithBwaMem/shard-1/paalt.sam", "/Users/shlee/Desktop/tutorial_8017/cromwell-executions/Tutorial_8017/70714fe2-9cdd-4071-ad88-d95b748810db/call-AlignFastqWithBwaMem/shard-2/papa.sam"],
      "Tutorial_8017.CallCohortVariants.cohort_vcf_index": "/Users/shlee/Desktop/tutorial_8017/multisample.vcf.idx"
    [2016-08-09 16:18:43,379] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.

    I suppose I can copy-paste these file paths for the files I'm interested in. I feel like we are close to a solution. What am I missing?
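
As a stopgap for the copy-paste approach, the file paths in that final listing could be collected with a small shell helper. This is only a sketch under assumptions: it presumes the listing has been saved to a text file, and collect_outputs and the file names in the usage comment are hypothetical, not part of Cromwell.

```shell
# Hypothetical helper: copy every file path quoted in a saved copy of the
# final outputs listing into one flat destination directory.
# Usage: collect_outputs outputs_listing.txt /Users/shlee/Desktop
collect_outputs() {
  listing="$1"
  dest="$2"
  mkdir -p "$dest"
  # Output keys do not start with "/", so matching double-quoted strings
  # that begin with "/" picks out only the absolute file paths.
  grep -o '"/[^"]*"' "$listing" | tr -d '"' | while IFS= read -r path; do
    if [ -f "$path" ]; then
      cp "$path" "$dest/"
    else
      echo "skipping missing file: $path" >&2
    fi
  done
}
```

Because everything lands in a single directory, files that share a name across shards would overwrite each other; in the listing above the shard outputs have distinct names.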

  • shlee (Cambridge; Member, Broadie)
    Accepted Answer

    I just learned that Cromwell v20 allows for the above but that call-caching is only available in v19 thus far. Since call-caching is a more important feature for me, I'll stick with v19 until v20 catches up and then try the above again.
