How to import workflows that also have imports themselves?

The directory structure is as follows:

├── test
│   ├── conda_envs
│   └── data
├── wdl-qc
│   ├── cromwell-workflow-logs
│   ├── test
│   │   ├── conda_envs
│   │   └── data
│   └── wdl-tasks
├── wdl-tasks
└── wdl-utils

wdl-tasks is a git repository containing tasks for several tools.
The root directory contains a WDL workflow that assembles a virus (virusAssembly.wdl).
The wdl-qc directory contains QC.wdl.
Import statements in virusAssembly.wdl:

import "wdl-qc/QC.wdl" as qc
import "wdl-tasks/spades.wdl" as spades
import "wdl-tasks/seqtk.wdl" as seqtk
import "wdl-tasks/bwa.wdl" as bwa

Import statements in wdl-qc/QC.wdl:

import "wdl-tasks/fastqc.wdl" as fastqc
import "wdl-tasks/cutadapt.wdl" as cutadapt

When I run Cromwell, all the imports in virusAssembly.wdl resolve correctly.
However, the import in QC.wdl opens wdl-tasks/fastqc.wdl instead of wdl-qc/wdl-tasks/fastqc.wdl.
Is there a way to correct this behaviour?
Since all WDL files are in a git repo, using absolute paths is not possible.

Answers

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev)
    edited January 31

    I think this is happening because Cromwell's import resolvers are relative to either:

    • Where you're running from (if using run mode)
    • The base of the zip file containing reference WDLs (in server mode)

    So I believe you'll want to make QC.wdl have import "wdl-qc/wdl-tasks/fastqc.wdl" as fastqc
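
The base-relative behaviour described above can be modelled in a few lines. This is only an illustrative sketch, not Cromwell's actual resolver code: the function name `resolve_from_base` and the base directory `/project` are assumptions made for the example.

```python
from pathlib import PurePosixPath


def resolve_from_base(base: str, import_string: str) -> PurePosixPath:
    """Resolve an import string against a single base directory.

    The location of the importing file is deliberately not a parameter:
    under this scheme it has no influence on the result.
    """
    return PurePosixPath(base) / import_string


# Hypothetical base: the directory Cromwell is run from.
base = "/project"

# The import string currently written in wdl-qc/QC.wdl.
print(resolve_from_base(base, "wdl-tasks/fastqc.wdl"))
# The import string suggested above.
print(resolve_from_base(base, "wdl-qc/wdl-tasks/fastqc.wdl"))
```

Under this model the first call lands in the top-level wdl-tasks directory, which matches what was observed, and only the longer import string reaches wdl-qc/wdl-tasks.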

  • Yes that would be a possible workaround.
    But what I actually want is to import self-contained workflows into other workflows, to take full advantage of WDL's modularity.
    For example: main.wdl imports sub-workflow.wdl, which imports another.wdl, which imports some_tasks.wdl.
    Since all of these will be in git repos, using absolute paths is not possible, and not desirable either, since it destroys the modularity of WDL.
    Is there functionality in Cromwell to resolve imports relative to the file that contains the import statement? Or should I file this as an issue?
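
To make the request concrete, here is a minimal sketch of what "relative to the importing file" would mean. It describes the desired behaviour, not anything Cromwell is known to implement; the function name `resolve_from_file` is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_from_file(importing_file: str, import_string: str) -> PurePosixPath:
    """Resolve an import string against the directory of the importing file."""
    return PurePosixPath(importing_file).parent / import_string


# virusAssembly.wdl sits in the root, so its imports are unchanged.
print(resolve_from_file("virusAssembly.wdl", "wdl-qc/QC.wdl"))

# QC.wdl sits in wdl-qc, so its imports would stay inside wdl-qc.
print(resolve_from_file("wdl-qc/QC.wdl", "wdl-tasks/fastqc.wdl"))
```

With this scheme QC.wdl would keep its short import string and still pick up its own copy of wdl-tasks, wherever the wdl-qc repository is checked out.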

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev)

    I suggest you think about it more like "fully qualified" or "paths relative to the base directory" than "absolute paths". As long as you structure your workflows in directories starting at some logical base you'll be able to import any other workflow groups you like, by basing their root in the same directory as your root and using their fully qualified import path.

    Here's an example. Say I have a WDL and want to use some tool library I've downloaded. My execution base is called base:

    base
    ├── my_wdl
    │   ├── master_workflow.wdl
    │   └── my_utils
    │       └── sub_workflow.wdl
    └── tool_library
        ├── exome_tool.wdl
        ├── genome_tool.wdl
        ├── base_pair_tool.wdl
        └── [... and more!]
    

    So the fully qualified path to the downloaded exome_tool is tool_library/exome_tool.wdl, and in master_workflow I can put:

    import "tool_library/exome_tool.wdl"
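
As a rough way to check a layout like this before running anything, the import strings can be pulled out of a WDL file and mapped onto the base directory. This is a hedged sketch: the helper `base_relative_imports`, the regular expression, and the second import line with its `as sub` alias are illustrative assumptions, and the regex only covers simple single-line import statements.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Matches simple WDL import statements of the form: import "path" [as alias]
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"', re.MULTILINE)


def base_relative_imports(wdl_text: str, base: str) -> List[PurePosixPath]:
    """Return where each import string would point when resolved from base."""
    return [PurePosixPath(base) / path for path in IMPORT_RE.findall(wdl_text)]


# Hypothetical contents of master_workflow.wdl, using the tree above.
example = '''
import "tool_library/exome_tool.wdl"
import "my_wdl/my_utils/sub_workflow.wdl" as sub
'''

for path in base_relative_imports(example, "base"):
    print(path)
```

Each printed path can then be compared against the files that actually exist under the execution base.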
    

    This has some advantages over relative paths:

    • The import path for the same file is always the same string, no matter where it's imported from. The same import string in master_workflow.wdl works in sub_workflow.wdl
    • When I download a directory of tools, I just put them in the same base as my workflow and I'm ready to go. In the example above, I don't need to put it in a subdirectory of my_utils just so that sub_workflow.wdl can use it.
    • Your above example provides another advantage. With "relative to base" you can import from both wdl-tasks and wdl-qc/wdl-tasks. With a "relative to file" scheme you wouldn't be able to import into wdl-qc from wdl-tasks for example.
  • Thank you for your answer. This is a nice workaround. However, I still feel that import paths should be evaluated from the file itself, not from a base directory that is elsewhere, because that makes WDL files context-dependent and therefore not portable. I will create an issue on the Cromwell GitHub.

  • @ChrisL Thank you for your time addressing the relative imports issue in depth. I feel I have not given you a full reply to all the advantages you have shown here.

    The import path for the same file is always the same string, no matter where it's imported from. The same import string in master_workflow.wdl works in sub_workflow.wdl

    Which is undesirable. A sub workflow is a building block that should be used in other workflows, not depend on the master workflow it is in. A sub workflow should be independent, and have independent imports. The sub workflow should be able to reside independently in its own git repo, with its own CI tests. This is not compatible with the way imports are handled currently.

    When I download a directory of tools, I just put them in the same base as my workflow and I'm ready to go. In the example above, I don't need to put it in a subdirectory of my_utils just so that sub_workflow.wdl can use it.

    But when my_utils is a git repo, it is versioned, which means you can have the sub workflow depend on another version of the my_utils package. Otherwise, every time you make a tiny update to the my_utils package you have to check all the pipelines that use it. But if my_utils is in a versioned git repo, you do not have to do this. This was exactly why I ran into this problem. I had to address the issues of the sub-workflow in the main workflow, but I wanted to address them in the sub-workflow so other workflows importing QC could benefit from the bugfixes.

    Your above example provides another advantage. With "relative to base" you can import from both wdl-tasks and wdl-qc/wdl-tasks. With a "relative to file" scheme you wouldn't be able to import into wdl-qc from wdl-tasks for example.

    Again, since WDL-QC should function independently, I should not have to do this. If my QC pipeline has dependencies in the pipeline it is called from, QC will function differently in every pipeline that calls it, which makes debugging QC impossible. I want my QC pipeline to function the same way in every pipeline it is called in. It should be an independent module.
