Multiple workflow languages coming to Cromwell, starting with CWL

jgentry (Member, Broadie, Dev)

Cromwell and WDL started out as a codependent couple, matched by design and tied together by habit. But I like to emphasize that Cromwell and WDL share more than just development history. They also share a fundamental philosophy that prioritizes user-friendliness, with the goal of making simple use cases easy to realize while ensuring that users with more complex needs have a reasonable path forward. I believe that this philosophy plays a key role in making Cromwell and WDL best-in-class tools for bioinformatics computing. But sometimes user-friendliness means something different for one than it does for the other. It has become clear that, in order to maximize their usefulness and satisfy the needs of a wide community, Cromwell and WDL need to be decoupled so that they can both evolve more freely.


A few months ago we opened WDL as a community-driven standard, in the care of an independent organization called OpenWDL. Since then there have been several discussions and language additions driven by contributors outside of the Broad. That is incredibly encouraging, and it gives us confidence that OpenWDL is truly going to empower the user community by driving the evolution of the language and by enabling independent implementations to flourish.

In its capacity as an implementer of WDL, Cromwell will naturally continue to support WDL in its evolution as a language. But why stop there? Many people in the bioinformatics community have existing workflows written in languages other than WDL. With its multiple run modes (single workflow vs. server), pluggable backends, call caching, and a range of configuration options, I believe that Cromwell is already the most flexible bioinformatics workflow system in an admittedly crowded field. So it was a logical next step for us to decide to also be flexible about the workflow languages it can process.

In the year and a half I spent as a co-lead of the GA4GH Containers & Workflows working group, I saw firsthand how many groups were using the Common Workflow Language (CWL). It was a no-brainer for us to make CWL the first language we supported beyond WDL. Then, during the GA4GH Plenary in Orlando, we had a kickoff meeting for the newly formed Cloud Workstream, where it was noted that all of the relevant GA4GH driver projects had settled on using at least one of WDL and CWL. This means the driving focus for all the interoperability efforts by this workstream (tool registry, task execution, data object and workflow execution schemas) will revolve around WDL and CWL. That confirmed we had made the right decision.

The next step was to actually make it happen, and the big question was how. We could have just turned the inside of Cromwell into a giant ball of if (wdl) doX else if (cwl) doY statements, but this offended our engineering sensibilities. What happens if we go on to support five languages? What if we decide to allow users to provide their own language support, like we do with backends? Wailing and gnashing of teeth, most likely. So instead, we went with a bolder plan and… well, we came up with yet another workflow scheme, but as you'll see it was for the best.

In previous versions, Cromwell converted WDL into a series of Scala objects and operated on those. We were pretty sure that we couldn’t map both CWL and WDL onto these objects, as it’s been known for a while that it’s difficult to convert directly between WDL and CWL, in either direction (as previous conversion attempts have shown). To get around that problem, we created the Workflow Object Model, or WOM. The details here are still in flux so I’ll save the deep dive for later, but the idea is that this is an in-memory representation for workflows that Cromwell can directly execute. We can then write code to give Cromwell the ability to translate WDL, CWL, or anything else we want into WOM, and then Cromwell runs that. That means you could run any supported language on the same server instance of Cromwell, completely seamlessly.
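To make the shape of that idea concrete, here is a minimal sketch of the "translate everything into one in-memory model, then run the model" pattern. It is written in Python for brevity even though Cromwell itself is Scala, and every name in it (WomCall, WomWorkflow, TRANSLATORS, to_wom, execution_plan) is hypothetical and made up for illustration. It is not Cromwell's actual WOM API, and the input dictionaries are toy stand-ins for already-parsed documents, not real WDL or CWL parsing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WomCall:
    """Hypothetical executable step: a name plus a command line."""
    name: str
    command: str


@dataclass
class WomWorkflow:
    """Hypothetical language-neutral workflow: an ordered list of calls."""
    name: str
    calls: List[WomCall] = field(default_factory=list)


# Registry of language tag -> translator function. Supporting a new language
# means registering one more translator instead of adding branches to the engine.
TRANSLATORS: Dict[str, Callable[[dict], WomWorkflow]] = {}


def translator(language: str):
    """Decorator that registers a translator under the given language tag."""
    def register(fn: Callable[[dict], WomWorkflow]):
        TRANSLATORS[language] = fn
        return fn
    return register


@translator("wdl")
def from_wdl_like(doc: dict) -> WomWorkflow:
    # Toy WDL-like shape (an assumption): {"workflow": name, "tasks": {task: command}}
    calls = [WomCall(name, command) for name, command in doc["tasks"].items()]
    return WomWorkflow(doc["workflow"], calls)


@translator("cwl")
def from_cwl_like(doc: dict) -> WomWorkflow:
    # Toy CWL-like shape (an assumption): {"id": name, "steps": [{"id", "baseCommand"}]}
    calls = [WomCall(step["id"], " ".join(step["baseCommand"])) for step in doc["steps"]]
    return WomWorkflow(doc["id"], calls)


def to_wom(language: str, doc: dict) -> WomWorkflow:
    """Pick the registered translator for the language and build the common model."""
    if language not in TRANSLATORS:
        raise ValueError(f"no translator registered for {language!r}")
    return TRANSLATORS[language](doc)


def execution_plan(workflow: WomWorkflow) -> List[str]:
    """Stand-in for the engine: it only ever sees the common model."""
    return [f"{workflow.name}.{call.name}: {call.command}" for call in workflow.calls]
```

In this sketch, a WDL-like and a CWL-like description of the same single-step workflow end up as identical plans, and the "engine" function never needs to know which language the workflow came from. The real WOM has to carry far more than this (inputs, outputs, expressions, dependencies between calls), so treat it as an illustration of the layering only.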

And it works! As of version 30, Cromwell runs WOM under the hood when you run a WDL workflow. It was a major undertaking (and not without a couple of bumps in the road), but it was both a successful proof of concept and a giant step in the right direction. We're now actively working on finishing the CWL to WOM mapping, which we expect to wrap up and deliver within the next few weeks. We’re already able to run a number of CWL workflows, and once this mapping is complete we’ll release a version with general support. There's theoretically no limit to what else Cromwell could support, so we welcome suggestions... and implementation pull requests!

Comments

  • ChrisL, Cambridge, MA (Member, Broadie, Moderator, Dev)

    One other point: this also opens up the possibility, down the road, of cross-language workflows.

    Say you have a bunch of useful tasks or workflows that you want to compose together but some are written in CWL and some are in WDL... well nothing in the WOM layer precludes calling into CWL CommandLineTools from a WDL workflow or likewise calling into WDL tasks from a CWL Workflow (we would just need to make import resolution more general-purpose in our X -> WOM adapters).

    In this future, WOM will allow us to enable a huge variety of cross-composed workflows without forcing individual tools to be rewritten in every supported language. I think that's going to be a pretty cool future for tool and workflow authors to live in!

  • dannykwells, San Francisco (Member)

    Hi! This is super exciting. Is there any update on when CWL support will commence in Cromwell? We have some CWL from collaborators that we would love to try out on our infrastructure.

    Thank you for this great work - impressive!!

  • jgentry (Member, Broadie, Dev)

    Hi @dannykwells - If you're willing to build off of the develop branch, there's a nontrivial chance that your CWL would Just Work right now. Of course there's a nontrivial chance it'd die a fiery death. So I suppose you can see where we are right now :)

    That said, by the end of the quarter we intend to have a release that'll allow most CWLs used by most people to work, with the ultimate goal of having full (see below) support by mid-May.

    "full" .... where full is defined as "hopefully everything, but we reserve the right to punt on things we've never actually seen evidence of having been used in the wild"
