Multiple workflow languages coming to Cromwell, starting with CWL
Cromwell and WDL started out as a codependent couple, matched by design and tied together by habit. But I like to emphasize that Cromwell and WDL share more than just development history. They also share a fundamental philosophy that prioritizes user-friendliness, with the goal of making simple use cases easy to realize while ensuring that users with more complex needs have a reasonable path forward. I believe that this philosophy plays a key role in making Cromwell and WDL best-in-class tools for bioinformatics computing. But sometimes user-friendliness means something different for a workflow engine than it does for a workflow language. And so it has become clear that in order to maximize their usefulness and satisfy the needs of a wide community, Cromwell and WDL need to be decoupled so that they can both evolve more freely.
A few months ago we opened WDL as a community-driven standard, in the care of an independent organization called OpenWDL. Since then there have been several discussions and language additions driven by contributors outside of the Broad, which is incredibly encouraging and gives us confidence that OpenWDL is truly going to empower the user community, both by driving the evolution of the language and by enabling independent implementations to flourish.
In its capacity as an implementer of WDL, Cromwell will naturally continue to support WDL in its evolution as a language. But why stop there? Many people in the bioinformatics community have existing workflows written in languages other than WDL. With its multiple run modes (single workflow vs. server), pluggable backends, call caching, and a range of configuration options, I believe that Cromwell is already the most flexible bioinformatics workflow system in an admittedly crowded field. So it was a logical next step for us to decide to also be flexible about the workflow languages it can process.
In the year and a half I spent as a co-lead of the GA4GH Containers & Workflows working group, I saw firsthand how many groups were using the Common Workflow Language (CWL). It was a no-brainer for us to make CWL the first language we supported beyond WDL. Then, during the GA4GH Plenary in Orlando, we had a kickoff meeting for the newly formed Cloud Workstream, where it was noted that all of the relevant GA4GH driver projects had settled on using at least one of WDL and CWL. This means the driving focus for all the interoperability efforts by this workstream (tool registry, task execution, data object and workflow execution schemas) will revolve around WDL and CWL. That confirmed we had made the right decision.
The next step was to actually make it happen, and the big question was how. We could have just turned the inside of Cromwell into a giant ball of if (wdl) doX else if (cwl) doY statements, but this offended our engineering sensibilities. What happens if we go on to support five languages? What if we decide to allow users to provide their own language support, like we do with backends? Wailing and gnashing of teeth, most likely. So instead, we went with a bolder plan and… well, we came up with yet another workflow scheme, but as you'll see it was for the best.
In previous versions, Cromwell converted WDL into a series of Scala objects and operated on those. We were pretty sure that we couldn’t map both CWL and WDL onto these objects, as it’s been known for a while that it’s difficult to convert directly between WDL and CWL, in either direction (see previous attempts made here and here). To get around that problem, we created the Workflow Object Model, or WOM. The details here are still in flux so I’ll save the deep dive for later, but the idea is that this is an in-memory representation for workflows that Cromwell can directly execute. We can then write code to give Cromwell the ability to translate WDL, CWL, or anything else we want into WOM, and then Cromwell runs that. That means you could run any supported language on the same server instance of Cromwell, completely seamlessly.
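To make the idea a little more concrete, here is a toy sketch in Python of the general pattern: each language gets its own translator into one shared in-memory model, and the engine only ever sees that model. To be clear, this is not Cromwell's actual Scala code and not the real WOM API. The names Workflow, Call, TRANSLATORS, translator, execute and run are all hypothetical, and the "parsed" WDL and CWL inputs are simplified dictionaries made up for illustration rather than real syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Call:
    """One step of a workflow in the shared, language-neutral model."""
    name: str
    command: str


@dataclass
class Workflow:
    """Hypothetical stand-in for a WOM-style in-memory workflow."""
    name: str
    calls: List[Call] = field(default_factory=list)


# Registry of language front ends: language tag -> translator function.
TRANSLATORS: Dict[str, Callable[[dict], Workflow]] = {}


def translator(language: str):
    """Decorator that registers a translator for one workflow language."""
    def register(fn):
        TRANSLATORS[language] = fn
        return fn
    return register


@translator("wdl")
def from_wdl(parsed: dict) -> Workflow:
    # Assumed toy shape: {"workflow": name, "tasks": {task_name: command}}
    calls = [Call(name, cmd) for name, cmd in parsed["tasks"].items()]
    return Workflow(parsed["workflow"], calls)


@translator("cwl")
def from_cwl(parsed: dict) -> Workflow:
    # Assumed toy shape: {"id": name, "steps": [{"id": ..., "baseCommand": ...}]}
    calls = [Call(step["id"], step["baseCommand"]) for step in parsed["steps"]]
    return Workflow(parsed["id"], calls)


def execute(workflow: Workflow) -> List[str]:
    """The engine works on the shared model only, never the source language."""
    return [f"{workflow.name}.{call.name}: {call.command}" for call in workflow.calls]


def run(language: str, parsed: dict) -> List[str]:
    """Translate with the registered front end, then hand off to the engine."""
    return execute(TRANSLATORS[language](parsed))


# The same toy workflow, described in two different (made-up) source shapes.
wdl_doc = {"workflow": "hello", "tasks": {"greet": "echo hi"}}
cwl_doc = {"id": "hello", "steps": [{"id": "greet", "baseCommand": "echo hi"}]}
```

In this sketch, run("wdl", wdl_doc) and run("cwl", cwl_doc) go through the same execute function, and supporting another language means registering one more translator rather than touching the engine. That is the property we were after, in contrast to a chain of per-language branches.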
And it works! As of version 30, Cromwell runs WOM under the hood when you run a WDL workflow. It was a major undertaking (and not without a couple of bumps in the road), but it was both a successful proof of concept and a giant step in the right direction. We're now actively working on finishing the CWL-to-WOM mapping, which we expect to wrap up and deliver within the next few weeks. We’re already able to run a number of CWL workflows, and once this mapping is complete we’ll release a version with general support. There's theoretically no limit to what else Cromwell could support, so we welcome suggestions... and implementation pull requests!