The weight of clouds, and what it takes to support one
One of my favorite features of the Cromwell workflow management system is that it was designed from the start to "support multiple computing platforms in order to maximize portability and reproducibility of analysis workflows". In my role as an advocate for a research community that struggles with standardization and interoperability, just reading those words in a sentence makes me perk up like a kid hearing the ice cream truck song. But then it immediately raises the question -- what does it mean for Cromwell to support a given platform? Let's take a moment today to unpack that, since "support" is kind of a loaded term in the world of workflow management systems.
In principle, Cromwell will happily run on any machine that has its preferred version of Java, and… well, that's about it for technical requirements. Under those terms, you could run it on a VM on any cloud platform you like: spin up a bunch of nodes, set up LSF on them, and run, say, Cromwell-on-LSF-on-AWS. But if you want to take advantage of the true power of a cloud platform, you have to manage a whole set of additional layers yourself: access to object storage, containers, authentication and so on. That's a lot of work, and for many of our users it's not a realistic option. It's certainly not my idea of a fun time.
The good news is that when we say Cromwell supports a given platform, we mean it will manage all of that for you. That's where the concept of a backend comes in: it's essentially the plugin adapter that allows Cromwell to talk directly with the various components of the platform you want to run on, and to orchestrate all those operations to get the job done. You just need to give it the right configuration file -- and naturally we provide templates for all supported platforms. Ultimately, our goal is to provide a seamless computing experience that minimizes setup and maximizes throughput, so you can just get on with the interesting part of your work. This applies to all platforms: whether you choose to use cloud resources, a local HPC cluster, or both, Cromwell's job is to empower you to be more productive by making the pipelining process easy and scalable.
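If the "plugin adapter" idea feels abstract, here is a toy sketch of the pattern in Python. To be clear, this is not Cromwell's actual code or API: every name in it (`Task`, `Backend`, `LocalBackend`, `ClusterBackend`, `BACKENDS`, `run`) is made up for illustration, and the backends only return the command line they would use instead of launching anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Task:
    """A single workflow step (hypothetical, for illustration only)."""
    name: str
    command: str
    image: str


class Backend(ABC):
    """Hypothetical adapter interface; NOT Cromwell's real API."""

    @abstractmethod
    def submit(self, task: Task) -> str:
        """Return a description of how this platform would launch the task."""


class LocalBackend(Backend):
    def submit(self, task: Task) -> str:
        # Run the task in a container on the current machine.
        return f"local: docker run {task.image} {task.command}"


class ClusterBackend(Backend):
    def __init__(self, queue: str):
        self.queue = queue

    def submit(self, task: Task) -> str:
        # Hand the task to an LSF-style scheduler queue.
        return f"cluster: bsub -q {self.queue} {task.command}"


# Registry mapping a name in the configuration to an adapter class.
BACKENDS = {"local": LocalBackend, "cluster": ClusterBackend}


def run(task: Task, config: dict) -> str:
    """Pick the backend named in the config and hand the task to it."""
    backend_cls = BACKENDS[config["backend"]]
    backend = backend_cls(**config.get("options", {}))
    return backend.submit(task)


if __name__ == "__main__":
    task = Task(name="hello", command="echo hi", image="ubuntu:22.04")
    print(run(task, {"backend": "local"}))
    print(run(task, {"backend": "cluster", "options": {"queue": "short"}}))
```

The point of the sketch is that the task definition never changes; only the configuration does. Swapping `"local"` for `"cluster"` sends the same task through a different adapter, which is the kind of portability a backend is meant to provide.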
That being said, someone still has to write the backend for each platform we want to support. In the case of cloud platforms, that requires an expert-level understanding of things like how the resource allocation system operates, which can be quite challenging to acquire. Thankfully, we don't have to do it all ourselves -- Cromwell is an open-source project, and benefits from many contributions made by external developers. That includes experts who know specific systems inside out, sometimes because they helped build them! Cromwell's cloud backends are a great example of this, having been produced primarily by engineers from the respective cloud companies. We are deeply grateful for their advice, code and collaboration; we hope others will be inspired to contribute their own backends for other platforms and thus further extend the flexibility of Cromwell to the greater research community.