Scatter parallelism on GCP

Hello, newbie question here: how does Cromwell achieve parallelism across multiple nodes on Google Cloud Platform? Our workflow includes a group of tasks that we know would benefit from being parallelized. We'd like to divide this work among multiple nodes and were wondering how scatter works behind the scenes.

How do you specify the CPU and/or GPU for these nodes?
How does Cromwell figure out how many nodes should be started, and how does it know when to shut them down?
Can Cromwell be tied in with Kubernetes?
Does the gather step retrieve all the output files from the scattered nodes and copy them to the node running the Cromwell server?
What exactly tells Cromwell that a scatter node has finished its processing?

We're just getting started with Cromwell, so apologies for the beginner questions and thanks in advance!

  • Ed
