Possible to distribute work across multiple types of environment

If I understand correctly, it is possible to configure Cromwell to execute one task on one backend and another task on a different backend. But is it possible to configure Cromwell to execute the same tasks on two different backends, depending on resource availability?

Here's my use case: I want to align/sort/index/call variants on NGS data. I have two different HPC environments and a local system. I want to use these resources for the computing when they are available. When they are not, I want to execute tasks on Google Cloud. (Tentatively, I will store data in Google Storage.)

Is it possible to configure Cromwell to execute these tasks on multiple backends simultaneously, depending on which systems are available at the time?

Thanks!

Answers

  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @spiccolo,

    The functionality you describe doesn't exist today. However, that's definitely the type of use case we hope to fulfill with Cromwell in the future!

    There are a few pieces around this feature that need some more evaluation, like the cost of shuffling data between different cloud object stores and how to track the results of a multi-backend workflow in a common location. I would love to hear your thoughts on these topics:

    1. If a workflow ran partially on the GCP backend and partially on an AWS backend, which data store would you expect the outputs of your tasks to be in?
    2. Would it be reasonable to assume that a user knows best which backend would yield a less expensive run? If so, a task could carry a prioritized list of backends, and Cromwell could choose the secondary backend when the first one is tied up by quota and it's worth paying extra to get the task finished.

    Thanks!

  • Thanks for your response.

    1. That's a tricky one. I am not sure. Maybe the user would have to store all input/output data in a single location. So that would be static. What would change, though, is where the computations are done.

    2. That would probably be the only way to do it...let the user prioritize.
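
The prioritized-list idea discussed above could be sketched roughly as follows. This is only an illustration of the selection logic, not Cromwell code: the `Backend` class, the `has_capacity` check, `choose_backend`, and the backend names are all hypothetical, and Cromwell does not offer this feature today.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one place a task could run."""
    name: str
    # Callable that reports whether the backend can accept a task right now.
    has_capacity: Callable[[], bool]


def choose_backend(priority: Sequence[Backend]) -> Optional[Backend]:
    """Return the first backend in the user's priority order that has capacity."""
    for backend in priority:
        if backend.has_capacity():
            return backend
    return None  # Nothing is available; the caller decides whether to wait.


# User-supplied order: on-premises resources first, cloud as the fallback.
# The lambdas stand in for real availability checks (assumed, not real APIs).
backends = [
    Backend("hpc_a", lambda: False),
    Backend("hpc_b", lambda: False),
    Backend("local", lambda: False),
    Backend("google_cloud", lambda: True),
]

chosen = choose_backend(backends)
print(chosen.name if chosen else "no backend available")
```

With every on-premises check reporting no capacity, the task falls through to the last entry in the list. Where the inputs and outputs live is a separate question that this sketch does not address.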

  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @spiccolo,

    Having the data live in a common store is totally feasible, though just to be explicit, there are egress charges associated with doing that.

    I guess your original request was around Cromwell checking the marketplace across multiple clouds to see which of them offers the cheapest resource of a specified shape (CPU, disk, etc.), so that a user can get the best possible compute option at any given time.
