Resume pipeline

So there's this quote from the WDL release blog: "I mean smart, robust pipelines that can understand things like parallelism, dependencies of inputs and outputs between tasks, and resume intelligently if they get interrupted."

Has the "intelligent resumption" feature been implemented in WDL yet?

Answers

  • EADG Kiel Member ✭✭✭

    Hi @awacs,

    Take a look at call caching; this might be the feature you are looking for.

    Greetings, EADG

  • mxqian Member

    @Geraldine_VdAuwera I'm learning WDL, starting from the "helloworld" example. Fortunately, after hundreds of attempts, it now works on an LSF cluster. I still have some questions, though.
    1. Is there any way to change the result folder structure and keep it unique with a customer-provided name?
    2. Is it unchangeable that every run creates a new folder? How to make it like the Queue behavior? Just check the DONE status to determine resume/skip for each step?
    3. How to change the values for "cwd", "job_name", "out", "err" (-J ${job_name} -cwd ${cwd} -o ${out} -e ${err} for submit)?

  • kshakir Broadie, Dev ✭✭

    Hi @mxqian,

    As you're seeing, there are significant differences in the design and features of Queue and Cromwell. Some features of Cromwell that may be of use in your particular situation:

    • Cromwell may use a persistent MySQL-compatible database (including CloudSQL, etc.) to store job information
    • Cromwell may run in a server mode providing a REST endpoint, including the ability to retrieve the job information with HTTP/JSON

    To answer your specific questions:

    1. Is there any way to change the result folder structure and keep it unique with a customer-provided name?

    The folder structure is generated by Cromwell. However, the paths to workflow outputs, along with the workflow status, are retrievable from the Cromwell REST endpoint. Once a workflow has finished running, one may retrieve those paths and then copy or link the outputs to a custom location.
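
As a rough illustration of the "retrieve, then copy/link" step described above, here is a minimal Python sketch. It assumes the server exposes the outputs at `/api/workflows/v1/<workflow id>/outputs` and returns JSON with an `outputs` object mapping output names to values; check the path against the API of your Cromwell version. The function names `fetch_outputs` and `collect_outputs` are made up for this example, and only local files are handled.

```python
import json
import shutil
import urllib.request
from pathlib import Path


def fetch_outputs(base_url: str, workflow_id: str) -> dict:
    """Ask a Cromwell server for a workflow's outputs (assumed v1 REST path)."""
    url = f"{base_url}/api/workflows/v1/{workflow_id}/outputs"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["outputs"]


def collect_outputs(outputs: dict, destination: Path, link: bool = False) -> list:
    """Copy (or symlink) every existing local file named in `outputs`."""
    destination.mkdir(parents=True, exist_ok=True)
    collected = []
    for _name, value in sorted(outputs.items()):
        # An output may be a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if not isinstance(item, str):
                continue  # numbers, booleans, etc. are not files
            source = Path(item)
            if not source.is_file():
                continue  # plain strings that do not name a local file
            # Files with the same name overwrite each other in this sketch.
            target = destination / source.name
            if link:
                target.symlink_to(source.resolve())
            else:
                shutil.copy2(source, target)
            collected.append(target)
    return collected
```

Usage would look something like `collect_outputs(fetch_outputs("http://localhost:8000", workflow_id), Path("results/my_sample"))`, where the server address and the destination folder are placeholders for your own setup.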

    2. Is it unchangeable that every run creates a new folder? How to make it like the Queue behavior? Just check the DONE status to determine resume/skip for each step?

    Cromwell always generates a new folder for each run. When enabled, the call caching feature can copy or link files from a previous successful run, but for new runs Cromwell will not try to overwrite previous results. The REST endpoint will return the final output locations.
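
To make the idea concrete, here is a toy Python sketch of that behaviour: each run gets a fresh folder, and a call whose command and inputs match an earlier successful call reuses the earlier output instead of re-executing. This is only a conceptual illustration and not Cromwell's implementation; the names `cache_key` and `run_with_cache`, the in-memory `cache` dict, and the hashing scheme are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def cache_key(command: str, input_files: list) -> str:
    """Hash the command text plus the contents of every input file."""
    digest = hashlib.sha256(command.encode())
    for path in input_files:
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def run_with_cache(command, input_files, run_dir, cache, execute):
    """Run `execute` in a new `run_dir` unless an identical call is cached.

    `cache` maps cache keys to the output file of an earlier successful run.
    `execute(run_dir)` is supplied by the caller and returns the output path.
    Returns (output_path, was_cache_hit).
    """
    run_dir = Path(run_dir)
    # Every run gets its own folder; an existing folder raises an error
    # rather than being overwritten.
    run_dir.mkdir(parents=True)
    key = cache_key(command, input_files)
    if key in cache:
        # Cache hit: copy the earlier result into the new folder.
        output = run_dir / Path(cache[key]).name
        shutil.copy2(cache[key], output)
        return output, True
    # Cache miss: actually run the step and remember where its output is.
    output = Path(execute(run_dir))
    cache[key] = output
    return output, False
```

Copying could be swapped for linking where the filesystem allows it. Note the difference from a Queue-style DONE check: the earlier run's folder is left untouched and the reused result lands in the new run's folder.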

    3. How to change the values for "cwd", "job_name", "out", "err" (-J ${job_name} -cwd ${cwd} -o ${out} -e ${err} for submit)?

    The job_name is generated by Cromwell, as is each of those absolute file paths, and none of them are customizable. One can set the top-level execution directory with backend.providers.<your_backend>.config.root, but not the paths within that folder.

    If you'd like to discuss a specific use case or feature request with our team, please feel free to submit a detailed writeup of your suggestion on our GitHub issues page.

    Thanks!

  • mxqian Member

    Hi @kshakir, thank you very much for answering. Cromwell may be designed for cloud computing. BTW, you said "may" several times; is there any documentation for setting these features up? Maybe with a "helloworld" example first. Thanks again.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    Hi @mxqian, our documentation for Cromwell is still under active development. @KateVoss may be able to point you to materials that are relevant to your specific questions.