Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Feedback on initial version of bcbio WDL converted from CWL

Hi all;
I've been working to run bcbio (https://github.com/chapmanb/bcbio-nextgen) with Cromwell by converting bcbio-generated CWL (http://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html) to WDL. I have an in-progress CWL to WDL converter (https://github.com/chapmanb/bcbio-nextgen/blob/master/scripts/utils/cwltool2wdl.py) that generates reasonable-looking WDL for most of the CWL functionality we're using:

https://gist.github.com/chapmanb/af6aa1308d3438172c5fc842d53b0bb9#file-main-run_info-cwl-wdl

The goal is to get bcbio running in practice, and also to define an interoperable subset of WDL and CWL. The implementation currently supports workflows, tasks and nested workflows. It converts CWL records to WDL objects and also supports scatter.

I have most of the parts in place but have a couple of questions about the best way to integrate some last steps before I start trying actual runs with Cromwell. I'm completely open to changing bcbio outputs to make them more compatible with WDL, but I'm not sure about the best way to do it. The two areas I'm not sure about are:

Thanks for any suggestions and thoughts. I'm excited to open a conversation on this and provide some interoperability between WDL and CWL so we can make use of bcbio across multiple platforms.

Answers

  • jgentry (Member, Broadie, Dev)

    Hi @chapmanb - this is fantastic!

    Some answers, as best I can:

    • multi-variable scatter: No, not at the moment. There's also no nested scatter support at the moment. The latter is almost certainly going to change over the next month-ish, the former perhaps with it. Note https://github.com/broadinstitute/cromwell/issues/1564, which I think would give you what you're after though. This has started coming up internally and will need to get solved in a less hacky way. I don't think scattering over an Object would get you what you want; I think you'd need an Array[Object]. Scatter at the moment requires the collection to be an Array type, and I don't think an Object will coerce to an Array.

    • (err, after writing this I realized it's not what you're asking, I'll leave it just in case you care) output json: If you're in server mode, yes. There's an outputs endpoint that you can query with your workflow ID. If you're using run mode the answer is yes-ish. You can specify a file to output the full workflow metadata, which contains the outputs but also a lot of other stuff. See https://github.com/broadinstitute/cromwell#run for an example. I've created a ticket to provide a scheme for outputting just the output JSON: https://github.com/broadinstitute/cromwell/issues/1578

    • output json, take 2: Now that I re-read, I'm not completely sure what you're after here. You have a CWL output JSON; what would you ideally want after importing it?

    Does this help at all?

    J
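
As an illustration of the Array[Object] workaround discussed above, here is a minimal Python sketch that zips several parallel input lists into a single list of records, so that a scatter can iterate over one collection instead of several variables. This is only a sketch under stated assumptions: the helper name `zip_to_records`, the input key `main.records`, the `align_bam` field and the file names are hypothetical, and it is not how bcbio or Cromwell implement anything.

```python
import json
from typing import Any, Dict, List


def zip_to_records(**columns: List[Any]) -> List[Dict[str, Any]]:
    """Turn parallel lists into a list of records, one record per scatter shard."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all scatter inputs must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Hypothetical inputs: one region and one BAM file per shard.
records = zip_to_records(
    region=["chr1:1-1000", "chr2:1-1000"],
    align_bam=["sample_chr1.bam", "sample_chr2.bam"],
)

# Hypothetical workflow input name; a workflow would declare it as Array[Object].
inputs_json = json.dumps({"main.records": records}, indent=2)
print(inputs_json)
```

Each element of the resulting array carries every variable for one shard, which is why a single-variable scatter over it can stand in for a multi-variable one.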

  • chapmanb (Boston, MA; Member)

    Thanks so much, this is a huge help:

    • For multi-variable scatter, #1564 is exactly what I'd be looking for. Sorry, I did mean Array[Object] which sounds like it's the best workaround now. Thanks for the clarification.

    • For json, I'd like the script that runs (a bcbio command here) to be able to generate a file with the value of the output (say the path to a File). Ideally it would be a single file with all the outputs for a command, which is nice functionality that CWL provides. So in the example I posted above, the output file specifies the value for two outputs of the task: region, a string, and vrn_file_region, the path to a file (and its secondary index files) generated by the script. This way the workflow doesn't need to pre-specify the paths to output files and the software itself can generate them. Does that help clarify?

    Thanks again.
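
To make the request above concrete, here is a minimal Python sketch of a command writing all of its outputs to one JSON file, which the workflow engine can then read instead of pre-specifying output paths. It follows the CWL convention of a `cwl.output.json` file in the output directory, with File values written as objects carrying `class`, `path` and `secondaryFiles`. The function names, the region value and the file paths are hypothetical, and nothing here reflects what Cromwell supported at the time.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def write_outputs(out_dir: str, region: str, vcf_path: str,
                  index_paths: List[str]) -> str:
    """Write the task outputs to a single cwl.output.json file and return its path."""
    outputs = {
        "region": region,
        "vrn_file_region": {
            "class": "File",
            "path": vcf_path,
            # Index files travel with the main file as secondary files.
            "secondaryFiles": [{"class": "File", "path": p} for p in index_paths],
        },
    }
    out_file = os.path.join(out_dir, "cwl.output.json")
    with open(out_file, "w") as handle:
        json.dump(outputs, handle, indent=2)
    return out_file


def read_outputs(out_file: str) -> Dict[str, Any]:
    """Load the output values back, as the workflow engine would after the command ends."""
    with open(out_file) as handle:
        return json.load(handle)


# Hypothetical usage: the command decides the output paths itself.
with tempfile.TemporaryDirectory() as work_dir:
    json_path = write_outputs(
        work_dir, "chr1:1-1000", "sample-chr1.vcf.gz", ["sample-chr1.vcf.gz.tbi"]
    )
    loaded = read_outputs(json_path)
    print(loaded["vrn_file_region"]["path"])
```

The point of the design is that the tool, not the workflow description, determines where its output files end up; the workflow only needs to know the output names.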

  • jgentry (Member, Broadie, Dev)

    @chapmanb Sorry for the long delay; as I mentioned (elsewhere?), I was away for a few weeks. #1564 is underway now, so it should be available soon.

    I see what you mean now. I don't see how that's doable at the moment, and when I asked around no one else had heard a similar request, so the likelihood of it existing is small. As you guessed, you might be able to get what you want (theoretically*) via Object and read_json, if Object would suit your needs. I say theoretically because I don't believe it's wired up at the moment; again, it's a dustier corner.

    If this is behavior that you want, I'd suggest opening an issue on our GitHub describing what you're after, and then our product owner @kcibul can take it from there.

  • chapmanb (Boston, MA; Member)

    Thanks so much, hope that ASHG was productive and fun. Thanks also for working on #1564, looking forward to trying it out. I opened an issue for the CWL json style output specification:

    https://github.com/broadinstitute/cromwell/issues/1628

    Thanks for helping iron out the last couple of issues, looking forward to doing practical testing with bcbio on this.

  • jgentry (Member, Broadie, Dev)

    Perfect, thanks @chapmanb

    Unrelated, I was having a conversation with someone from another group regarding CWL->WDL conversion and pointed him to your work, so hopefully that leads to more eyeballs & energy for you.
