WDL + Cromwell + AWS Batch

golharam

Hi all, I'm trying to figure out the best way to write WDL pipelines that run on AWS Batch. As I understand it, each task call in WDL runs as a separate AWS Batch job, and each job can land on any instance, independent of the jobs before it. That means every task (Batch job) must download its input data from S3 and upload its output data to S3 so that the next task can run independently.
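A minimal sketch of the per-job lifecycle described above, assuming each job gets a fresh scratch directory: download inputs, run the command on local paths, upload outputs, then discard the scratch space. To keep it runnable without AWS, `DirObjectStore` is a hypothetical local-directory stand-in for S3, and `run_job` and `py` are illustrative helpers, not Cromwell or AWS APIs.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from urllib.parse import urlparse


class DirObjectStore:
    """Hypothetical stand-in for S3: s3://bucket/key is stored at root/bucket/key."""

    def __init__(self, root: Path):
        self.root = Path(root)

    def path_for(self, uri: str) -> Path:
        parsed = urlparse(uri)
        if parsed.scheme != "s3":
            raise ValueError(f"not an s3 URI: {uri}")
        return self.root / parsed.netloc / parsed.path.lstrip("/")

    def download(self, uri: str, dest: Path) -> None:
        shutil.copyfile(self.path_for(uri), dest)

    def upload(self, src: Path, uri: str) -> None:
        target = self.path_for(uri)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)


def run_job(store, inputs: dict, command: list, outputs: dict) -> None:
    """Run one task as an isolated job with its own scratch directory.

    inputs:  local file name -> s3 URI, downloaded before the command runs
    outputs: local file name -> s3 URI, uploaded after the command succeeds
    The scratch directory is deleted afterwards, so the next job can only
    see what was uploaded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        for name, uri in inputs.items():
            store.download(uri, work / name)
        subprocess.run(command, cwd=work, check=True)
        for name, uri in outputs.items():
            store.upload(work / name, uri)


def py(code: str) -> list:
    """Build a command that runs a Python snippet (avoids depending on a shell)."""
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = DirObjectStore(Path(root))
        seed = Path(root) / "seed.txt"
        seed.write_text("hello\n")
        store.upload(seed, "s3://my-bucket/in/reads.txt")

        # Task 1: uppercase the input.
        run_job(store,
                {"reads.txt": "s3://my-bucket/in/reads.txt"},
                py("from pathlib import Path; "
                   "Path('upper.txt').write_text(Path('reads.txt').read_text().upper())"),
                {"upper.txt": "s3://my-bucket/step1/upper.txt"})

        # Task 2: may run on another instance, so it re-downloads task 1's output.
        run_job(store,
                {"upper.txt": "s3://my-bucket/step1/upper.txt"},
                py("from pathlib import Path; "
                   "Path('n.txt').write_text(str(len(Path('upper.txt').read_text())))"),
                {"n.txt": "s3://my-bucket/step2/n.txt"})

        print(store.path_for("s3://my-bucket/step2/n.txt").read_text())  # 6
```

Because the scratch directory is thrown away after each `run_job`, task 2 can only see task 1's result by re-downloading it from the store; nothing written to local disk by one job is visible to the next.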

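How do Cromwell and WDL handle this? Will Cromwell move data to and from S3 for me? I initially added s3cp tasks to my WDL pipeline to do the copying, but I hate that I have to do this. It also isn't guaranteed to work: the s3cp task is a separate Batch job from the task that does the actual work, so the two may run on different instances, and files the copy task writes to local disk may not be there when the work task starts.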

I've ended up putting logic in my Docker container to download inputs from S3 and upload outputs to S3, and using WDL only to describe the pipeline.

Does this sound like a good approach? To me it doesn't, because my Docker container is now tied to S3 URLs. What is the best approach here?
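If staging has to live in the image, one way to loosen the tie to S3 is to have the entrypoint resolve inputs through a small registry keyed by URI scheme, so the tool itself only ever sees local paths and S3 becomes one pluggable backend among others. A minimal sketch; `fetch`, `register_fetcher`, and the registry are hypothetical names, not part of Cromwell or any AWS SDK.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse

# A fetcher copies the object named by `uri` to the local path `dest`.
Fetcher = Callable[[str, Path], None]


def _copy_local(uri: str, dest: Path) -> None:
    """Handle plain paths and file:// URIs."""
    src = urlparse(uri).path if uri.startswith("file://") else uri
    shutil.copyfile(src, dest)


# Scheme -> fetcher. "" covers plain paths such as /data/reads.fq.
_FETCHERS: Dict[str, Fetcher] = {"": _copy_local, "file": _copy_local}


def register_fetcher(scheme: str, fetcher: Fetcher) -> None:
    """Plug in a storage backend (e.g. one for s3://) without touching the tool."""
    _FETCHERS[scheme] = fetcher


def fetch(uri: str, dest: Path) -> Path:
    """Localize `uri` to `dest` using whichever backend matches its scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in _FETCHERS:
        raise ValueError(f"no fetcher registered for scheme {scheme!r}: {uri}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    _FETCHERS[scheme](uri, dest)
    return dest
```

On AWS, the `s3` backend might wrap boto3's `download_file(Bucket, Key, Filename)`; when testing locally, plain paths work with nothing extra registered, so the image no longer assumes its inputs are S3 URLs.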
