
How to return an Array given as input

Hi,

Maybe there is a much better way, but I manage the order of the calls in my WDL script by always using the output of task A as the input of task B. In most cases this is necessary, since a workflow needs a specific order. But in other cases I just want to limit the number of tasks that run in parallel. Coverage, for example, could run in parallel with BQSR, but I'd like it done step by step, because we have enough data and samples that our server is at its limit.

The problem is that I don't really get how to return an input array as an output again. Once all the "preprocessing" Picard steps are done and I have indexed, sorted BAM files, I want to calculate the coverage for all samples in one task.
For the steps before that I use scatter-gather, and the output of each task is a String. These are automatically gathered into an array, which can be used in the next scatter-gather step.
After the calling step further down the line I use CombineGVCFs. The input is a complete (unscattered) array and I return one string / one file.
For the coverage calculation I want to pass one complete array to an unscattered task and return it unchanged. This way I would keep the order of my tasks. But how do I do this?

If I write: Array[String] samples = ${input_array} I get
Unrecognized token on line 511, column 27:
Array[String] samples = ${bamFiles}

If I put it in "" I get:
ERROR: samples is declared as a String but the expression evaluates to a Array[String]:
Array[String] samples = "${bamFile}"

Or is there a more elegant way to manage the order of my calls? That would be nice. I don't really like the way I do it, but since WDL starts each task as soon as all of its inputs are available, I don't see another way to specify an order.

Thanks a lot for your help!

Best regards,
Daniel

Best Answer

Answers

  • dbecker (Munich), Member

    Thanks a lot!
    Most languages need a symbol like $ either for all data types or for none at all, so that confused me.

    Such a dependsOn would be great. I generally like the idea that everything just runs as soon as possible, but on older servers and with tools like DepthOfCoverage this may lead to problems.
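
For readers landing on this thread, here is a minimal sketch of the two points discussed above, assuming WDL draft-2 syntax as run by Cromwell. It is only an illustration, not the original answer: the task names Coverage and NextStep, the workflow name OrderedExample, and the echo commands are hypothetical placeholders for the real tools. The idea is that ${...} is only needed for interpolation inside the command section (or inside a string), while declarations and outputs take a plain expression, so the input array can be returned by naming it directly.

```wdl
# Hypothetical task: the echo stands in for the real coverage command.
task Coverage {
  Array[String] bamFiles

  command {
    # ${...} interpolates into the command line; sep joins the array elements
    echo ${sep=' ' bamFiles}
  }

  output {
    # Plain expression, no ${...} and no quotes: the array is passed through unchanged
    Array[String] samples = bamFiles
  }
}

# Hypothetical downstream task that should only start after Coverage has finished.
task NextStep {
  Array[String] bamFiles

  command {
    echo ${sep=' ' bamFiles}
  }

  output {
    Array[String] samples = bamFiles
  }
}

workflow OrderedExample {
  Array[String] inputBams

  call Coverage { input: bamFiles = inputBams }

  # Consuming Coverage.samples creates the dependency that enforces the order
  call NextStep { input: bamFiles = Coverage.samples }
}
```

Because NextStep takes Coverage.samples as its input, the engine cannot start it before Coverage has completed, which gives the step-by-step order asked for in the question without changing the data.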
