
Two level scatter gather

Suppose I want to scatter over the output of a previous scatter-gather step. Pseudocode below.

workflow test {
  Array[File] things = read_tsv(inputfile)
  scatter (thing in things) {
    call task1 { input: stuff = thing }
    scatter (object in task1.array) {
      call task2 { input: stuff = object }
    }
  }
}
For example, I want to run BWA on 10 samples, but I also want to break each sample's FASTQ files into 10 pieces so that BWA can run on the pieces in parallel, which would be faster.
Right now WDL complains that it "could not get a value for task1". What can I do to make this work?

Answers

  • ChrisL (Cambridge, MA; Member, Broadie, Dev)

    @KateN @awacs I just wanted to post an update to this, now that things have settled out a bit. Scatters within scatters are unlikely to be supported soon, because of the way Cromwell indexes the shards of a scatter: we can reference scatter shard 10, but we can't support something like shard 10:10 for the inner scatter.

    The good news is that, depending on your scenario, there will now be a way to express what you want.

    • If you wanted to use scatters as a way of iterating over two arrays (e.g. for x in xs: ... for y in ys: ...), you can use the cross(xs, ys) function to get an Array[Pair[X,Y]] and scatter over that instead.

    • If that fails, you can call subworkflows from inside a scatter, and subworkflows can contain their own scatters. So you can do something like:

    • main.wdl

    import "sub.wdl" as sub
    workflow test {
      Array[File] things = read_tsv(inputfile)
      # Outer scatter: one subworkflow call per thing
      scatter (thing in things) {
        call sub.sub_wf { input: thing = thing }
      }
    }
    
    • sub.wdl
    task task1 { ... }
    task task2 { ... }
    workflow sub_wf {
      File thing
      call task1 { input: stuff = thing }
      # Inner scatter lives in the subworkflow, so it is not nested directly
      scatter (object in task1.array) {
        call task2 { input: stuff = object }
      }
    }
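
    For the first option, here is a minimal sketch of scattering over cross(), assuming both arrays are known before the scatter starts. The task name align, the workflow name cross_example, and the inputs samples and chunks are hypothetical and only there for illustration; the echo command stands in for a real tool. Note that this only helps when the inner array does not depend on the output of a task called in the outer loop. In the original question task1.array is produced per thing, so the subworkflow approach is the one that applies there.

    ```wdl
    # Hypothetical task: stands in for whatever runs on one (sample, chunk) pair.
    task align {
      String sample
      String chunk
      command {
        echo "${sample} ${chunk}"
      }
      output {
        String result = read_string(stdout())
      }
    }

    workflow cross_example {
      Array[String] samples
      Array[String] chunks

      # cross() builds every (sample, chunk) combination up front,
      # so a single, flat scatter replaces the two nested loops.
      Array[Pair[String, String]] combos = cross(samples, chunks)

      scatter (combo in combos) {
        call align { input: sample = combo.left, chunk = combo.right }
      }
    }
    ```

    Because the scatter is flat, align.result is gathered as a single Array[String] with one entry per combination, rather than an array of arrays.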
    

    Related GitHub issue: #1995, filed by Geraldine_VdAuwera (state: open).