Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Is there a way to flatten arrays?

Let's say that I have an Array[Array[File]]. Is there a way to flatten this so that I have just an Array[File]?


Best Answer


  • mmah ✭✭ Member, Broadie
    edited May 2017

    The best solution I found uses a Python script called from its own WDL task. It is not particularly elegant.

    The WDL task operates on an Array[Array[String]], but for WDL reasons I do not completely understand, it is fine to pass it an Array[Array[File]]. The task outputs an Array[String], which you can similarly interpret as an Array[File]. Using String means that the task does not have to localize the files.

    task collect_filenames {
        Array[Array[String]] filename_arrays
        File python_flatten
        command {
            echo "${sep='\n' filename_arrays}" > raw_array
            python ${python_flatten} < raw_array > file_of_filenames
        }
        output {
            Array[String] filenames = read_lines("./file_of_filenames")
        }
    }

    You call this task something like this:

    Array[Array[File]] example
    call collect_filenames { input:
        filename_arrays = example
    }

    My flatten.py, which assumes filenames do not contain [, ], or ,:

    # This flattens a WDL Array[Array[String]] to an Array[String].
    # WDL does not have a built-in way to do this easily, so we do this in Python.
    # The elements of each array are assumed to be non-empty filenames.
    # Filenames should NOT include "[", "]", or ", "
    # Sample input:
    # [/path/to/A1, /path/to/A2]
    # [/path/to/B1, /path/to/B2]
    import sys
    import re
    for line in sys.stdin:
        s = line
        # remove the characters [ and ], plus the newline following them, if any
        s = re.sub(r"[\[\]]\n?", "", s)
        # elements in the array are separated by ", "
        s = s.replace(", ", "\n")
        print(s)
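The same parsing can also be written as a function, which makes it easy to check outside of a workflow. This is only a sketch: the name `flatten_wdl_lines` is hypothetical, and it assumes the line format of the sample input above (one bracketed array per line, elements separated by ", "). Unlike the script, it skips blank lines and empty arrays instead of printing empty lines.

```python
import re


def flatten_wdl_lines(lines):
    """Flatten lines such as "[a, b]" into a single list of elements.

    Hypothetical helper; assumes elements never contain "[", "]" or ", ".
    """
    flattened = []
    for line in lines:
        # Drop the brackets, then any surrounding whitespace or newline.
        body = re.sub(r"[\[\]]", "", line).strip()
        if not body:
            continue  # skip blank lines and empty arrays
        flattened.extend(body.split(", "))
    return flattened


if __name__ == "__main__":
    sample = ["[/path/to/A1, /path/to/A2]\n", "[/path/to/B1, /path/to/B2]\n"]
    # Prints the four paths as one flat list.
    print(flatten_wdl_lines(sample))
```

Returning a list rather than printing keeps the parsing step separate from the I/O, so the WDL task could still call a thin wrapper that reads stdin and prints one element per line.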
  • nessus42 Member
  • Ruchi Member, Broadie, Moderator, Dev admin

    Agreed with Chris that flatten should really be a function implemented by the workflow engine.
    Another way to flatten an Array[Array[_]] (assuming all the inner arrays have the same length) could be something like this:

    workflow flatten {
        # Example input; rows after the first are illustrative.
        Array[Array[String]] myArray = [["a","b","c"],
                                        ["d","e","f"]]
        Array[Int] outerArrayLen = range(length(myArray))
        Array[Int] innerArrayLen = range(length(myArray[0]))
        Array[Pair[Int,Int]] coordinates = cross(outerArrayLen, innerArrayLen)
        scatter(i in coordinates) {
            String element = myArray[i.left][i.right]
        }
        output {
            Array[String] flattened = element
        }
    }
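The coordinate idea in the workflow above can be tried out in plain Python. The sketch below is an analogy, not WDL: `flatten_by_coordinates` is a hypothetical name, and `itertools.product` stands in for `cross()` on the assumption that both enumerate pairs in row-major order. It also shows why equal inner lengths matter: the column range is taken from the first row only, so a shorter later row raises an error and a longer one silently loses elements.

```python
from itertools import product


def flatten_by_coordinates(nested):
    """Flatten a rectangular list of lists by indexing every (row, column) pair."""
    if not nested:
        return []
    outer_indices = range(len(nested))
    # As in the workflow, the column range comes from the first row only.
    inner_indices = range(len(nested[0]))
    # product() yields (0, 0), (0, 1), ..., (1, 0), ... in row-major order.
    return [nested[i][j] for i, j in product(outer_indices, inner_indices)]


if __name__ == "__main__":
    print(flatten_by_coordinates([["a", "b", "c"], ["d", "e", "f"]]))
    # A shorter second row fails, which is why equal lengths are assumed.
    try:
        flatten_by_coordinates([["a", "b"], ["c"]])
    except IndexError as err:
        print("ragged input:", err)
```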
  • Flatten() should be included. I like all the workarounds here, but I have an even shorter one:

    Disadvantage: strings may not contain [, ], a comma, or ", nor whitespace, because the shell loop splits its input on whitespace. For filenames this is usually fine.

    task flattenStringArray {
        Array[Array[String]] arrayList
        command {
            for line in $(echo ${sep=', ' arrayList}) ; \
            do echo $line | tr -d '"[],' ; done
        }
        output {
            Array[String] flattenedArray = read_lines(stdout())
        }
    }
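To see what the shell loop in this task does to its input, here is a rough Python model of it. It is a sketch under assumptions: `shell_style_flatten` is a hypothetical name, and the input string is only a guess at how the engine renders the nested array (quoted elements inside brackets). `str.split()` approximates the shell's word splitting and `str.translate` approximates `tr -d '"[],'`.

```python
def shell_style_flatten(rendered):
    """Mimic `for line in $(echo ...)` followed by `tr -d '"[],'`."""
    # Translation table that deletes the same characters as the tr call.
    strip_chars = str.maketrans("", "", '"[],')
    # split() with no arguments splits on runs of whitespace,
    # much like the shell's default word splitting.
    words = rendered.split()
    return [word.translate(strip_chars) for word in words]


if __name__ == "__main__":
    # Assumed rendering of an Array[Array[String]]; the real format may differ.
    print(shell_style_flatten('["a.txt", "b.txt"], ["c.txt", "d.txt"]'))
    # A name containing a space is split into two entries.
    print(shell_style_flatten('["my file.txt"]'))
```

The second call illustrates the main limitation of the approach: any element containing whitespace comes out as several separate entries.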