Is there a way to flatten arrays?

Let's say that I have an Array[Array[File]]. Is there a way to flatten this so that I have just an Array[File]?

Thanks!

Answers

  • mmah Member, Broadie
    edited May 2017

    The best solution I found uses a Python script called from its own WDL task. It is not particularly elegant.

    The WDL task operates on an Array[Array[String]], but for WDL reasons I do not completely understand, it is fine to pass it an Array[Array[File]]. The task outputs an Array[String], which you can similarly interpret as an Array[File]. Using String means that the task does not have to localize files.

    task collect_filenames{
        Array[Array[String]] filename_arrays
        File python_flatten
    
        command{
            echo "${sep='\n' filename_arrays}" > raw_array
            python ${python_flatten} < raw_array > file_of_filenames
        }
        output{
            Array[String] filenames = read_lines("./file_of_filenames")
        }
    }
    

    You call this task something like this:

    Array[Array[File]] example
    
    call collect_filenames{ input:
        filename_arrays = example
    }
    

    My flatten.py, which assumes filenames do not contain "[", "]", or ", ":

    # This flattens a WDL Array[Array[String]] to a Array[String]
    # WDL does not have a built-in way to do this easily, so we do this in python
    # The elements of each array are assumed to be non-empty filenames.
    # Filenames should NOT include "[", "]", or ", "
    
    #sample input:
    # [/path/to/A1, /path/to/A2]
    # [/path/to/B1, /path/to/B2]
    
    import sys
    import re
    
    for line in sys.stdin:
        s = line
        # remove the characters [ and ], plus the newline that follows, if any
        s = re.sub(r"([\[\]])(\n)?", "", s)
        # elements in array are separated by ", "
        s = s.replace(", ", "\n")
        print (s)
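
For anyone who wants to check the parsing outside of a workflow, here is a minimal sketch of the same idea as a plain function. The name `flatten_lines` and the sample paths are only illustrative, and it assumes each input line looks like `[/path/to/A1, /path/to/A2]`, as in the sample input above.

```python
import re


def flatten_lines(lines):
    """Flatten lines such as "[/a/1, /a/2]" into a single list of names."""
    flat = []
    for line in lines:
        # Drop the surrounding brackets and any trailing newline.
        inner = re.sub(r"[\[\]\n]", "", line)
        if inner:
            # Assumption: elements are separated by ", " and never contain it.
            flat.extend(inner.split(", "))
    return flat


sample = ["[/path/to/A1, /path/to/A2]\n", "[/path/to/B1, /path/to/B2]\n"]
print(flatten_lines(sample))
```

The same restriction applies as in the script: a filename containing "[", "]", or ", " would be split or mangled.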
    
  • Ruchi Member, Broadie, Moderator, Dev

    Agreed with Chris that flatten should really be a function implemented by the workflow engine.
    Another way to flatten an Array[Array[_]] (assuming all the inner arrays have the same length) could be something like this:

    workflow flatten {

        Array[Array[String]] myArray = [["a","b","c"],
                                        ["d","e","f"],
                                        ["g","h","i"],
                                        ["j","k","l"]]

        Array[Int] outerArrayLen = range(length(myArray))
        Array[Int] innerArrayLen = range(length(myArray[0]))

        Array[Pair[Int,Int]] coordinates = cross(outerArrayLen, innerArrayLen)

        scatter(i in coordinates) {
            String element = myArray[i.left][i.right]
        }

        output {
            Array[String] flattened = element
        }
    }
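
To see why the equal-length assumption matters, the coordinate idea can be sketched in plain Python. This is only an analogy, not WDL: `flatten_by_coordinates` is a made-up name, and `itertools.product` stands in for `cross`.

```python
from itertools import product


def flatten_by_coordinates(nested):
    """Flatten a rectangular list of lists by visiting every (row, column) pair."""
    if not nested:
        return []
    rows = range(len(nested))
    cols = range(len(nested[0]))  # inner length is taken from the first row only
    return [nested[i][j] for i, j in product(rows, cols)]


grid = [["a", "b", "c"], ["d", "e", "f"]]
print(flatten_by_coordinates(grid))
```

In this sketch, a later row that is shorter than the first raises an IndexError, and a later row that is longer silently loses its extra elements, because the column range comes from the first row alone.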
    
  • Flatten() should be included. I like all the workarounds here, but I have an even shorter one:

    Disadvantage: strings may not contain [, ], a comma, or ". For filenames this is usually acceptable.

    task flattenStringArray {
        Array[Array[String]] arrayList
        command {
        for line in $(echo ${sep=', ' arrayList}) ; \
        do echo $line | tr -d '"[],' ; done
        }
        output {
            Array[String] flattenedArray = read_lines(stdout())
        }
    }
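
One more caveat with the shell loop: an unquoted `$(...)` is split on whitespace, so a name containing a space would come out as two entries. A rough Python simulation of the loop shows this. The rendered input string is an assumption about how the engine prints the nested array, and `simulate_shell_flatten` is a hypothetical helper, not part of any tool.

```python
def simulate_shell_flatten(rendered):
    """Mimic: for line in $(echo ...); do echo $line | tr -d '"[],'; done"""
    # tr -d deletes every occurrence of these four characters.
    strip_chars = str.maketrans("", "", '"[],')
    # The unquoted command substitution is split on whitespace,
    # so each whitespace-separated word becomes one output line.
    return [word.translate(strip_chars) for word in rendered.split()]


# Assumed rendering of the nested array; the real output may differ.
rendered = '["a.txt", "b.txt"], ["my file.txt"]'
print(simulate_shell_flatten(rendered))
```

Under that assumption, "my file.txt" ends up as the two entries "my" and "file.txt", so whitespace belongs on the list of characters to avoid.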
    