
Is there a way to flatten arrays?

Let's say that I have an Array[Array[File]]. Is there a way to flatten this so that I have just an Array[File]?

Thanks!

Answers

  • mmah Member, Broadie
    edited May 2017

    The best solution I found uses a Python script called from its own WDL task. It is not particularly elegant.

    The WDL task operates on an Array[Array[String]], but for WDL reasons I do not completely understand, it is fine to pass it an Array[Array[File]]. The task outputs an Array[String], which you can similarly treat as an Array[File]. Using String means that the task does not have to localize the files.

    task collect_filenames{
        Array[Array[String]] filename_arrays
        File python_flatten
    
        command{
            echo "${sep='\n' filename_arrays}" > raw_array
            python ${python_flatten} < raw_array > file_of_filenames
        }
        output{
            Array[String] filenames = read_lines("./file_of_filenames")
        }
    }
    

    You call this task something like this:

    Array[Array[File]] example
    
    call collect_filenames{ input:
        filename_arrays = example
    }
    

    My flatten.py, which assumes filenames do not contain "[", "]", or ", ":

    # This flattens a WDL Array[Array[String]] to an Array[String]
    # WDL does not have a built-in way to do this easily, so we do this in Python
    # The elements of each array are assumed to be non-empty filenames.
    # Filenames should NOT include "[", "]", or ", "
    
    #sample input:
    # [/path/to/A1, /path/to/A2]
    # [/path/to/B1, /path/to/B2]
    
    import sys
    import re
    
    for line in sys.stdin:
        s = line
        # remove the characters [ and ], plus the newline following them, if any
        # (raw string avoids invalid-escape warnings in Python 3)
        s = re.sub(r"([\[\]])(\n)?", "", s)
        # elements in array are separated by ", "
        s = s.replace(", ", "\n")
        print(s)
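
To check the parsing logic outside of Cromwell, the same transformation can be sketched as a plain function. This is only an illustration: `flatten_lines` is a hypothetical name, and the input format (one bracketed, ", "-separated array per line) is assumed from the sample input in the script's comments. Unlike the script above, this sketch skips empty inner arrays instead of emitting a blank line.

```python
import re


def flatten_lines(lines):
    """Flatten lines such as "[/a/1, /a/2]" into a single list of names.

    Assumes, like the script above, that names never contain "[", "]" or ", ".
    """
    flat = []
    for line in lines:
        # Drop the brackets, then the trailing newline and surrounding whitespace.
        body = re.sub(r"[\[\]]", "", line).strip()
        if not body:
            # An empty inner array ("[]") contributes no elements.
            continue
        flat.extend(body.split(", "))
    return flat


if __name__ == "__main__":
    sample = ["[/path/to/A1, /path/to/A2]\n", "[/path/to/B1, /path/to/B2]\n"]
    for name in flatten_lines(sample):
        print(name)
```

Running it on the sample input prints the four paths, one per line, which is the shape read_lines() expects.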
    
  • Ruchi Member, Broadie, Dev

    I agree with Chris that flatten really should be a function implemented by the workflow engine.
    Another way to flatten an Array[Array[_]] (assuming all the inner arrays have the same length) could be something like this:

    workflow flatten {

        Array[Array[String]] myArray = [["a","b","c"],
                                        ["d","e","f"],
                                        ["g","h","i"],
                                        ["j","k","l"]]

        Array[Int] outerArrayLen = range(length(myArray))
        Array[Int] innerArrayLen = range(length(myArray[0]))

        Array[Pair[Int,Int]] coordinates = cross(outerArrayLen, innerArrayLen)

        scatter(i in coordinates) {
            String element = myArray[i.left][i.right]
        }

        output {
            Array[String] flattened = element
        }
    }
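
The index-pair idea in this workflow can be mirrored in Python to see why the equal-length assumption matters. This is a rough sketch, not a translation of Cromwell's behavior: `flatten_by_coordinates` is a hypothetical name, and `itertools.product` stands in for WDL's cross(), assuming cross() also yields pairs with the outer index varying slowest.

```python
from itertools import product


def flatten_by_coordinates(nested):
    """Flatten a rectangular list of lists by visiting (outer, inner) index pairs."""
    if not nested:
        return []
    outer = range(len(nested))
    # Like the workflow, take the inner length from the first row only.
    inner = range(len(nested[0]))
    # Every (i, j) pair, with i varying slowest, so row order is preserved.
    return [nested[i][j] for i, j in product(outer, inner)]


if __name__ == "__main__":
    grid = [["a", "b", "c"], ["d", "e", "f"]]
    print(flatten_by_coordinates(grid))
```

In this Python version, a ragged input either raises IndexError (a later row is shorter than the first) or silently drops elements (a later row is longer), which is the practical cost of sizing the inner range from the first row.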
    
  • Flatten() should be included. I like all the workarounds here, but I have an even shorter one:

    Disadvantage: strings may not contain "[", "]", ",", or a double quote. For filenames this is usually acceptable.

    task flattenStringArray {
        Array[Array[String]] arrayList
        command {
        for line in $(echo ${sep=', ' arrayList}) ; \
        do echo $line | tr -d '"[],' ; done
        }
        output {
            Array[String] flattenedArray = read_lines(stdout())
        }
    }
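
To make the character-stripping step concrete, here is a small Python sketch of what the shell loop does to each token. It rests on assumptions: `flatten_serialized` is a hypothetical name, the serialized form `["a", "b"], ["c", "d"]` is assumed rather than taken from Cromwell documentation, and shell glob expansion is ignored.

```python
def flatten_serialized(serialized):
    """Mimic: for tok in $(echo ...); do echo $tok | tr -d '"[],'; done"""
    # Characters deleted by tr -d in the task: double quote, brackets, comma.
    strip_chars = str.maketrans("", "", '"[],')
    # split() with no argument splits on runs of whitespace,
    # much like unquoted word splitting in the shell.
    return [token.translate(strip_chars) for token in serialized.split()]


if __name__ == "__main__":
    for item in flatten_serialized('["a", "b"], ["c", "d"]'):
        print(item)
```

The sketch also shows a limitation beyond the listed characters: because tokens are split on whitespace, a value such as "my file.txt" comes out as two separate entries.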
    