read_tsv skips empty lines

I created a test WDL that uses the function read_tsv:

task readTsv {

    command {
        echo -e "gs://A\tgs://B" > files.dat
        echo -e "" >> files.dat
        echo -e "gs://C" >> files.dat
    }

    output {
        Array[Array[String]] fileList = read_tsv("files.dat")
    }

    runtime {
        docker: "skashin/genome-strip:dev"
    }
}

task processFiles {
    Array[File] files

    command {
        echo ${sep=" " files} | sed 's/ /\n/g' > files.list
        scripts/process_files.sh files.list
    }

    output {
        File out = "files.list"
    }

    runtime {
        docker: "skashin/genome-strip:dev"
    }
}

workflow read_tsv_wf {

    call readTsv

    scatter (files in readTsv.fileList) {
        call processFiles {
            input:
                files = files
        }
    }

    output {
        Array[File] outputList = processFiles.out
    }
}

The task readTsv returns [["gs://A", "gs://B"], ["gs://C"]], while I would expect it to return [["gs://A", "gs://B"], [], ["gs://C"]].

My problem is that I need to be able to scatter over the Array of Array[String], including the empty one.

Is it really the expected behavior for read_tsv(), and is there a way to do it the way I'd like it to work?

Thanks

Answers

  • kshakir (Broadie, Dev)

    The WDL draft-2 spec is closed, and it doesn't explicitly state whether read_tsv should strip blank lines. If you'd like the next version of the WDL specification to make the behavior explicit, so that read_tsv preserves blank lines in the middle of a file, I encourage you to submit a proposal. In my opinion, the proposal should also address how read_tsv should treat a trailing newline or several trailing newlines.

    You may also be able to work around the issue with a future version of Cromwell, using either read_json or read_object. These draft-2 features were recently added to Cromwell and will be available when version 31 is released.
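
    As a minimal sketch of the read_json route: the task below writes the rows as JSON, so the empty row is an explicit [] rather than a blank line. The task name readJson and the file name files.json are hypothetical, and the sketch assumes that read_json coerces nested JSON arrays (including the empty one) to Array[Array[String]]. It has not been checked against a Cromwell release.

```wdl
# Hypothetical variant of readTsv: emit JSON so the empty row is explicit.
task readJson {

    command {
        echo '[["gs://A", "gs://B"], [], ["gs://C"]]' > files.json
    }

    output {
        # Assumption: read_json coerces the nested arrays to this type.
        Array[Array[String]] fileList = read_json("files.json")
    }

    runtime {
        docker: "skashin/genome-strip:dev"
    }
}
```

    If that coercion holds, the workflow would scatter over readJson.fileList instead of readTsv.fileList, and processFiles would need to tolerate an empty files array.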
