read_tsv skips empty lines

I created a test WDL that uses the function read_tsv:

task readTsv {

    command {
        echo -e "gs://A\tgs://B" > files.dat
        echo -e "" >> files.dat
        echo -e "gs://C" >> files.dat
    }

    output {
        Array[Array[String]] fileList = read_tsv("files.dat")
    }

    runtime {
        docker: "skashin/genome-strip:dev"
    }
}

task processFiles {
    Array[File] files

    command {
        echo ${sep=" " files} | sed 's/ /\n/g' > files.list
        scripts/process_files.sh files.list
    }

    output {
        File out = "files.list"
    }

    runtime {
        docker: "skashin/genome-strip:dev"
    }
}

workflow read_tsv_wf {

    call readTsv

    scatter (files in readTsv.fileList) {
        call processFiles {
            input:
                files = files
        }
    }

    output {
        Array[File] outputList = processFiles.out
    }
}

The task readTsv returns [["gs://A", "gs://B"], ["gs://C"]], while I would expect it to return [["gs://A", "gs://B"], [], ["gs://C"]].

My problem is that I need to scatter over every element of the Array[Array[String]], including the empty inner array.

Is this really the expected behavior of read_tsv()? If so, is there a way to keep the empty row?

Thanks

Answers

  • kshakir (Broadie, Dev)

    The WDL spec for draft-2 is closed, and it doesn't explicitly state whether blank lines should be stripped during read_tsv. If you'd like the next version of the WDL specification to make this explicit, so that read_tsv respects blank lines in the middle of a file, I encourage you to submit a proposal. In my opinion the proposal should also cover how read_tsv should treat a final newline, or several trailing newlines.

    You may also be able to work around your issue in a future version of Cromwell by using either read_json or read_object. Support for these draft-2 functions was recently added to Cromwell and will be available when version 31 is released.
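As a rough illustration of the read_json route, the task could write JSON instead of TSV, so that the empty row is represented explicitly as []. The helper below is only a sketch: the function name tsv_to_json_rows is hypothetical, it assumes Python is available in the task's container, and it treats a single trailing newline as a line terminator rather than as an extra empty row. Whether read_json in Cromwell 31 coerces the result to Array[Array[String]] is an assumption that would need to be checked.

```python
import json


def tsv_to_json_rows(tsv_text):
    """Split TSV text into rows, keeping blank lines as empty lists.

    Hypothetical helper, not part of WDL or Cromwell.
    """
    lines = tsv_text.split("\n")
    # A trailing newline yields one final empty string; drop only that one,
    # so blank lines in the middle (or extra ones at the end) are preserved.
    if lines and lines[-1] == "":
        lines.pop()
    # A blank line becomes an empty row instead of being skipped.
    return [line.split("\t") if line else [] for line in lines]


if __name__ == "__main__":
    # Same content that the readTsv task writes to files.dat.
    sample = "gs://A\tgs://B\n\ngs://C\n"
    print(json.dumps(tsv_to_json_rows(sample)))
    # prints [["gs://A", "gs://B"], [], ["gs://C"]]
```

The command block could run something like this on files.dat and redirect the output to a JSON file, which the output section would then read with read_json instead of read_tsv.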
