
Question about read_object function usage

Hello Cromwell/WDL folks:

I'm hoping for some clarification on the use of the `read_object` function. It fails when I call it in a declaration inside a `task` (first example below), but works fine when I call it in the workflow block instead (second example). However, the workflow-level approach does not work for some dynamically generated files I create. I have found a hacky workaround, but I am also wondering whether my primary (failing) example might be a bug. Thanks for any help or guidance you can provide.

Environment info:
Cromwell 36 using Java 10:
```
$ java -version
openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Ubuntu-3ubuntu1, mixed mode)
```

Here is a minimal example:

Failing WDL:
```
workflow read_object_fail {
  File infile

  call taskA {
    input:
      myfile = infile,
      outfile = "result.txt"
  }

  output {
    File results = taskA.result
  }
}

task taskA {
  File myfile
  String outfile
  Object myobj = read_object(myfile)

  command {
    echo ${myobj.keyC} > "${outfile}"
  }

  output {
    File result = "${outfile}"
  }
}
```
with an inputs file:
```
{
  "read_object_fail.infile": "readlines_input.tsv"
}
```
and an "object file", readlines_input.tsv (tab-separated; `\t` below stands for a literal tab character):
```
keyA\tkeyB\tkeyC
val1\tval2\tval3
```
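
To make the expected layout concrete, here is a minimal Python sketch of how a two-row, tab-separated object file (keys on the first row, values on the second) maps to key/value pairs. It only illustrates the file format; it is not Cromwell's implementation, and the helper name `parse_object_tsv` is hypothetical.

```python
def parse_object_tsv(text):
    """Map a two-line TSV (key row, then value row) to a dict of strings."""
    lines = text.splitlines()
    if len(lines) != 2:
        raise ValueError("expected exactly two lines: keys, then values")
    keys = lines[0].split("\t")
    values = lines[1].split("\t")
    if len(keys) != len(values):
        raise ValueError("key row and value row differ in length")
    return dict(zip(keys, values))


# Same content as readlines_input.tsv, written with real tab characters.
content = "keyA\tkeyB\tkeyC\nval1\tval2\tval3\n"
obj = parse_object_tsv(content)
print(obj["keyC"])  # prints "val3"
```

This is the value I expect `${myobj.keyC}` to resolve to in the WDL examples.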

When I run this, I get the following error (IOException):
```
[2019-02-22 13:54:48,39] [error] WorkflowManagerActor Workflow ca1d5001-e11b-4f0b-849a-086353ca8f03 failed (during ExecutingWorkflowState): cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1$$anon$1: Call input and runtime attributes evaluation failed for taskA:
Failed to evaluate input 'myobj' (reason 1 of 1): [Attempted 1 time(s)] - IOException: Could not read from /home/brian/sandbox/wdl_forum_question/cromwell-executions/read_object_fail/ca1d5001-e11b-4f0b-849a-086353ca8f03/call-taskA/execution/readlines_input.tsv: /home/brian/sandbox/wdl_forum_question/cromwell-executions/read_object_fail/ca1d5001-e11b-4f0b-849a-086353ca8f03/call-taskA/execution/readlines_input.tsv
...
(more stacktrace)
...
```
Note that if I move the `read_object` function to the main workflow block (and pass that object to the task), it works. For example:

Working WDL:
```
workflow read_object_working {
  File infile
  Object myobj = read_object(infile)

  call taskB {
    input:
      myobj = myobj,
      outfile = "result.txt"
  }

  output {
    File results = taskB.result
  }
}

task taskB {
  Object myobj
  String outfile

  command {
    echo ${myobj.keyC} > "${outfile}"
  }

  output {
    File result = "${outfile}"
  }
}
```
While this works, it doesn't help me if the file I am reading was dynamically created (e.g. by an earlier task).

The hacky workaround is to create another dummy task where I "cat" the file and then use `read_object` with `stdout`:

Hack WDL:
```
workflow read_object_hack {
  File infile

  call cat_task {
    input:
      infile = infile
  }

  call taskB {
    input:
      myobj = cat_task.myobj,
      outfile = "result.txt"
  }

  output {
    File results = taskB.result
  }
}

task taskB {
  Object myobj
  String outfile

  command {
    echo ${myobj.keyC} > "${outfile}"
  }

  output {
    File result = "${outfile}"
  }
}

task cat_task {
  File infile

  command {
    cat ${infile}
  }

  output {
    Object myobj = read_object(stdout())
  }
}
```
This works, but seems unsatisfactory. I am running my workflows in the GCP environment, where that "dummy" `cat_task` requires another VM to spin up, which is just wasteful. The only usage of `read_object` in the documentation shows it reading from `stdout` (e.g. from a Python script that writes to stdout).

Am I missing something about the usage of this function?
