Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Inconsistency in file paths when pipelining tasks

Hello,
I am attempting to pass the output of one task call as input to the next task in my pipeline, and I have been struggling to get the input paths right.
My .wdl looks like this:

task pool_and_pseudoreplicate_complex {
    File tags_rep1
    File tags_rep2
    File tags_ctrl1
    File tags_ctrl2
    String rep1_paired_end
    String rep2_paired_end

    command {
        python /image_software/pipeline-container/src/pool_and_pseudoreplicate.py ${tags_rep1} ${tags_ctrl1} ${rep1_paired_end} ${tags_rep2} ${tags_ctrl2} ${rep2_paired_end}
    }

    output {
        Array[File] out_files = glob('*.gz')
        File results = glob('pool_and_pseudoreplicate_outfiles.mapping')[0]
    }

    runtime {
        docker: 'quay.io/ottojolanki/pool_and_pseudoreplicate:v1.11'
        cpu: '1'
        memory: '4.0 GB'
        disks: 'local-disk 30 HDD'
    }
}

task pool_and_pseudoreplicate_simple {
    File tags_rep1
    File tags_ctrl1
    String rep1_paired_end

    command {
        python /image_software/pipeline-container/src/pool_and_pseudoreplicate.py ${tags_rep1} ${tags_ctrl1} ${rep1_paired_end}
    }

    output {
        File rep1_pr1 = glob('*pr1.tagAlign.gz')[0]
        File rep1_pr2 = glob('*pr2.tagAlign.gz')[0]
        File results = glob('pool_and_pseudoreplicate_outfiles.mapping')[0]
    }

    runtime {
        docker: 'quay.io/ottojolanki/pool_and_pseudoreplicate:v1.11'
        cpu: '1'
        memory: '4.0 GB'
        disks: 'local-disk 30 HDD'
    }
}

task xcor {
    File tags
    String paired_end

    command {
        python /image_software/pipeline-container/src/xcor_only.py ${tags} ${paired_end}
    }

    output {
        File xcor_scores = glob('*.cc.qc')[0]
        File xcor_plot = glob('*.cc.plot.pdf')[0]

    }

    runtime {
        docker: 'quay.io/ottojolanki/xcor_only:test3'
        cpu: '1'
        memory: '4.0GB'
        disks: 'local-disk 30 HDD'
    }
}

task output_defined {
    File is_this_def
    File is_this_def2
    String paired_end

    command {
        echo "the input is defined!"
        echo ${is_this_def}
        echo ${is_this_def2}
        echo ${paired_end}
    }

    runtime {
        docker: 'ubuntu:latest'
        cpu: '1'
        memory: '4.0GB'
        disks: 'local-disk 30 HDD'
    }
}

#WORKFLOW DEFINITION

workflow pool_and_pseudoreplicate_workflow {
    File tags_rep1
    File? tags_rep2
    File tags_ctrl1
    File? tags_ctrl2
    String rep1_paired_end
    String? rep2_paired_end
    #String genomesize
    #File chrom_sizes
    #File narrowpeak_as
    #File gappedpeak_as
    #File broadpeak_as

    if(defined(tags_rep2)){
        call pool_and_pseudoreplicate_complex {
            input:  tags_rep1=tags_rep1,
                    tags_rep2=tags_rep2,
                    tags_ctrl1=tags_ctrl1,
                    tags_ctrl2=tags_ctrl2,
                    rep1_paired_end=rep1_paired_end,
                    rep2_paired_end=rep2_paired_end
        }
    }
    if(!defined(tags_rep2)){
        call pool_and_pseudoreplicate_simple {
            input:  tags_rep1=tags_rep1,
                    tags_ctrl1=tags_ctrl1,
                    rep1_paired_end=rep1_paired_end
        }
        call output_defined {
            input: is_this_def=pool_and_pseudoreplicate_simple.rep1_pr1,
                   is_this_def2=pool_and_pseudoreplicate_simple.rep1_pr2, 
                   paired_end=rep1_paired_end
                  }
        call xcor {
            input:  tags = pool_and_pseudoreplicate_simple.rep1_pr1,
                    paired_end = rep1_paired_end
        }
    }

}

And my inputs .json is:

{
    "pool_and_pseudoreplicate_workflow.tags_rep1": "rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz",
    "pool_and_pseudoreplicate_workflow.tags_ctrl1": "ctl1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz",
    "pool_and_pseudoreplicate_workflow.rep1_paired_end": "False"
}

I run Cromwell 28 in the working directory

/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/

where I have the .wdl, the .json, and the input files in question.

The first task does some simple subsampling and outputs the result files; it works correctly. I was having trouble getting the subsequent xcor task to work, so for debugging purposes I added the output_defined task to check that the output can actually be passed along. When I run my workflow:

DN0a22f0dd:pool_and_pseudoreplicate_test_data otto$ java -jar cromwell-28_2.jar run pool_and_pseudoreplicate_workflow.wdl rep_inputs_simple.json

in addition to other output, there are some lines that confuse me. The commands run in output_defined are (correctly):

[2017-08-25 10:38:34,06] [info] BackgroundConfigAsyncJobExecutionActor [aa3ba195pool_and_pseudoreplicate_workflow.output_defined:NA:1]: echo "the input is defined!"
echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz
echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-68b15494bd1ce67f1f051918d3136843/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr2.tagAlign.gz
echo False

while the input path to the xcor task call seems to get truncated for some reason:

python /image_software/pipeline-container/src/xcor_only.py /cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-xcor/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz False

Do any of you have an idea why the input file paths are complete in the first call, but in the second call are handled as if the cromwell-executions directory were directly in the root of the filesystem?

Answers

  • mcovarr (Cambridge, MA; Member, Broadie, Dev)

    File paths in Dockerized tasks should begin with /cromwell-executions; that's the appropriate view of the file layout from inside the container.

    That path in xcor is some bizarre and very wrong chimera. I suspect something is going wrong with the glob; I'll try to work up a minimal test case for that.

  • Thanks so much for looking at this. My head is getting quite sore from hitting the wall again and again.

  • mcovarr (Cambridge, MA; Member, Broadie, Dev)

    And actually I think I may have been wrong about that chimeric path; it looks to be an intentional duplication of the input directory structure, meant to prevent inputs with the same filenames from colliding within the container.
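
    To make that concrete, here is a minimal sketch (a hypothetical task with made-up file names, not part of the pipeline above) of that layout: two inputs that happen to share a basename stay distinct inside the container, because each localized path is prefixed with the call's inputs directory plus the file's original host directory.

    task show_localized_paths {
        # hypothetical inputs that share the basename reads.tagAlign.gz,
        # e.g. /data/run_a/reads.tagAlign.gz and /data/run_b/reads.tagAlign.gz
        File reads_a
        File reads_b

        command {
            # inside the container both paths begin with
            # /cromwell-executions/<workflow>/<id>/call-show_localized_paths/inputs/
            # followed by each file's original directory, so the identical
            # basenames cannot collide
            echo ${reads_a}
            echo ${reads_b}
        }

        runtime {
            docker: 'ubuntu:latest'
        }
    }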

    What exactly is the problem you're seeing with xcor?

  • The full output from the run looks like this.

    DN0a22f0dd:pool_and_pseudoreplicate_test_data otto$ java -jar cromwell-28_2.jar run pool_and_pseudoreplicate_workflow.wdl rep_inputs_simple.json 
    [2017-08-25 17:16:23,31] [info] Slf4jLogger started
    [2017-08-25 17:16:23,38] [info] RUN sub-command
    [2017-08-25 17:16:23,38] [info]   WDL file: /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/pool_and_pseudoreplicate_workflow.wdl
    [2017-08-25 17:16:23,38] [info]   Inputs: /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/rep_inputs_simple.json
    [2017-08-25 17:16:23,41] [info] SingleWorkflowRunnerActor: Submitting workflow
    [2017-08-25 17:16:23,60] [info] Running with database db.url = jdbc:hsqldb:mem:776be12e-3178-42ab-a640-2eef264d03b6;shutdown=false;hsqldb.tx=mvcc
    [2017-08-25 17:16:27,98] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
    [2017-08-25 17:16:27,99] [info] [RenameWorkflowOptionsInMetadata] 100%
    [2017-08-25 17:16:28,04] [info] Metadata summary refreshing every 2 seconds.
    [2017-08-25 17:16:28,07] [info] SingleWorkflowRunnerActor: Workflow submitted 8e652bf4-3a13-420a-a50a-1dd62c65489d
    [2017-08-25 17:16:28,07] [info] Workflow 8e652bf4-3a13-420a-a50a-1dd62c65489d submitted.
    [2017-08-25 17:16:28,50] [info] 1 new workflows fetched
    [2017-08-25 17:16:28,51] [info] WorkflowManagerActor Starting workflow 8e652bf4-3a13-420a-a50a-1dd62c65489d
    [2017-08-25 17:16:28,51] [info] WorkflowManagerActor Successfully started WorkflowActor-8e652bf4-3a13-420a-a50a-1dd62c65489d
    [2017-08-25 17:16:28,51] [info] Retrieved 1 workflows from the WorkflowStoreActor
    [2017-08-25 17:16:28,69] [info] MaterializeWorkflowDescriptorActor [8e652bf4]: Call-to-Backend assignments: pool_and_pseudoreplicate_workflow.xcor -> Local, pool_and_pseudoreplicate_workflow.output_defined -> Local, pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_complex -> Local, pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple -> Local
    [2017-08-25 17:16:28,78] [warn] Local [8e652bf4]: Key/s [cpu, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2017-08-25 17:16:28,78] [warn] Local [8e652bf4]: Key/s [cpu, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2017-08-25 17:16:28,78] [warn] Local [8e652bf4]: Key/s [cpu, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2017-08-25 17:16:32,00] [info] WorkflowExecutionActor-8e652bf4-3a13-420a-a50a-1dd62c65489d [8e652bf4]: Starting calls: pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_complex:NA:1, pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1
    [2017-08-25 17:16:32,68] [warn] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: Unrecognized runtime attribute keys: disks, cpu, memory
    [2017-08-25 17:16:32,70] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: python /image_software/pipeline-container/src/pool_and_pseudoreplicate.py /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/ctl1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz False
    [2017-08-25 17:16:32,72] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: executing: docker run \
      --rm -i \
       \
      --entrypoint /bin/bash \
      -v /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple:/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple \
      quay.io/ottojolanki/[email protected]:27a8cb20028cc2b0d700a2c8eee544029fec44aa6dc19cbf94eb749084fe8e6f /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/script
    [2017-08-25 17:16:32,74] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: job id: 9651
    [2017-08-25 17:16:32,75] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: Status change from - to WaitingForReturnCodeFile
    [2017-08-25 17:16:39,55] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.pool_and_pseudoreplicate_simple:NA:1]: Status change from WaitingForReturnCodeFile to Done
    [2017-08-25 17:16:40,16] [info] WorkflowExecutionActor-8e652bf4-3a13-420a-a50a-1dd62c65489d [8e652bf4]: Starting calls: pool_and_pseudoreplicate_workflow.output_defined:NA:1, pool_and_pseudoreplicate_workflow.xcor:NA:1
    [2017-08-25 17:16:40,18] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.output_defined:NA:1]: echo "the input is defined!"
    echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz
    echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/glob-68b15494bd1ce67f1f051918d3136843/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr2.tagAlign.gz
    echo False
    [2017-08-25 17:16:40,18] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.output_defined:NA:1]: executing: /bin/bash /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-output_defined/execution/script
    [2017-08-25 17:16:40,19] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.output_defined:NA:1]: job id: 9659
    [2017-08-25 17:16:40,19] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.output_defined:NA:1]: Status change from - to WaitingForReturnCodeFile
    [2017-08-25 17:16:40,55] [warn] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: Unrecognized runtime attribute keys: disks, cpu, memory
    [2017-08-25 17:16:40,56] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: python /image_software/pipeline-container/src/xcor_only.py /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz False
    [2017-08-25 17:16:40,56] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: executing: docker run \
      --rm -i \
       \
      --entrypoint /bin/bash \
      -v /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor:/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor \
      quay.io/ottojolanki/[email protected]:d6d29ea8cad3d4be05b33c8897291dfb83707a94c8ea434f5f7a0a9ed7df79c9 /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/execution/script
    [2017-08-25 17:16:40,57] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: job id: 9669
    [2017-08-25 17:16:40,57] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: Status change from - to WaitingForReturnCodeFile
    [2017-08-25 17:16:41,26] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.output_defined:NA:1]: Status change from WaitingForReturnCodeFile to Done
    [2017-08-25 17:16:41,79] [info] BackgroundConfigAsyncJobExecutionActor [8e652bf4pool_and_pseudoreplicate_workflow.xcor:NA:1]: Status change from WaitingForReturnCodeFile to Done
    [2017-08-25 17:16:41,81] [error] WorkflowManagerActor Workflow 8e652bf4-3a13-420a-a50a-1dd62c65489d failed (during ExecutingWorkflowState): Job pool_and_pseudoreplicate_workflow.xcor:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/execution/stderr
    [2017-08-25 17:16:41,81] [info] WorkflowManagerActor WorkflowActor-8e652bf4-3a13-420a-a50a-1dd62c65489d is in a terminal state: WorkflowFailedState
    [2017-08-25 17:16:42,20] [info] Message [cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$CheckRunnable$] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-8e652bf4-3a13-420a-a50a-1dd62c65489d/WorkflowExecutionActor-8e652bf4-3a13-420a-a50a-1dd62c65489d#1398378143] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-8e652bf4-3a13-420a-a50a-1dd62c65489d/WorkflowExecutionActor-8e652bf4-3a13-420a-a50a-1dd62c65489d#1398378143] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    [2017-08-25 17:16:50,66] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
    Workflow 8e652bf4-3a13-420a-a50a-1dd62c65489d transitioned to state Failed
    

    stderr looks like this.

    DN0a22f0dd:pool_and_pseudoreplicate_test_data otto$ cat /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/execution/stderr
    gzip: /cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz has 3 other links -- unchanged
    grep: rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign: No such file or directory
    Loading required package: caTools
    Error in read.table(align.filename, nrows = 500) : 
      no lines available in input
    Calls: read.align -> read.table
    Execution halted
    sed: can't read rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.qc: No such file or directory
    Traceback (most recent call last):
      File "/image_software/pipeline-container/src/xcor_only.py", line 130, in <module>
        main(sys.argv[1], parse_true_or_false(sys.argv[2]))
      File "/image_software/pipeline-container/src/xcor_only.py", line 114, in main
        xcor_qc = xcor_parse(CC_scores_filename)
      File "/image_software/pipeline-container/src/xcor_only.py", line 26, in xcor_parse
        line = lines[0].rstrip('\n')
    IndexError: list index out of range
    ln: failed to access '*.cc.plot.pdf': No such file or directory
    

    my stdout:

    DN0a22f0dd:pool_and_pseudoreplicate_test_data otto$ cat /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/execution/stdout
    first step shlex to stdout: ['gzip', '-d', '/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-xcor/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/8e652bf4-3a13-420a-a50a-1dd62c65489d/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz']
    first step shlex to stdout: ['grep', '-v', 'chrM', 'rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign']
    intermediate step 2 shlex to stdout: ['shuf', '-n', '15000000', '--random-source=rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign']
    last step shlex: ['gzip', '-cn'] to file: rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz
    first step shlex to stdout: ['Rscript', '/image_software/phantompeakqualtools/run_spp.R', '-c=rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz', '-p=4', '-filtchr=chrM', '-savp=rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.plot.pdf', '-out=rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.qc']
    one step shlex: ['sed', '-r', 's/,[^\\t]+//g', 'rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.qc'] to file: temp
    first step shlex to stdout: ['mv', 'temp', 'rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.qc']
    

    The problem arises from some of the intermediate files getting lost. All of the code I have put in the container runs without problems on a standard Ubuntu machine, so I think something goes wrong with the file paths somewhere along the pipeline.
    The source code (still mid-conversion from the former architecture) and the Dockerfiles can be found in the repo here: https://github.com/ENCODE-DCC/pipeline-container/tree/xcor_and_macs_workflow (I wholeheartedly understand that digging into other people's source may not be your preferred Friday afternoon activity). Thank you again!

  • mcovarr (Cambridge, MA; Member, Broadie, Dev)

    Hi @ojolanki, is that gz file supposed to get un-gzipped once it gets into xcor? The only file localized into the xcor container is called rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz. There are errors that say

    grep: rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign: No such file or directory
    

    and

    sed: can't read rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.sample.15.SE.tagAlign.gz.cc.qc: No such file or directory
    

    That latter one with suffixes appended to a gz file looks particularly questionable. :smile:
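
    If that is indeed what is happening (the script decompressing the localized input in place and then working with relative filenames), one possible workaround would be to make a local copy inside the command block and hand the script that copy, so the decompressed and derived files land in the task's working directory where the later grep/shuf/Rscript steps expect them. This is just an untested sketch of the existing xcor task, not a confirmed fix:

    task xcor {
        File tags
        String paired_end

        command {
            # sketch only: copy the localized .gz into the working directory so an
            # in-place `gzip -d` is not blocked by Cromwell's hard-linked input
            # (the stderr above shows "has 3 other links -- unchanged"), and so
            # everything derived from it is created in the cwd
            cp ${tags} .
            python /image_software/pipeline-container/src/xcor_only.py $(basename ${tags}) ${paired_end}
        }

        output {
            File xcor_scores = glob('*.cc.qc')[0]
            File xcor_plot = glob('*.cc.plot.pdf')[0]
        }

        runtime {
            docker: 'quay.io/ottojolanki/xcor_only:test3'
            cpu: '1'
            memory: '4.0 GB'
            disks: 'local-disk 30 HDD'
        }
    }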

  • mcovarr (Cambridge, MA; Member, Broadie, Dev)

    This might have been working with non-Dockerized tasks, since non-Dockerized tasks can see the files from other non-Dockerized tasks, but Dockerized tasks cannot.

  • That makes a lot of sense. I will investigate. Thanks for the help.
