Task to Simply Collect Output Files

I am wondering if there is a way to specify a task which would simply place output files from previous tasks into the new task's output directory.

For example, suppose I have several tasks which produce output files. Now I want a task called "final" (which of course produces a "call-final" directory under the workflow output directory) that places all of the output files from the other tasks into that directory.

I want this so that there is one location where I can always retrieve the final results I want to archive, instead of having to jump around the different task directories (which could be named completely differently depending on the workflow).

Thanks,
Erich

Answers

  • I think something naive like the following might work (still wondering if some more elegant solution exists):

    task task_output_1 {
      ...
      ...
      output {
        File output_1 = "output_1.txt"
      }
    }

    task task_output_2 {
      ...
      ...
      output {
        File output_2 = "output_2.txt"
      }
    }

    task final {
      File output_1
      File output_2
      ...
      ...
      command {
        # Copy each input into this task's working directory
        cp ${output_1} .
        cp ${output_2} .
      }
      output {
        # Refer to the copies by file name, not to the original input paths
        File final_output_1 = basename(output_1)
        File final_output_2 = basename(output_2)
      }
    }

    workflow test {
      call task_output_1 {
        ...
      }
      call task_output_2 {
        ...
      }
      call final {
        input: output_1 = task_output_1.output_1, output_2 = task_output_2.output_2
      }
    }

  • EADG Kiel Member

    Yep, this might work, but I think it will get fiddly if you have a lot of output files. A general output folder defined in inputs/options.json would be nice. Maybe we'll get one for Christmas ;)
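
    One way to keep the copy task manageable with many files might be to pass them in as a single array instead of one input per file. The following is only a sketch in the same WDL style as the post above: the names collect_outputs, files and collected are made up for illustration, and the two small producer tasks are stand-ins for real ones.

    ```wdl
    # Stand-in producer tasks (hypothetical); real tasks would do actual work.
    task task_output_1 {
      command { echo "one" > output_1.txt }
      output { File output_1 = "output_1.txt" }
    }

    task task_output_2 {
      command { echo "two" > output_2.txt }
      output { File output_2 = "output_2.txt" }
    }

    # Hypothetical collector: accepts any number of files in one input.
    task collect_outputs {
      Array[File] files

      command {
        # Copy into a subdirectory so the glob below matches only the
        # collected files, not other files in the working directory.
        mkdir collected
        cp ${sep=' ' files} collected/
      }
      output {
        Array[File] collected = glob("collected/*")
      }
    }

    workflow test {
      call task_output_1
      call task_output_2
      call collect_outputs {
        input: files = [task_output_1.output_1, task_output_2.output_2]
      }
    }
    ```

    Adding another file then only means adding one element to the array in the call. Note that two inputs with the same file name would overwrite each other in this sketch, and paths containing spaces would need extra quoting.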

  • KateN Cambridge, MA Member, Broadie, Moderator

    You can define a general output folder using an options JSON. However, this feature is not yet fully implemented and has only been used successfully with our Google Cloud backend. We use it to mark certain workflow outputs to keep at the end of our pipeline. See the excerpt from our published pipeline below:

    workflow PairedEndSingleSampleWorkflow {
      { ... }
    
      # Outputs that will be retained when execution is complete
    
      output {
        MarkDuplicates.duplicate_metrics
        GatherBqsrReports.*
        ConvertToCram.*
        GatherVCFs.*
      }
    }
    

    For now, though, your best option is the task script solution @erichpeterson mentioned. Alternatively, @conradL has written a solution for copying a workflow output to a directory outside of the Cromwell working directory, detailed here. Both require writing a specific task to collect the outputs and copy them to a certain place, however.
