
Can I hold execution of one task until a previous task is completed?

rvschendel Member

I wrote two workflows and am now trying to call them from a third workflow. I used import statements and that part went fine. However, during execution (running on SGE) the second task starts before the first has finished, which causes problems because the tasks are interdependent. The imported workflows have no outputs, so they might be treated as independent. Is task order not preserved when executing imported workflows?

Answers

  • rvschendel Member

    I just found out it is not related to the import statements, as a merged .wdl file does exactly the same. It seems that tasks that do not depend on each other's output are scheduled for execution immediately. However, I am trying to achieve this order:

    mapAndSortFastqfiles (scatter)

    mergeBamFilesWithRelatedFiles (scatter)

    However, the second task runs a find to locate the BAM files. Is there an elegant way to hold execution until the first task is completed? One obvious option is to use the output of mapAndSortFastqfiles, but the second scatter is based on a text file with sampleNames.

  • rvschendel Member
    Accepted Answer

    Never mind, I just added an input from the first scatter to the second, which prevents the second from executing until all tasks from the first scatter are done.

  • ChrisL Cambridge, MA Member, Broadie, Moderator, Dev admin

    Hey @rvschendel, it looks to me like you're relying on side effects here (i.e. the first task does something outside of what is declared in the WDL, and the second task depends on that undeclared side effect).

    That's bad because it goes against the expected execution model: (a) Cromwell can't track dependencies if they aren't declared, as you already found out, and (b) most cloud environments require you to specify exactly which files to download before running the task. If you have files that "must exist but aren't declared in the WDL" then you might run into "file not found" problems.

    My suggestion, if this is possible, is to make the output of your first task (including any newly made files) the input to the second task, rather than using the same previous input for both of the tasks.
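
    A minimal sketch of that suggestion, assuming WDL 1.0. The task names, the placeholder commands, and the workflow inputs (`fastqs`, `sampleNamesFile`) are hypothetical and not taken from the original workflows. Outside the first scatter, `mapAndSortFastq.bam` is the gathered `Array[File]` of every shard's output, so each merge call waits for all mapping shards and receives the BAM files as declared inputs instead of locating them with find.

    ```wdl
    version 1.0

    # Hypothetical task bodies; only the declared inputs and outputs matter here.
    task mapAndSortFastq {
      input {
        File fastq
      }
      command <<<
        # placeholder for the real mapping and sorting command
        touch sorted.bam
      >>>
      output {
        File bam = "sorted.bam"
      }
    }

    task mergeBamFiles {
      input {
        String sampleName
        Array[File] bams
      }
      command <<<
        # placeholder: the real command would select this sample's BAMs and merge them
        printf '%s\n' ~{sep=' ' bams} > ~{sampleName}.inputs.txt
      >>>
      output {
        File bamList = "~{sampleName}.inputs.txt"
      }
    }

    workflow mapThenMerge {
      input {
        Array[File] fastqs
        File sampleNamesFile
      }

      # First scatter: one mapping call per FASTQ file.
      scatter (fq in fastqs) {
        call mapAndSortFastq { input: fastq = fq }
      }

      # Second scatter: still driven by the sample-name file, but each call
      # declares the gathered BAM array as an input, which creates the dependency.
      scatter (name in read_lines(sampleNamesFile)) {
        call mergeBamFiles {
          input:
            sampleName = name,
            bams = mapAndSortFastq.bam
        }
      }
    }
    ```

    Because the files are declared, the engine can also localize them for the second task, which addresses point (b) above.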

  • hisplan New York Member
    RE: "to make the output of your first task the input to the second task"

    Is there a better/cleaner way to do this in WDL? Sometimes I'd like the second task to run *after* the first task is completed, even though I don't need any output of the first task to be fed into the second (e.g. the first task doesn't generate any files and just exits with 0 if successful).

    To make this possible with the current WDL, I would have to create a dummy output in the first task and feed that into the second task. Besides being a fabrication, it would also make the code less readable...
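
    For reference, the dummy-output workaround described here might look like the sketch below, assuming WDL 1.0. The task names (`prepare`, `consume`) and the `done`/`ready` flag are hypothetical. A call's outputs only become available once the call has completed successfully, so wiring `prepare.done` into `consume` is enough to order the two calls even though the value is never used in the command.

    ```wdl
    version 1.0

    # Hypothetical first task: produces no files, only succeeds or fails.
    task prepare {
      command <<<
        echo "side-effect-only step"
      >>>
      output {
        # Dummy output; it exists only so that downstream calls can depend on it.
        Boolean done = true
      }
    }

    # Hypothetical second task that must run after prepare.
    task consume {
      input {
        # Unused in the command; present only to create the dependency.
        Boolean ready
      }
      command <<<
        echo "runs after prepare"
      >>>
      output {
        String message = read_string(stdout())
      }
    }

    workflow ordered {
      call prepare
      call consume { input: ready = prepare.done }
    }
    ```

    The WDL 1.1 specification adds an `after` clause on calls (e.g. `call consume after prepare`) for this kind of ordering without a dummy value; whether a given engine version supports it is something to check before relying on it.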