Can I hold execution of one task until a previous task is completed?

rvschendel Member

I wrote two workflows and am now trying to call them from a third workflow. I used import statements and that went fine. During execution, however, the second task is executed before the first (running on SGE), which causes problems because the two are interdependent. The imported workflows have no outputs, so they might be treated as independent. Is task order not preserved when executing imported workflows?


Best Answer


  • rvschendel Member

    I just found out that this is not related to the import statements, as a merged .wdl file behaves exactly the same. It seems that tasks that do not depend on each other's output are scheduled for execution immediately. However, I am trying to achieve this:

    mapAndSortFastqfiles (scatter)

    mergeBamFilesWithRelatedFiles (scatter)

    However, the second task runs a find to locate the BAM files. Is there an elegant way to hold its execution until the first task is completed? One obvious option is to use the output of mapAndSortFastqfiles, but the second scatter is based on a text file with sampleNames.

  • rvschendel Member
    Accepted Answer

    Never mind, I just added an input from the first scatter to the second, which prevents the second from executing until all tasks from the first scatter are done.

  • ChrisL Cambridge, MA Member, Broadie, Moderator, Dev admin

    Hey @rvschendel, it looks to me like you're relying on side effects here (i.e., the first task does something outside of what is declared in the WDL, and the second task depends on that undeclared side effect).

    That's bad because it goes against the expected execution model: (a) Cromwell can't track dependencies if they aren't declared, as you already found out, and (b) most cloud environments require you to specify exactly which files to download before running the task. If you have files that "must exist but aren't declared in the WDL" then you might run into "file not found" problems.

    My suggestion, if this is possible, is to make the output of your first task (including any newly made files) the input to the second task, rather than using the same previous input for both of the tasks.
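
    A minimal sketch of that suggestion, assuming WDL 1.0 syntax. Only the two task names come from this thread; the workflow name, all input and output names, and the command bodies are hypothetical placeholders. Because the second scatter reads mapAndSortFastqfiles.bam from outside the first scatter, it receives the gathered Array[File], so Cromwell should not start any merge shard until every mapping shard has finished.

    ```wdl
    version 1.0

    # Hypothetical task body: the real mapping and sorting command goes here.
    task mapAndSortFastqfiles {
      input {
        File fastq
        String sampleName
      }
      command <<<
        touch ~{sampleName}.sorted.bam
      >>>
      output {
        # Declaring the BAM as an output makes it a dependency Cromwell can track.
        File bam = sampleName + ".sorted.bam"
      }
    }

    # Hypothetical task body: cat is a stand-in, not a valid BAM merge.
    task mergeBamFilesWithRelatedFiles {
      input {
        String sampleName
        Array[File] bams
      }
      command <<<
        # The BAMs arrive as declared inputs, so no find is needed.
        cat ~{sep=" " bams} > ~{sampleName}.merged.bam
      >>>
      output {
        File mergedBam = sampleName + ".merged.bam"
      }
    }

    workflow mapThenMerge {
      input {
        Array[Pair[String, File]] fastqs   # (name, FASTQ file) pairs
        Array[String] sampleNames          # e.g. read from the sample-name text file
      }

      scatter (fq in fastqs) {
        call mapAndSortFastqfiles { input: fastq = fq.right, sampleName = fq.left }
      }

      scatter (name in sampleNames) {
        call mergeBamFilesWithRelatedFiles {
          input:
            sampleName = name,
            # Gathered output of the first scatter: one File per shard.
            bams = mapAndSortFastqfiles.bam
        }
      }
    }
    ```

    In this sketch every merge shard receives all BAMs and would have to select the ones belonging to its sample; grouping the files per sample before the second scatter would be a refinement, but the dependency itself is already explicit.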
