Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Scatter generates RequestingExecutionToken on Timing Diagram

Hi,
When I set concurrent-job-limit=1 in my config file, a scatter clause (like the example below) doesn't seem to work as I expected. From the timing diagram I can tell that after the first task completes, the second task does not kick in. Instead, I see the message 'RequestingExecutionToken' when I hover over its timeline.

Could you please tell me what the issue is, and if there's a way around it?

Many thanks,
Dorjee

Array[Int] indices = range(10) # Will produce [0,1,2,3,4,5,6,7,8,9]
scatter(i in indices) {
    call slight_delay as first_delay { input: delay = i }
    call slight_delay as second_delay { input: delay = first_delay.delay_plus_1 }
}

Answers

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    Hi Dorjee - this might be working as expected.

    When one shard's first_delay task completes, Cromwell considers that shard's second_delay task eligible to start, but it isn't necessarily the only eligible task: you might still have a bunch of first_delays waiting too. Since you're limiting Cromwell to running one job at a time, when it has space it will pick one of the eligible tasks and start it, and not necessarily in the order that you would choose!

    The time between a task becoming eligible to run and actually starting to run is shown as "RequestingExecutionToken" on the timing diagram. That's just Cromwell's way of saying "ready to start, but on hold because I'm already at my concurrent job limit".

    When I ran your example with a concurrent job limit of 1, I saw the same thing. In the timing diagram, dark green is "RequestingExecutionToken" and light blue is "RunningJob".

    Notice that the second_delays become ready to start as soon as the corresponding first_delay is complete, but also notice that there is only ever one job in "RunningJob" state at any one time.
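
The behaviour described above can be sketched as a toy simulation. This is not Cromwell's scheduler: it assumes a single execution token, that every job takes the same hypothetical `duration`, and that jobs start in the order they became eligible. The gap between `eligible_at` and `start` plays the role of "RequestingExecutionToken".

```python
from collections import deque


def simulate(num_shards, duration=1):
    """Toy model: one execution token, jobs start in the order they became eligible."""
    # Every first_delay is eligible at time 0. Queue entries are (job, eligible_at).
    queue = deque((f"first_delay.{i}", 0) for i in range(num_shards))
    clock = 0
    timeline = []  # rows of (job, eligible_at, start, end)
    while queue:
        job, eligible_at = queue.popleft()
        start = clock
        clock += duration  # only one job runs at a time
        timeline.append((job, eligible_at, start, clock))
        name, shard = job.split(".")
        if name == "first_delay":
            # The shard's second_delay becomes eligible as soon as first_delay ends,
            # but it joins the back of the queue behind the older first_delays.
            queue.append((f"second_delay.{shard}", clock))
    return timeline


for job, eligible_at, start, end in simulate(3):
    waited = start - eligible_at  # time spent "requesting an execution token"
    print(f"{job}: eligible at {eligible_at}, waited {waited}, ran {start}-{end}")
```

In this model, each second_delay is eligible as soon as its first_delay finishes, yet none of them runs until every first_delay has had its turn.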

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    Based on your comments in the OpenWDL thread, I think the problem you have is that you want to execute the tasks depth-first rather than breadth-first.

    If a graph looks like:
    A1 -> B1
    A2 -> B2
    A3 -> B3

    Then the partial order allows you to serialize those tasks as either:
    A1, A2, A3, B1, B2, B3, or
    A1, B1, A2, B2, A3, B3
    (there are actually more, but these two represent breadth-first and depth-first)

    WDL only specifies the graph; it's up to the engine to choose the order, and Cromwell tends to pick older tasks in preference to newer ones. As long as your tasks don't have side effects unknown to the WDL, the order chosen ideally shouldn't make much difference to you?
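
To make the partial-order point concrete, here is a small sketch that brute-forces every serialization of the six tasks and keeps those in which each A comes before its matching B. The task names come from the graph above; the helper `respects_dependencies` is purely illustrative.

```python
from itertools import permutations

tasks = ["A1", "A2", "A3", "B1", "B2", "B3"]


def respects_dependencies(order):
    """True if every A_i appears before its B_i in the given order."""
    position = {task: index for index, task in enumerate(order)}
    return all(position[f"A{i}"] < position[f"B{i}"] for i in (1, 2, 3))


# Check all 720 permutations and keep the ones the graph allows.
valid_orders = [order for order in permutations(tasks) if respects_dependencies(order)]

breadth_first = ("A1", "A2", "A3", "B1", "B2", "B3")
depth_first = ("A1", "B1", "A2", "B2", "A3", "B3")

print(len(valid_orders))  # 90
print(breadth_first in valid_orders, depth_first in valid_orders)  # True True
```

Both orders are equally valid as far as the graph is concerned, which is why the choice between them falls to the engine.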

  • Dorjee (Member)

    Hi Chris,

    I really appreciate your detailed clarification on this.

    What I understand (please correct me if I'm wrong) is that the actual order of task execution is up to the Cromwell engine, i.e. I can't tell the engine to always execute the tasks depth-first, but it tends to pick older tasks in preference to newer ones. So, is my approach/example in the OpenWDL thread a step in the right direction?

    Also, this may be a dumb question: what makes a task newer or older when it is called from a workflow?

    Thanks a lot,
    Dorjee

  • Dorjee (Member)

    I understand that when a first_delay task completes, Cromwell considers the corresponding second_delay task eligible to start (i.e., queued). However, while a bunch of first_delays are still waiting, I've noticed that none of the waiting second_delays get executed until all the first_delays have completed.

    E.g., the timing diagram below shows that second_delay.8 started running only after all the first_delay tasks had completed. I was expecting second_delay.8 to start soon after first_delay.8. Am I missing something here? Please help.

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    By "oldest" I just mean "have been waiting the longest".

    So in your example, all of the first_delays go first because they have been waiting since time 11 on your time scale, whereas second_delay:8 has been waiting only since time 18. Does that answer the question?

    Just for my information - is it important to your workflow that things have to happen A,B,A,B,A,B rather than A,A,A,B,B,B? Or are you just trying to understand Cromwell's scheduling?

  • Dorjee (Member)

    Thanks, Chris.

    Yes, it's kinda important that the workflow happens in A, B, A, B, A, B order. The idea is that as soon as the expected results are generated by the first A -> B, they can immediately be made available for review/interpretation (the data is time-sensitive) while the remaining workflow tasks keep going.

  • Dorjee (Member)

    I guess I'm just curious to know whether I could tell Cromwell to execute the tasks depth-first.
    Thanks,

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    Hey Dorjee -

    Right now there's no way to make Cromwell go depth-first, but I don't think there's any reason why it couldn't one day.

    In fact this has come up before in other contexts so you're definitely not the only person asking for it!
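
For illustration only, the two orderings discussed in this thread can be compared with a toy scheduler that takes a pluggable "pick the next job" rule. The `depth_first` rule here is hypothetical and is not a Cromwell option; `run`, `oldest_first`, and the one-time-unit job length are likewise assumptions made for the sketch.

```python
def run(num_shards, pick):
    """Run one job at a time; `pick` chooses the next job from the waiting list."""
    # Each waiting job is (stage, shard, waiting_since). All A jobs wait from time 0.
    waiting = [("A", shard, 0) for shard in range(num_shards)]
    clock, order = 0, []
    while waiting:
        job = pick(waiting)
        waiting.remove(job)
        stage, shard, _ = job
        clock += 1  # assumption: every job takes one time unit
        order.append(f"{stage}{shard}")
        if stage == "A":
            # The downstream B job becomes eligible once its A job has finished.
            waiting.append(("B", shard, clock))
    return order


def oldest_first(waiting):
    """Pick the job that has been waiting the longest."""
    return min(waiting, key=lambda job: job[2])


def depth_first(waiting):
    """Hypothetical rule: prefer eligible B jobs, then fall back to the oldest job."""
    return min(waiting, key=lambda job: (job[0] != "B", job[2]))


for policy in (oldest_first, depth_first):
    order = run(3, policy)
    first_result = order.index("B0") + 1  # time at which the first A -> B pair is done
    print(policy.__name__, order, "first result at t =", first_result)
```

Under these assumptions the first complete A -> B result arrives at t = 4 with oldest-first and at t = 2 with the hypothetical depth-first rule, while the total run time is the same for both.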
