Preemptible tasks that take more than 24 hours should not restart on a preemptible instance

jgould Member ✭✭

We have encountered cases where a preemptible task takes longer than 24 hours and is therefore killed. However, it is then restarted on another preemptible instance, where it is killed again. This repeats for as many attempts as the preemptible value in the runtime section of the task WDL allows. We want to continue to use preemptible instances for this task, since 99% of the time, with our input, the task finishes in less than 24 hours. Thanks.

Answers

  • SChaluvadi Member, Broadie, Moderator admin

    @jgould
    I'm not sure I understand correctly. In the title of your post you say that you do not want a failed task on a preemptible to be retried on a preemptible. However, in the body of the post you say that you want to continue using preemptibles for the task ("We want to continue to use preemptible for this task, since 99% of the time with our input, the task will finish in less than 24 hours"). Can you confirm which behavior you would like and state your question more clearly, in case I am misunderstanding? :) Thanks!

  • jgould Member ✭✭

    Sorry for not being clear. We want to use preemptible instances. The issue arises when an unexpected input file causes a preemptible task to take longer than 24 hours, so the task is killed. At that point we know that this task, with this input, cannot use a preemptible instance because it cannot finish within 24 hours, so it should not restart on a preemptible. However, the same task that is preempted after, for example, 10 minutes should restart on a preemptible instance. Does this make sense? Thanks.

  • SChaluvadi Member, Broadie, Moderator admin

    @jgould

    Ah yes, I think I've got it. You would like the choice between preemptible and non-preemptible to depend on how long the task ran before it was stopped, rather than on the identity of the task. For example, if the task fails because it hit the 24-hour mark, you want it re-run on a non-preemptible, but if it is preempted before that time limit, you would like it to be retried on another preemptible. Just double checking!

  • jgould Member ✭✭

    Yes, but this rule should apply to all tasks. It doesn't make sense to retry a task on a preemptible instance if you know it will take more than 24 hours, since it is guaranteed to fail. Thanks.
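
    The rule requested here could be sketched roughly as follows. This only illustrates the proposed decision and is not a description of what Cromwell does; the function name, its parameters, and the 5-minute tolerance are all hypothetical.

```python
from datetime import timedelta

# Preemptible instances are stopped after at most 24 hours of runtime.
MAX_PREEMPTIBLE_RUNTIME = timedelta(hours=24)


def retry_on_preemptible(last_runtime, preemptible_attempts_used,
                         max_preemptible_attempts,
                         tolerance=timedelta(minutes=5)):
    """Decide whether the next attempt should use a preemptible instance.

    All names here are hypothetical. `tolerance` is an assumed safety
    margin for deciding that an attempt ran into the 24-hour limit.
    """
    # The preemptible budget from the runtime section is already spent.
    if preemptible_attempts_used >= max_preemptible_attempts:
        return False
    # The last attempt ran (almost) the full 24 hours, so another
    # preemptible attempt would be expected to hit the same limit.
    if last_runtime >= MAX_PREEMPTIBLE_RUNTIME - tolerance:
        return False
    # Preempted early: retrying on a preemptible instance is reasonable.
    return True
```

    With this sketch, an attempt preempted after 10 minutes would be retried on a preemptible instance, while an attempt that ran for the full 24 hours would move to a non-preemptible one.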

  • SChaluvadi Member, Broadie, Moderator admin
    edited February 14

    @jgould
    I spoke to our Cromwell team and they explained that there is currently no logic that can automatically move a task from a preemptible retry to a non-preemptible instance, since Cromwell is not aware of the reason for the preemption (such as an input that is too large). However, as a workaround, if you have a general set of guidelines on which inputs (based on x GB, for example) tend to cause preemptible failure, you can set up a system that sends those inputs to non-preemptible machines immediately.
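
    One way such a system might look is a small pre-submission script that picks the preemptible value per input from its size. This is only a sketch under assumptions: the 50 GB threshold, the default of 3 attempts, the workflow name my_workflow, and the input names input_file and preemptible_attempts are all hypothetical, and the task's runtime section would need to read its preemptible value from that input.

```python
import json


def choose_preemptible_attempts(size_bytes, threshold_gb, default_attempts=3):
    """Return 0 (non-preemptible) for inputs above the size threshold.

    The threshold and default are assumptions to be tuned from experience
    with which inputs tend to exceed the 24-hour limit.
    """
    size_gb = size_bytes / 1e9
    return 0 if size_gb > threshold_gb else default_attempts


def build_inputs(input_path, size_bytes, threshold_gb, workflow="my_workflow"):
    """Build a workflow inputs dictionary (hypothetical input names)."""
    return {
        f"{workflow}.input_file": input_path,
        f"{workflow}.preemptible_attempts": choose_preemptible_attempts(
            size_bytes, threshold_gb
        ),
    }


if __name__ == "__main__":
    # Example: an 80 GB input with a 50 GB threshold goes to non-preemptible.
    inputs = build_inputs("gs://bucket/sample.bam", 80 * 10**9, threshold_gb=50.0)
    print(json.dumps(inputs, indent=2))
```

    Input size is only one possible guideline; any property that predicts a runtime over 24 hours could be used in the same way.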
