Why am I seeing no evidence of preemption?

birger Member, Broadie, CGA-mod

I am running tests using a somatic mutation calling workflow that includes a 25-way scatter of a mutect1 job. The workflow is configured to run those scatter jobs on preemptible VMs. We have run four analysis submissions, each running the workflow on 1000 tumor/normal pairs. That has resulted in 4 x 1000 x 25 = 100,000 JES jobs run on preemptible machines (probably closer to 99,000, because approximately 10 workflows failed in each analysis submission). Each of these jobs runs for between 30 minutes and an hour.
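The job count above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and it assumes that each failed workflow contributed no scatter jobs at all, which may undercount slightly.

```python
# Rough count of scatter jobs run on preemptible VMs.
submissions = 4
pairs_per_submission = 1000
scatter_width = 25
failed_workflows_per_submission = 10  # approximate figure quoted in the post

# Upper bound: every workflow ran all of its scatter jobs.
upper_bound = submissions * pairs_per_submission * scatter_width

# Estimate: assume failed workflows ran none of their scatter jobs.
estimate = (
    submissions
    * (pairs_per_submission - failed_workflows_per_submission)
    * scatter_width
)

print(upper_bound)  # 100000
print(estimate)     # 99000
```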

I then used the /api/workspaces/{workspaceNamespace}/{workspaceName}/submissions/{submissionId}/workflows/{workflowId} endpoint to get call-level metadata which, according to the Cromwell team (discussed during office hours), should include information on preemption status. The call-level metadata includes an 'attempt' attribute. For all 100,000 jobs examined, the reported attempt value was '1', with an execution status of 'Done' and a return code of '0'. I have attached a file containing a sample of the call-level metadata.
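For reference, the check described above can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes the metadata JSON has a top-level "calls" object mapping each call name to a list of entries, and that each entry carries "attempt", "executionStatus" and "shardIndex" keys. The function name, the sample data, and the "Preempted" status value in the sample are hypothetical placeholders.

```python
from typing import Any, Dict, List


def find_retried_calls(workflow_metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return call entries whose attempt is above 1 or whose status is not 'Done'."""
    flagged = []
    # Assumed layout: {"calls": {call_name: [entry, entry, ...]}}
    for call_name, entries in workflow_metadata.get("calls", {}).items():
        for entry in entries:
            attempt = int(entry.get("attempt", 1))  # may arrive as a string
            status = entry.get("executionStatus")
            if attempt > 1 or status != "Done":
                flagged.append({
                    "call": call_name,
                    "shard": entry.get("shardIndex"),
                    "attempt": attempt,
                    "status": status,
                })
    return flagged


# Made-up sample: shard 0 ran once, shard 1 needed a second attempt.
sample = {
    "calls": {
        "wf.mutect1": [
            {"shardIndex": 0, "attempt": 1, "executionStatus": "Done", "returnCode": 0},
            {"shardIndex": 1, "attempt": 1, "executionStatus": "Preempted"},
            {"shardIndex": 1, "attempt": 2, "executionStatus": "Done", "returnCode": 0},
        ]
    }
}

flagged = find_retried_calls(sample)
for item in flagged:
    print(item)
```

If every entry in the real metadata has attempt 1 and status 'Done', this returns an empty list, which matches what was observed for the four submissions.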

Is it plausible that close to 100,000 calls would show no evidence of preemption?
Am I looking for preemption data in the wrong place?
Is Cromwell not accurately reporting preemption data?

I have asked @esalinas to look for evidence of preemption in these four submissions by querying Google JES directly for call-level metadata, using the gcloud alpha genomics operations list and gcloud alpha genomics operations describe commands.

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @birger It's very hard to say because AFAIK we don't have a reliable estimate of how often jobs are likely to be preempted -- but I'll try to find out whether we have preemption stats from the production pipeline. In general the principle is that the longer the run, the more likely it is to get preempted. Even though you were running many jobs, they were all reasonably short. In addition, it depends on the load on the Google cloud -- you might have been running your jobs at a time when load was light and there were machines available for everyone. So it does not seem impossible that you were lucky and did not get any preemptions -- but we'll check to make sure.

  • birger Member, Broadie, CGA-mod

    I would expect that with close to 100,000 jobs, I would see at least one preemption. This is why I'd like to know whether there is something wrong with the method I'm using to get preemption stats, or whether Cromwell is not correctly reporting them. Hopefully, the data analysis I've asked @esalinas to do (looking at preemption data directly from Google rather than going through Cromwell) will shed some light on this. Thanks.
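To put a rough number on that expectation: if each job were preempted independently with the same probability p, the chance of seeing zero preemptions in n jobs would be (1 - p)^n. The sketch below evaluates this for a few values of p. Those values are hypothetical, not measured rates, and the independence assumption is a simplification, since preemption likely depends on cloud load and on how long each job runs.

```python
import math


def prob_no_preemption(p: float, n: int) -> float:
    """P(zero preemptions in n jobs), assuming independent jobs with rate p < 1."""
    # log1p keeps precision when p is tiny and n is large.
    return math.exp(n * math.log1p(-p))


n_jobs = 99_000  # approximate job count from the post

# Hypothetical per-job preemption probabilities, for illustration only.
for p in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"p = {p:g}: P(no preemptions) = {prob_no_preemption(p, n_jobs):.3g}")

# Largest p still consistent (at the 5% level) with zero preemptions in n_jobs.
p_upper = 1.0 - 0.05 ** (1.0 / n_jobs)
print(f"95% upper bound on p: {p_upper:.3g}")
```

Under these assumptions, zero preemptions across roughly 99,000 jobs would only be unsurprising if the per-job preemption rate were a few in 100,000 or lower, which is why checking the reporting path is worthwhile.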

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Ok, let us know what you find. FYI, I posted in your other thread a snippet of the call metadata from a workflow that was preempted (in the production pipeline), showing the relevant expected log items that indicate preemption.
