Preemptible instances: historical data of Google Compute Engine VM running time before preemption?

dshih (Boston; Member, Broadie)
edited October 2017 in Ask the FireCloud Team

Google makes no guarantee about when a preemptible instance will be preempted.

Preemptible instances typically cost about 1/4 as much as regular instances, but it is difficult to make an informed decision on whether to request them without data from which to build a predictive model of time to preemption...

Question 1. Now that people have been running FireCloud or Google Cloud for a while... Is there any historical data available?
By default, Cromwell requests a non-preemptible instance. Has anyone else been setting the runtime attribute preemptible to a non-zero number and collecting data like the sample below?

Question 2. Is there a programmatic way of extracting preemption data from your own FireCloud account?

A sample of the data that I am hoping to collect:

date        start_time  zone           machine_type   time_to_preempt  prev_preempts
2017-10-17  12:33       us-central1-b  n1-standard-8  39m+             0
2017-10-16  19:17       us-central1-b  n1-standard-2  26m              0
2017-10-16  19:44       us-central1-b  n1-standard-2  1h11m            1
2017-10-16  20:55       us-central1-b  n1-standard-2  4h38m+           2
2017-10-16  12:55       us-central1-b  n1-standard-8  5h43m+           0
2017-10-17  12:34       us-central1-b  n1-standard-2  5h2m+            0
2017-10-17  12:39       us-central1-b  n1-standard-2  3h14m+           0
2017-10-17  15:53       us-central1-b  n1-standard-2  16m              1
2017-10-17  16:10       us-central1-b  n1-standard-2  5m               2
2017-10-17  14:41       us-central1-b  n1-standard-2  1h45m+           0

Note: Only the running times of preemptible instances were measured (non-preemptible instances were omitted). If a task completes, fails, or is aborted before preemption, the observation is censored (indicated by +).
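
To make the "+" convention concrete, here is a minimal Python sketch that turns a time_to_preempt string into a number of minutes plus a censoring flag. The function name parse_duration is made up for illustration; the only thing it assumes is the "XhYm" format with an optional trailing "+" used in the sample above.

```python
import re

def parse_duration(text):
    """Return (minutes, censored) for strings such as '26m', '1h11m' or '5h43m+'.

    censored is True when the string ends in '+', i.e. the task ended
    (completed, failed, or was aborted) before any preemption happened.
    """
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(\+?)", text.strip())
    if not match or (match.group(1) is None and match.group(2) is None):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes, match.group(3) == "+"
```

For example, parse_duration("1h11m") gives 71 minutes and not censored, while parse_duration("5h43m+") gives 343 minutes and censored.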

Remark: As seen above, one of our instances was taken away after 5 minutes... No guarantee, sure. But come on, 5 minutes?! And that's after two previous preemptions!

With this type of data, we could build a predictive model to calculate the expectation of time to preemption and make an informed decision using decision theory to minimize expected compute cost.
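
As a rough illustration of what such a model could look like, here is a Python sketch that computes a Kaplan-Meier estimate of the probability that an instance survives past a given time, and plugs it into a deliberately simple cost comparison. Everything beyond the ten sample rows is an assumption: the function names are made up, the cost model assumes a single preemptible attempt followed by a fall back to a regular instance with all preempted work lost, and the rates passed in are hypothetical. Ten heavily censored observations are far too few for real decisions; this only shows the mechanics.

```python
# (minutes observed, preempted?) from the sample table; a trailing "+" in the
# table means the run ended without preemption (censored -> False).
OBSERVATIONS = [
    (39, False), (26, True), (71, True), (278, False), (343, False),
    (302, False), (194, False), (16, True), (5, True), (105, False),
]

def kaplan_meier(observations):
    """Return [(time, S(time))] at each observed preemption time."""
    event_times = sorted({t for t, preempted in observations if preempted})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u, _ in observations if u >= t)
        events = sum(1 for u, preempted in observations if preempted and u == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def survival_at(curve, minutes):
    """Estimated probability of not being preempted within `minutes`."""
    prob = 1.0
    for t, s in curve:
        if t <= minutes:
            prob = s
    return prob

def restricted_mean(curve, horizon):
    """Area under the step curve on [0, horizon]: E[min(time to preempt, horizon)]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)

def expected_cost_preemptible_first(curve, task_minutes, preemptible_rate, regular_rate):
    """Assumed model: one preemptible try, then rerun from scratch on a regular VM."""
    paid_preemptible = preemptible_rate * restricted_mean(curve, task_minutes)
    p_preempted = 1.0 - survival_at(curve, task_minutes)
    return paid_preemptible + p_preempted * regular_rate * task_minutes
```

To compare options for a task of a given length, evaluate expected_cost_preemptible_first(kaplan_meier(OBSERVATIONS), task_minutes, preemptible_rate, regular_rate) against the plain regular cost regular_rate * task_minutes. Note that this treats repeated preemptions of the same task as independent observations, which the prev_preempts column may well contradict.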


  • dshih (Boston; Member, Broadie)


    Thanks! The gcloud compute operations list command is very helpful. I can see the list, but I guess I don't have sufficient privileges to run gcloud compute operations describe on a specific operation (all I get is a vague error message: 'The resource .../operations-xxxxx/' was not found).

    This command gives me a list of the start and preempted times of instances that were preempted.
    To solve the other piece of the puzzle, I would need to extract the start and end times of preemptible instances that were not preempted... Any suggestions?

    Also, I don't suppose you have collected any preemption data yourself?

    Thank you.

  • esalinas (Broad; Member, Broadie) ✭✭✭
    edited October 2017


    You can ask @birger about collected preemption data. He has some.

    If you go to FC and find a task that actually ran (a real run, not a cache "hit"), you will see an operations ID (which looks like operations/abc123). You can then issue the command
    "gcloud alpha genomics operations describe operations/abc123", which returns additional information, including an instance name (like "ggp-def456"). I believe that name can serve as a key for matching entries in the output of "gcloud compute operations list".


  • dshih (Boston; Member, Broadie)

    Great! Thanks, @esalinas. The commands gcloud alpha genomics operations list and gcloud compute operations list give me all the information I need.
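
For anyone who wants to automate the matching step, here is a minimal Python sketch of pairing start and preemption records by instance name. It assumes the output of gcloud compute operations list was saved with --format=json, and that each record carries the operationType, targetLink and insertTime fields of a Compute Engine operation, with "insert" marking instance creation and "compute.instances.preempted" marking a preemption. Those field values are my assumption and should be checked against your own output; the function name is made up.

```python
from datetime import datetime

def minutes_to_preemption(operations):
    """Map instance name -> minutes between its 'insert' and its preemption.

    `operations` is a list of dicts, e.g. json.load() of the saved listing.
    Instances that were never preempted are simply absent from the result.
    """
    def when(op):
        return datetime.fromisoformat(op["insertTime"])

    started, result = {}, {}
    for op in sorted(operations, key=when):
        # The instance name is the last path component of targetLink.
        name = op["targetLink"].rsplit("/", 1)[-1]
        if op["operationType"] == "insert":
            started[name] = when(op)
        elif op["operationType"] == "compute.instances.preempted" and name in started:
            result[name] = (when(op) - started.pop(name)).total_seconds() / 60.0
    return result
```

Instances that ran to completion without preemption will not show up here; their end times would have to come from elsewhere (for instance the genomics operations listing), and they would enter any analysis as censored observations.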

