What are quotas and how do I request more?
What are quotas?
In the Google Cloud Platform (GCP), quotas specify how many resources, such as central processing units (CPUs) and Persistent Disks (PDs), can be used by a Google Project at any given time. Quotas exist in part to guard against unforeseen spikes in usage, so that resources remain available to the community at all times.
Why do quotas matter in FireCloud?
In FireCloud, Method Tasks define a tool used for analysis as well as the number of CPUs and Persistent Disks (PDs) that are required to compute the results. Methods are run from within Workspaces, and each Workspace is paid for by a FireCloud Billing Project.
That FireCloud Billing Project is a Google Project, for which Google enforces default Compute Engine quotas based on a user's billing reputation. The Google Project is in turn tied to a Google Billing Account, which is charged for any data storage and compute costs incurred. Therefore, all Methods run from a Workspace are affected by the quotas on the Google Project.
When would I need to change quotas?
You may experience one of the following situations after launching an analysis if there is not enough quota:
1. Tasks within your Method will wait on quota availability. For example, you requested 1000 Tasks with 8 CPUs each and your quotas allow 24 CPUs at once, meaning you can only run 3 Tasks at a time. Each subsequent Task is effectively queued.
2. A Task within your Method fails because it requested more resources than allowed by your Google Project quotas. For example, you requested 60 CPUs in your Task and your quota is capped at 24 CPUs at once.
If you notice your analysis running more slowly than expected, or see errors or messages related to quota in your logs, then you may want to request more quota. Please note that unless you are seeing errors, you do not need to update quotas; your analysis will simply run more slowly.
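The two situations above come down to simple integer arithmetic. The following is a minimal sketch, not FireCloud code: the function names are hypothetical, and it assumes CPUs are the only limiting quota.

```python
def concurrent_tasks(cpu_quota: int, cpus_per_task: int) -> int:
    """Return how many Tasks fit in the CPU quota at once.

    A result of 0 means a single Task asks for more CPUs than the quota allows.
    """
    if cpus_per_task <= 0:
        raise ValueError("cpus_per_task must be positive")
    return cpu_quota // cpus_per_task


def describe(cpu_quota: int, cpus_per_task: int, total_tasks: int) -> str:
    """Summarize which of the two quota situations applies (illustrative only)."""
    slots = concurrent_tasks(cpu_quota, cpus_per_task)
    if slots == 0:
        return "fails: Task requests more CPUs than the quota allows"
    running = min(slots, total_tasks)
    return f"{running} running, {total_tasks - running} queued"


# Situation 1: 1000 Tasks with 8 CPUs each against a 24-CPU quota.
print(describe(24, 8, 1000))  # 3 running, 997 queued
# Situation 2: one Task asking for 60 CPUs against a 24-CPU quota.
print(describe(24, 60, 1))    # fails: Task requests more CPUs than the quota allows
```

In the first case the analysis still completes, only more slowly; in the second it cannot start until the quota is raised or the Task requests fewer CPUs.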
How do I check my quotas?
If you are a FireCloud Billing Project Owner, you will be able to check your Google Project’s quota at a URL like this: https://console.cloud.google.com/iam-admin/quotas?project=<the name of your FireCloud project>
From this URL, you will see a long list of quotas for the given project. You can see from this example that the CPUs quota in region us-central1 is maxed out (the orange bar near 100%). FireCloud users need to request more quota as described below.
Note that quotas are defined per region. If you want to run your analysis across multiple regions (e.g. us-east1 and us-central1), you will need to request a larger quota in each of them.
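If you manage several Billing Projects, the quota page URL pattern given above can be generated rather than typed by hand. This is a small sketch using only the Python standard library; the function name and the project name "my-firecloud-project" are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Base address of the quota page, taken from the URL pattern above.
QUOTA_PAGE = "https://console.cloud.google.com/iam-admin/quotas"


def quota_url(project_name: str) -> str:
    """Build the quota page URL for one FireCloud Billing Project."""
    return f"{QUOTA_PAGE}?{urlencode({'project': project_name})}"


# "my-firecloud-project" is a made-up name used only for illustration.
print(quota_url("my-firecloud-project"))
```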
FireCloud cares about the following Google Compute Engine API quotas:
* Google Compute Engine API -- CPUs: this defines how many CPUs you can use at once across all your Tasks.
* Google Compute Engine API -- Preemptible CPUs: this defines the pool of CPUs that would only be used by preemptible instances. You can learn more about this quota here and about preemptible instances here.
* Google Compute Engine API -- Persistent Disk Standard(GB): this defines how much total non-SSD disk you can have attached at once to your Task VMs.
* Google Compute Engine API -- Persistent Disk SSD(GB): this defines how much total SSD disk you can have attached at once to your Task VMs.
* Google Compute Engine API -- Local SSD(GB): this defines how much SSD is attached directly to the server running the Task VMs. You can learn more here. This quota only applies if you are using local SSD in your Task.
How much quota will I need?
The amount of quota you need is a function of the number of Workflows being launched, the number of concurrent Tasks running within each Workflow, and the resources requested by those Tasks.
To calculate the amount of quota your workflows need, you will have to look into your WDL and examine what it is doing.
For example, let’s say we have a three-Task WDL that will run on one or more samples. We need to look across these Tasks to determine the maximum amount of CPU and PD we expect to need at any given time:
Task 1: uses 10 CPUs and 10GB of PD
Task 2: uses 1 CPU, 1GB of PD and scatters 10 ways wide
Task 3: uses 10 CPUs, 10GB of PD and scatters 10 ways wide
In the example above, Tasks 1 and 2 use the same total amount of resources, because Task 2 scatters 10 ways wide (10 x 1 CPU and 10 x 1GB of PD). Task 3, however, uses more resources than either Task 1 or Task 2.
Let’s say we are running Task 3 on 10 samples at once. Task 3 requests a total of 100 CPUs and 100GB of PD (due to scattering 10 ways wide) for one sample. Because we are running this on 10 samples, we are trying to use 10 times those resources at once, which comes to 1000 CPUs and 1000GB (about 1TB) of PD. If our current quota is set at 24 CPUs and 100GB of PD and we want this workflow to run as quickly as possible, we will need to make a request for at least 1000 CPUs and 1TB of PD.
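The same estimate can be written as a short calculation. This is a sketch under stated assumptions, not part of FireCloud: the Task class and function names are hypothetical, the Tasks in a workflow are assumed to run one after another, and all samples are assumed to reach the most demanding Task at the same time.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Task:
    """Resources requested by one WDL Task (hypothetical helper type)."""
    name: str
    cpus: int
    disk_gb: int
    scatter_width: int = 1  # 1 means the Task does not scatter

    def peak_cpus(self) -> int:
        return self.cpus * self.scatter_width

    def peak_disk_gb(self) -> int:
        return self.disk_gb * self.scatter_width


def required_quota(tasks: Iterable[Task], samples: int) -> Tuple[int, int]:
    """Return (CPUs, PD in GB) needed to run every sample at full speed.

    Assumes Tasks run sequentially, so the most demanding Task sets the peak.
    """
    tasks = list(tasks)
    peak_cpus = max(t.peak_cpus() for t in tasks) * samples
    peak_disk = max(t.peak_disk_gb() for t in tasks) * samples
    return peak_cpus, peak_disk


# The three Tasks from the example above.
example_tasks = [
    Task("Task 1", cpus=10, disk_gb=10),
    Task("Task 2", cpus=1, disk_gb=1, scatter_width=10),
    Task("Task 3", cpus=10, disk_gb=10, scatter_width=10),
]

print(required_quota(example_tasks, samples=10))  # (1000, 1000)
```

If Tasks in your WDL can run in parallel with each other, the peak would be the sum of the concurrent Tasks rather than the maximum, so treat the result as a lower bound in that case.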
How do I request more quota?
You can make quota requests by sending an email to fc-quota-requests @ googlegroups.com. Please be sure to provide the following information:
1. How many CPUs you will need at any given time
2. How much Persistent Disk you will need at any given time
3. How much (if any) SSD Persistent Disk you will need at any given time