Workflow failing due to "Quota CPUS exceeded in region us-central1"

aryee Member, Broadie

Several of my workflows have failed recently with "quota exceeded" errors. See for example the workflow with id 1c243834-ca4c-4e3e-a940-2ae3c1f9f57e that fails with this message:

message: Task preprocess_hic.hicpro_align:NA:1 failed. JES error code 9. Message: Quota CPUS exceeded in region us-central1

I requested a 32-core machine and am not currently running any other workflows.

Does anyone know why I seem to have hit a quota limit with this small request?
Thanks.

Answers

  • esalinas BroadMember, Broadie ✭✭✭
    edited October 2017

    Hi @aryee, I wonder if the short answer to your question is that you're requesting more resources than you have quota for.

    I think the default quota is 24 cores. If you're asking for 32, the request won't ever get granted unless your quota is increased.

    Check out the attached PDF?

    Go to your Google Cloud console, search for "quota", and check whether your quotas explain the failure.
    I think the default submission zone for Cromwell is us-central1-f. It should say in your logs (and that seems consistent with your error message too).

    You can write to "[email protected]" to request a quota increase, which seems like something you might want to do.

    In your WDL runtime block you can specify a zone....

    https://github.com/broadinstitute/cromwell#zones

    You might have to do that if and when you request a quota increase in a specific zone. That could be how your WDL goes interstate! And if you run in a different country, it can go international too...
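
The quota check suggested above can also be done outside the console. As far as I know, `gcloud compute regions describe us-central1 --format=json` returns a `quotas` list with `metric`, `limit`, and `usage` fields. Here is a minimal Python sketch that reads that kind of output and reports the headroom per metric. The sample data is hypothetical: the 24 CPUs and 500 GB SSD mirror the defaults mentioned in this thread, the address limit is made up, and the function name is mine.

```python
import json

# Hypothetical sample shaped like the output of
# `gcloud compute regions describe us-central1 --format=json`.
# The limits below are illustrative, not read from a real project.
SAMPLE = """
{"name": "us-central1",
 "quotas": [
   {"metric": "CPUS", "limit": 24.0, "usage": 0.0},
   {"metric": "SSD_TOTAL_GB", "limit": 500.0, "usage": 0.0},
   {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 0.0}
 ]}
"""


def quota_headroom(region_json: str) -> dict:
    """Map each quota metric to (limit - usage) for one region."""
    region = json.loads(region_json)
    return {q["metric"]: q["limit"] - q["usage"] for q in region.get("quotas", [])}


headroom = quota_headroom(SAMPLE)
for metric, free in sorted(headroom.items()):
    print(f"{metric}: {free:g} available")
```

If the `CPUS` headroom is smaller than the `cpu` value in the task's runtime block, the request would be expected to fail with the error quoted at the top of the thread.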

  • aryee Member, Broadie
    edited October 2017

    I tried requesting 16 cores and got the same failure message: "message: Task preprocess_hic.hicpro_align:NA:1 failed. JES error code 9. Message: Quota CPUS exceeded in region us-central1".

    It does work when I only request 8 cores, however.

    Where is the per-task CPU quota specified?

  • esalinas BroadMember, Broadie ✭✭✭

    @aryee @Tiffany_at_Broad make sure that the project you have selected in the cloud console is the same as the workspace billing project. Can you confirm those match? Also, as Tiffany mentions, if you have multiple jobs running in a single workspace, or jobs running in a different workspace (but the same namespace), then they all count against the same quotas (assuming the same region, like us-central1).
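
To make the shared-quota point concrete, here is a small arithmetic sketch in Python. It only assumes that a request is granted when the CPUs already in use plus the CPUs requested stay within the regional limit. The 24-CPU limit is the default reported in this thread; the 12 CPUs "in use by other jobs" is a hypothetical number, not something measured here, and the function name is mine.

```python
def fits_quota(requested_cpus: int, cpus_in_use: int, quota_limit: int) -> bool:
    """Return True if the request fits in the remaining regional CPU quota."""
    return cpus_in_use + requested_cpus <= quota_limit


QUOTA = 24  # default CPU quota reported in this thread

# A 32-core request can never fit under a 24-CPU quota, even on an idle project.
print(fits_quota(32, 0, QUOTA))   # False

# A 16-core request fits only if little else is running in the same project.
print(fits_quota(16, 0, QUOTA))   # True
print(fits_quota(16, 12, QUOTA))  # False (hypothetical 12 CPUs already in use)

# With the same hypothetical usage, an 8-core request still fits.
print(fits_quota(8, 12, QUOTA))   # True
```

Under that assumption, "16 fails but 8 works" would be consistent with other jobs in the same billing project holding part of the quota, though the thread does not confirm that this is what happened.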

  • aryee Member, Broadie

    I now see in the Google Cloud Console for the "aryeelab-epigenomics" project that we have the default quota of only 24 CPUs and 500 GB SSD. We're planning on using about 500 cores, so I will email [email protected] to get these increased. Thanks.

  • esalinas BroadMember, Broadie ✭✭✭

    @aryee depending on the setup, each VM needs an IP address, and that could also affect throughput. Do a test run and see whether the in-use IP address quota is a bottleneck. If it is, you may also need to request a quota increase for that.
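
A rough way to see how the IP address quota could become the limiting factor, sketched in Python. It assumes one in-use IP address per VM, which per the post above depends on the setup. The 500-core figure is the planned usage from this thread; the 8 cores per VM and the address quota of 23 are hypothetical values chosen for illustration.

```python
def max_concurrent_vms(cpu_quota: int, cpus_per_vm: int, ip_quota: int) -> int:
    """Upper bound on simultaneous VMs, assuming one in-use IP address per VM."""
    by_cpu = cpu_quota // cpus_per_vm  # how many VMs the CPU quota allows
    return min(by_cpu, ip_quota)       # the tighter of the two quotas wins


# Hypothetical numbers: a 500-CPU quota, 8-core tasks, and 23 in-use addresses.
print(max_concurrent_vms(cpu_quota=500, cpus_per_vm=8, ip_quota=23))   # 23
# With a larger address quota, the CPU quota becomes the limit instead.
print(max_concurrent_vms(cpu_quota=500, cpus_per_vm=8, ip_quota=100))  # 62
```

In the first case only 23 of the 62 VMs that the CPU quota would allow could run at once, so raising the CPU quota alone would not increase throughput.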
