Consistent 503 errors from engine functions

I am using Cromwell 28.2 in production. When I submit a large number of jobs, between a quarter and a third of them fail with a 503 Service Unavailable error from the size engine function. The jobs run on Google Cloud with the JES backend. I have set the number of retries on API timeout to 5, but I do not see any retries for the size function. Instead, the entire workflow fails immediately.

From what I can tell, this version of Cromwell does not retry when an engine function receives a timeout or an error from the Google API. Is this fixed in a later version of Cromwell, or is there a config option that addresses it?
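To make the expected behavior concrete, here is a minimal sketch of retry-with-exponential-backoff around a call that can fail with a 503. It is a generic Python illustration, not Cromwell's code: Cromwell is written in Scala, and its actual retry logic and config keys are not shown here. The names `with_retries`, `TransientHttpError` and `flaky_size` are hypothetical, and treating 429 and 5xx responses as retryable is an assumption.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientHttpError(Exception):
    """Hypothetical error type standing in for a failed API response."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

# Status codes treated as transient (an assumption; 503 is the one observed here).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def with_retries(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 32.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient HTTP errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientHttpError as err:
            # Give up on non-retryable statuses or once attempts are exhausted.
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Delay doubles on each attempt, capped at max_delay, plus random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")

# Usage: a hypothetical size lookup that fails twice with 503, then succeeds.
attempts = {"n": 0}

def flaky_size() -> int:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientHttpError(503)
    return 1024

print(with_retries(flaky_size, sleep=lambda s: None))  # prints 1024
```

With retries like this, a transient 503 would only fail the workflow after every attempt is exhausted, rather than on the first error.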


