Gathering job resource usage stats from backend

After I've run a task on a backend, I'd like to gather resource usage stats of the task (memory, CPU, etc.). When using SGE, one can achieve this by running qacct -j {job_id}. Is there a way to retrieve these stats with WDL / Cromwell?

I guess if Cromwell allowed running something like a "post-job-retire" command, one could parametrise it with "run qacct" (in the same sense that one can parametrise how to submit, abort, or check if a job is alive).

Thank you!

Answers

  • Ruchi (Member, Broadie, Moderator, Dev)

    Hey @ocampoernesto

    There's an option called monitoring_script, which is a reference to a script that's run in the background to collect data like memory and disk usage. I'm attaching an example monitoring script (I couldn't attach it with a .sh extension, so it's attached as .sh.txt) and docs on how to specify this option. You can modify the given script to your needs. This option generates a file called "monitoring.log" for each task in a workflow, which contains the output from the monitoring script.
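
    For readers without the attachment, a minimal monitoring script along these lines could work. This is only a sketch, not the attached script: it assumes a Linux environment where free and df are available.

    ```bash
    #!/bin/bash
    # Periodically log memory and disk usage; Cromwell collects the
    # script's stdout into monitoring.log for each task.
    while true; do
        echo "=== $(date) ==="
        echo "* Memory usage:"
        free -m
        echo "* Disk usage:"
        df -h
        sleep 60
    done
    ```

    If I remember the docs correctly, the option is supplied through the workflow options JSON passed at submission time (the bucket path below is a placeholder):

    ```json
    {
        "monitoring_script": "gs://my-bucket/monitor.sh"
    }
    ```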

  • Hi @Ruchi, thanks for the reply.

    I'd seen the monitoring script before; I think it's good for monitoring a task's resource usage as it runs. However, I'm more after something that will give me the totals after a task has finished, like total CPU time used, total I/O, and max RSS.

    I guess running commands under /usr/bin/time and then collecting the results manually could be a workaround, although one would have to add a bit of code to each task (see the sketch below).
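
    As a sketch of that workaround (profiled_task and my_command are placeholder names, and -v assumes GNU time on Linux rather than the shell builtin):

    ```wdl
    task profiled_task {
      command {
        # GNU time writes totals (max RSS, user/system CPU time,
        # I/O counts, etc.) to the -o file when the command exits.
        /usr/bin/time -v -o resource_usage.txt ./my_command
      }
      output {
        File resource_usage = "resource_usage.txt"
      }
    }
    ```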

    It would be interesting to be able to query workflows / tasks via REST and, from the results, calculate how much CPU / IO a given task consumed. I understand this would be quite an involved piece of work, though. (The metadata endpoint sketched below might be a starting point.)
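
    For what it's worth, Cromwell's existing REST API already exposes per-workflow metadata (call start and end times, return codes, backend info). Something like the following queries it, assuming a Cromwell server on its default port 8000 and a placeholder workflow id:

    ```bash
    curl "http://localhost:8000/api/workflows/v1/<workflow-id>/metadata"
    ```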

    Thanks!
