
How to delete execution data from old runs?

If we have a workflow of this form:

Inputs -> Temporary Files 1 -> Important Files 1 -> Temp files 2 -> Temp files 3 -> Important Files 2 -> Temp files 4 ...

In a setting where we're running Cromwell in server mode, backed by MySQL for call caching, we'll eventually need to delete old temporary files to avoid running out of disk space (assuming we already have some mechanism for moving the important files somewhere else).

Deleting intermediate files has been discussed here. However, deleting the workflow execution directory right after the workflow finishes would invalidate call caching (would it? I guess so, because there would be no output file left to create a link to).

I'm looking for a way to delete old executions, for example a daily cron job that deletes all executions older than 1 year. If I simply delete the execution directories, would the MySQL database get corrupted? What would such a cleanup process involve? Does Cromwell provide any features to help with this? For example, I imagine a DELETE workflow API endpoint would be useful here.
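
To make the idea concrete, here is a minimal sketch of the kind of cleanup script such a cron job might run. It is only an illustration under stated assumptions: it assumes the execution root is laid out as `cromwell-executions/<workflow_name>/<workflow_id>/`, it uses each run directory's modification time as a rough proxy for the age of the run, and the function names `find_old_executions` and `delete_old_executions` are hypothetical. It does not touch the MySQL database at all, so what happens to the call-caching entries afterwards is exactly the open question.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_executions(root, max_age_days, now=None):
    """Return run directories under root older than max_age_days.

    Assumed layout (not verified here): root/<workflow_name>/<workflow_id>/.
    Age is judged by the run directory's own mtime, which is only an
    approximation of when the workflow actually ran.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    root = Path(root)
    old_runs = []
    if not root.is_dir():
        return old_runs
    for workflow_dir in sorted(root.iterdir()):
        if not workflow_dir.is_dir():
            continue
        for run_dir in sorted(workflow_dir.iterdir()):
            if run_dir.is_dir() and run_dir.stat().st_mtime < cutoff:
                old_runs.append(run_dir)
    return old_runs


def delete_old_executions(root, max_age_days=365, dry_run=True, now=None):
    """Delete (or, with dry_run=True, only list) old run directories."""
    selected = []
    for run_dir in find_old_executions(root, max_age_days, now):
        if not dry_run:
            shutil.rmtree(run_dir)
        selected.append(run_dir)
    return selected


if __name__ == "__main__":
    # Dry run by default: print what would be removed, delete nothing.
    for path in delete_old_executions("cromwell-executions", max_age_days=365):
        print("would delete", path)
```

The dry-run default is deliberate: nothing is removed until `dry_run=False` is passed, which makes it easier to check the selection before scheduling the job.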

Thank you!

Best Answer


  • Thank you, that's very useful!
    Regarding the MySQL database, will invalid rows get deleted, or just marked as invalid? I worry that the database may grow indefinitely, especially if we run thousands of tasks per day.
