Is it normal for the cromwell-executions folder to be this huge?

Dear Cromwell Team,

I was running the GATK4 Best Practices workflows (gatk4-data-preprocessing and germline-snps-indels) on my cPouta VM with Docker (24 vCPUs and 3 TB storage), using the NA12878_24_RG_small sample data.
After the workflow finished, I found that my inputs folder is only about 40 GiB, while my cromwell-executions folder is about 500 GiB. That is really huge, and when I then tried to run the workflow on the full NA12878 sample, it failed with a "no space left on device" error.

Is such a huge executions folder expected, or did I do something wrong? When I checked the executions folder, I saw that the inputs had been copied for each task call, which I think is the main reason for the large size.

I also noticed many warnings: "[warn] Localization via hard link has failed". Is that why Cromwell copied the inputs for every task call? How can I fix it?

Answers

  • A small clarification: inside the execution folder, I don't think Cromwell copied the inputs from the original inputs folder; rather, it created an inputs folder for every task call and placed all the inputs that task needs inside it. Is that normal, or did I forget to set some attribute or parameter? Either way, Cromwell ends up generating a huge executions folder full of these intermediate files.

  • Ruchi (Member, Broadie, Moderator, Dev)

    Hey @Angry_Panda,

    When running locally, Cromwell purposely stages all the required input files into an inputs directory inside each call's execution directory. However, you can configure Cromwell to link those input files instead of copying them, as described here: https://gatkforums.broadinstitute.org/wdl/discussion/8115/avoid-copying-input-files (a configuration sketch appears after this reply).

    However, it seems the hard-linking is failing for you. Can you describe the filesystem you're using, where your inputs live, the location you're running Cromwell from, and how you're launching Cromwell (the command line)?

    Thanks!
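
For reference, the thread linked above describes setting a localization strategy in Cromwell's backend configuration. Below is a minimal sketch of such an override file for the Local backend; the key layout follows Cromwell's standard configuration structure, but the file name and the strategy order are assumptions to adapt to your own setup. Note that hard links only work when the inputs and the cromwell-executions directory are on the same filesystem, which is a common reason for the "Localization via hard link has failed" warning.

    # cromwell-local.conf -- sketch of a localization override for the Local backend
    # (file name is illustrative; merge with your existing config as needed)
    include required(classpath("application"))

    backend {
      default = "Local"
      providers {
        Local {
          config {
            filesystems {
              local {
                # Try linking first, and only fall back to copying the inputs.
                localization: [
                  "hard-link", "soft-link", "copy"
                ]
              }
            }
          }
        }
      }
    }

Cromwell would then be started with this file, for example: java -Dconfig.file=cromwell-local.conf -jar cromwell.jar run workflow.wdl --inputs inputs.json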
