Moving the Docker disk image off the boot disk?

gordon123 · Broad · Member, Broadie

Is it possible to move the Docker disk image files off the boot disk?

In Cromwell 19.3, two separate disk-size parameters have to be set in the runtime block of the WDL, e.g.:

    runtime {
        disks: "local-disk ${output_disk_gb} HDD"
        bootDiskSizeGb: "${boot_disk_gb}"
    }

The first line sizes the disk that holds the task's inputs and outputs, while the second sizes the host machine's boot disk. The Docker images are also stored on the boot disk, in the default location /var/lib/docker. This is fine if the tool writes only to the output directory, but many tools also write elsewhere, e.g. /tmp, and because each container's writable layer also lives under /var/lib/docker, those writes land on the boot disk too. We have seen multiple instances where /tmp grew too large for the default boot disk, which typically crashes the machine before it can produce a useful error message.
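To check which disk a given directory actually sits on, something like the following shell sketch can be run on the host VM. It is an illustration, not something Cromwell provides; /cromwell_root is listed only because that is where the output disk is mounted on the Pipelines API backend. On a default setup one would expect /tmp and /var/lib/docker to report the same mount point as /, i.e. the boot disk.

```shell
#!/bin/sh
# Print the mount point of the filesystem holding path $1. With -P, df uses the
# POSIX output format: a header line, then one line per filesystem with the
# mount point in the sixth column (assumed here to contain no spaces).
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Show which filesystem each directory from the discussion lives on; paths
# that do not exist on this machine are skipped.
for dir in / /tmp /var/lib/docker /cromwell_root; do
    if [ -e "$dir" ]; then
        printf '%s\t%s\n' "$dir" "$(mount_point_of "$dir")"
    fi
done
```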

Workarounds exist, e.g. symlinking /tmp to the output directory, or enlarging the boot disk (which means provisioning extra margin on two disks rather than one). A more convenient solution would be to move the Docker image storage onto the same disk used for the outputs.

I've gotten this to work on GCE by putting a symlink in place of /var/lib/docker, though it looks like the same effect can be achieved with dockerd parameters.
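For the dockerd route, Docker's storage location is set by the `--data-root` option, or by the equivalent `"data-root"` key in the daemon configuration file (usually /etc/docker/daemon.json); older Docker releases called the option `--graph` (`-g`). The sketch below writes that key. The function name and the target directory are hypothetical, and the function overwrites any existing daemon.json, so merge by hand if the file already holds other settings.

```shell
#!/bin/sh
# Hypothetical helper: write a daemon config whose "data-root" points Docker's
# storage at a directory on the larger disk.
# Assumptions: the config file holds no other settings worth keeping (it is
# overwritten), and the path contains no quotes or backslashes (it is not
# JSON-escaped). Docker must be restarted for the change to take effect.
write_docker_data_root() {
    local config_file="$1"   # normally /etc/docker/daemon.json
    local data_root="$2"     # a directory on the output/data disk
    mkdir -p "$(dirname "$config_file")" || return 1
    printf '{\n  "data-root": "%s"\n}\n' "$data_root" > "$config_file"
}
```

On a systemd host this would be run as root, e.g. `write_docker_data_root /etc/docker/daemon.json /mnt/disks/data/docker` (hypothetical path on the large disk), followed by `systemctl restart docker`. Images pulled before the change stay in the old location; Docker starts with empty storage in the new one unless the old contents are copied over.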

Issue · GitHub #1612 by Geraldine_VdAuwera · State: closed · Closed by: knoblett

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie, admin

    Hi @gordon123, I don't know the answer to your question, but I will get help from the engineering team.

  • gordon123 · Broad · Member, Broadie

    The boot disk filled up on us again. A file has to be unzipped into /root, the home directory, for the third-party software to work. However, /root is on the boot disk, so the node crashed without reporting an error. The file is just large enough that the crash occurs intermittently. Below is one of about half a dozen recent crashes of this run.

    The workaround is straightforward (replace the directory we want to write to with a symlink pointing to a location under /cromwell_root), but the troubleshooting was frustrating, and it could have been avoided by moving the Docker image off the boot disk.

    https://portal.firecloud.org/#workspaces/broad-firecloud-benchmark:benchmarking-1/Monitor/3a64e707-2544-409b-8a56-9a888840045d

    Workspace: broad-firecloud-benchmark/benchmarking-1
    Analysis submission: 3a64e707-2544-409b-8a56-9a888840045d
    Workflow IDs: 04b3f189-18f3-47b3-972c-0e59d2a56174, e53f6837-623b-4308-a638-6feb889c6c99

    The task in question for both of these workflows is CallingGroup_Workflow.VEP_Task.


  • gordon123 · Broad · Member, Broadie

    It looks like /cromwell_root is not the mounted directory when running Cromwell locally; instead, I need to save the current working directory when the job starts and point the symlinks there.
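A sketch of that workaround as it might appear at the top of a task's command section, assuming (as observed above) that the job's starting working directory is on the large disk both on the Pipelines API (/cromwell_root) and locally. `WORKDIR`, `redirect_to_workdir` and the `redirected/` layout are hypothetical names, and the redirected directory's existing contents are discarded.

```shell
#!/bin/sh
# Capture the directory the job starts in before anything changes it.
WORKDIR=$PWD

# Replace the absolute directory path $1 with a symlink to a directory under
# $WORKDIR, so anything the tool writes to $1 lands on the large disk.
# Assumption: the current contents of $1 are not needed (they are deleted).
redirect_to_workdir() {
    case $1 in
        /*) ;;
        *) echo "redirect_to_workdir: need an absolute path" >&2; return 1 ;;
    esac
    # Hypothetical layout: mirror the original path under $WORKDIR/redirected.
    local target="$WORKDIR/redirected$1"
    # Refuse if the target would be inside the directory we are about to delete.
    case $target in
        "$1"/*) echo "redirect_to_workdir: $WORKDIR is inside $1" >&2; return 1 ;;
    esac
    mkdir -p "$target" || return 1
    rm -rf "$1" || return 1
    mkdir -p "$(dirname "$1")" || return 1
    ln -s "$target" "$1"
}
```

For the /tmp case from the question this would be `redirect_to_workdir /tmp`, run before the tool starts; for the unzip-into-/root case it would be the specific directory under /root that the tool writes to. Both need permission to delete the original directory, which is normally the case when the container runs as root.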

  • Thib · Cambridge · Member, Broadie, Dev ✭✭

    Hi!
    If I understand your question correctly, I think the answer is unfortunately no. The Docker image is pulled by the Google Pipelines API, and we have no way to give it a custom location to pull the image into.
    Note that we do export a couple of environment variables before running the WDL command (which only helps if the command checks for and uses them):

    export _JAVA_OPTIONS=-Djava.io.tmpdir=/cromwell_root/tmp
    export TMPDIR=/cromwell_root/tmp
    

    Also, I'm not sure I see how moving the Docker image off the boot disk would solve this. Apart from freeing some space, anything writing to /tmp would still go to the boot disk, right? Unless I'm missing something.

    About your last comment: when running locally with Docker, Cromwell mounts the call execution directory on your machine (its root can be set in the configuration, see https://github.com/broadinstitute/cromwell/blob/develop/core/src/main/resources/reference.conf#L176) to /root/path/to/call/directory inside the Docker container.
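Since the TMPDIR export above only helps tools that read it, a task command can also make sure the variable is set and then create its scratch files through it. The sketch below keeps TMPDIR when the backend has already exported it and otherwise falls back to a tmp directory under the starting working directory; that fallback is an assumption for illustration, not something Cromwell does. It relies on GNU coreutils `mktemp`, which places its file under `$TMPDIR` when no template is given.

```shell
#!/bin/sh
# Keep TMPDIR if the backend already exported it; otherwise fall back to a
# tmp directory under the job's starting working directory (hypothetical).
: "${TMPDIR:=$PWD/tmp}"
export TMPDIR
mkdir -p "$TMPDIR"

# GNU mktemp with no template creates its file under $TMPDIR (or /tmp when
# TMPDIR is unset), so scratch files made this way follow the setting above.
# A tool that hard-codes /tmp ignores TMPDIR altogether.
scratch=$(mktemp)
echo "scratch file: $scratch"
```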

  • gordon123 · Broad · Member, Broadie

    If you are able to 1) change the command used to start the Docker daemon, or 2) run something on the base VM before the Docker container is started, then it should be feasible to move the Docker image storage. The default location of Docker's storage is /var/lib/docker. With #1, a command-line option can point it elsewhere, e.g. /opt/docker. With #2, you can stop the Docker daemon, move the contents of /var/lib/docker to /opt/docker, put a symlink at /var/lib/docker pointing to /opt/docker, and restart the daemon; ideally this would be done before pulling the tool's Docker image, so that the copy is quick. In my case, I found it simplest to create the symlink before installing Docker and then save the VM image set up this way.

    If this is done (with /opt/docker on the large disk), writing a huge file to /tmp inside the container grows files under /opt/docker rather than under /var/lib/docker on the boot disk. Provisioning one huge disk that handles everything is easier than provisioning two. It is also less wasteful, since the extra margin is needed on only one disk rather than two, and it avoids the common, cryptic crash that results from leaving zero bytes free on the boot disk.
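Option 2 can be sketched as a small shell function, assuming you control the VM (as in the GCE setup described in the question); whether this is possible under the Pipelines API is the open question in this thread. `relocate_dir` and the target path are hypothetical. It copies rather than moves so that dotfiles, ownership and permissions are preserved, and it deletes the original only after the copy has succeeded.

```shell
#!/bin/sh
# Move the contents of directory $1 to directory $2 and leave a symlink at $1
# pointing to $2. For Docker storage this would run as root while the daemon
# is stopped, with $2 on the large data disk.
relocate_dir() {
    local src="$1" dest="$2"
    if [ -L "$src" ]; then
        echo "relocate_dir: $src is already a symlink; leaving it alone" >&2
        return 0
    fi
    mkdir -p "$dest" || return 1
    if [ -d "$src" ]; then
        # cp -a keeps ownership, permissions and symlinks; copying "$src/."
        # includes dotfiles. Remove the original only once the copy succeeds.
        cp -a "$src/." "$dest/" || return 1
        rm -rf "$src" || return 1
    fi
    mkdir -p "$(dirname "$src")" || return 1
    ln -s "$dest" "$src"
}
```

On a systemd-based VM the sequence might be, as root: `systemctl stop docker`, then `relocate_dir /var/lib/docker /mnt/disks/data/docker`, then `systemctl start docker` (the target path is hypothetical and must be on the large disk). Run before Docker is installed, when /var/lib/docker does not exist yet, the function simply creates the symlink, which matches the approach described above.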
