[WDL][Cromwell] Mounting a directory to the docker for access.

Hi,
I am attempting to run Gemini within a docker through WDL and Cromwell. I have installed Gemini without its data, as the data is too large to put into a Docker image (plus it's bad practice). So I need to download the data elsewhere and link it so the gemini binary can access it. Locally on my own machine, without WDL, I might run the following to get this to work:
docker run --rm -v /path/to/local/gemini/data:/path/to/container/gemini/data -i gemini load -t VEP -v my.vcf my.db
At the bottom I have outlined my submission script (using google genomics pipelines run) and the yaml configuration for background. The crux of my problem is that I am unsure, with the Broad docker image for wdl_runner, what the mount procedure for the docker is.
In the WDL documentation, for local backends, the docker by default does the following:
docker run --rm -v <cwd>:<docker_cwd> -i <docker_image> /bin/bash < <script>
Now supposing I have my data in a google bucket at gs://my_bucket/data_for_gemini. How would I define in WDL the appropriate code to mount that google bucket directory so gemini inside the docker can access it?
Example WDL:
task Gemini {
  File my_vcf  # how to pass an entire google bucket directory as a target site?

  command {
    # define mounts in here somehow?
    gemini load -t VEP -v ${my_vcf} out.db
  }
  runtime {
    # define mounts in here?
    docker: "gcr.io/my_containers/gemini"
    memory: "4 GB"
    cpu: "1"
  }
  output {
    File gemini_db = "out.db"
  }
}
One inelegant solution I have considered is running docker-in-docker and mounting that way, but I wanted to know if there is a better, more elegant approach.
-- Derrick DeConti
My submission script is:
gcloud alpha genomics pipelines run \
  --pipeline-file wdl_pipeline.yaml \
  --zones us-east1-b \
  --logging gs://dfci-cccb-pipeline-testing/logging \
  --inputs-from-file WDL=VariantCalling.cloud.wdl \
  --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
  --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
  --inputs WORKSPACE=gs://dfci-cccb-pipeline-testing/workspace \
  --inputs OUTPUTS=gs://dfci-cccb-pipeline-testing/outputs
The resultant yaml is as follows:
name: WDL Runner
description: Run a workflow defined by a WDL file
inputParameters:
- name: WDL
  description: Workflow definition
- name: WORKFLOW_INPUTS
  description: Workflow inputs
- name: WORKFLOW_OPTIONS
  description: Workflow options
- name: WORKSPACE
  description: Cloud Storage path for intermediate files
- name: OUTPUTS
  description: Cloud Storage path for output files
docker:
  imageName: gcr.io/broad-dsde-outreach/wdl_runner
  cmd: >
    /wdl_runner/wdl_runner.sh
resources:
  minimumRamGb: 1
Best Answer
ChrisL Cambridge, MA admin
Hi @deconti,
I'm afraid that this is actually not a feature in Cromwell yet, but it'd be great if you raised volume mounting as a new feature request on the Cromwell github and tagged @KateVoss for assigning a priority to it.
In the current Cromwell, you do have a few options, depending on which works best for you:
a) Add the database to the docker image (as you said, maybe against best practices). It will be reset every time you start a new task, since it's not on a persistent disk, so this is probably only an option if you think there's value in a stable initial database.
b) Include the database as a single file (e.g. zip, tar, ...), expand it during the task, and export it at the end as an output. In other words:
task foo {
  File db_in
  command {
    <<Expand DB>>
    # your command
    <<Compress DB>>
  }
  output {
    File db_out = ...
  }
}
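The expand/compress steps from option (b) can be sketched in plain shell. The directory name `db_dir` and its contents are illustrative placeholders, not from the thread:

```shell
# Sketch of option (b): ship the database as one tarball, expand it at
# the start of the task, and re-compress it at the end as the output.
mkdir -p db_dir
echo "annotation data" > db_dir/record.txt
tar -czf db_in.tar.gz db_dir        # the single File passed in as db_in
rm -rf db_dir
tar -xzf db_in.tar.gz               # <<Expand DB>> step
# ... run gemini (or any command) against db_dir here ...
tar -czf db_out.tar.gz db_dir       # <<Compress DB>> step, exported as db_out
```

The cost is paying for the expand/compress on every task invocation, but it keeps the database flowing through Cromwell as an ordinary immutable File.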
I'm not sure what you meant by docker inside docker, but as far as I know, on JES you cannot run docker containers inside other docker containers.
Chris
Answers
Hi, Geraldine.
Gemini accesses the data via an option flag on the command line during installation of the software; the flag points to a directory. As it's considered bad practice to load all that data into a Docker image, it's suggested to create a docker volume. So, what I would do on my local machine is install gemini:
The next step is then to load the data:
From here, if I'm following suggested docker practices, I would instead download the data to my machine locally (i.e. outside docker), mount it at /usr/bin_dir/gemini/data, and change gemini's yaml configuration file to point the annotation directory at this mounted location (/usr/bin_dir/gemini/data).
I hope that helps explain it.
Thanks,
Derrick
@deconti, thanks for the explanation. I'm not familiar enough with this myself, but I'll ask one of our engineers to help propose a solution.
@deconti @ChrisL We'll put in the Cromwell ticket
Cromwell ticket is in https://github.com/broadinstitute/cromwell/issues/2190 (but needs refining)
Thanks, ChrisL.
That answers my question. (I also did not realize I could not perform docker in docker via JES.)
I'll either try getting Docker to incorporate all the downloaded data into one container, or I'll have Gemini run outside of Cromwell and WDL.
Thanks, again.
Hello GATK team, I had a similar issue where I was hoping I could mount a tmpfs type mount. I posted a note on the github issue that is related to this as well:
https://github.com/broadinstitute/cromwell/issues/2190
Mainly, we are hoping the docker can be launched with something like the tmpfs mount described at https://docs.docker.com/storage/tmpfs/#limitations-of-tmpfs-containers, where we can mount a tmpfs volume and declare its mount point on Google Cloud.
We currently do this on our local Cromwell runs by giving the submit a docker run with our own runtime parameter:
${'--mount type=tmpfs,destination='+mount_tmpfs}
This lets us use a ramdisk to unpack tens of thousands of files in seconds. I don't see a straightforward way to add this for the Google Cloud submit, so if it can be supported as part of this feature request, that would be great!
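For context, the kind of local invocation described above could look something like the sketch below. The image name, archive name, and mount destination are hypothetical placeholders; the command is echoed rather than executed, since this only illustrates the flag:

```shell
# Hypothetical local docker run with a tmpfs ramdisk mounted into the
# container, matching the --mount runtime parameter quoted above.
MOUNT_TMPFS=/ramdisk
DOCKER_ARGS="--mount type=tmpfs,destination=${MOUNT_TMPFS}"
# echo instead of executing: this is only a sketch of the command line
echo docker run --rm ${DOCKER_ARGS} -i my/image \
  tar -xf archive.tar -C ${MOUNT_TMPFS}
```

Anything written under the tmpfs destination lives in RAM and disappears when the container stops, which is exactly why it is fast for unpacking large file sets.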
Thanks,
Jason
@g3n3
Hi Jason,
Sorry for the delay. I was away at a workshop. Let me ask someone from the team to answer you soon.
-Sheila
I'd suggest waiting for the WDL Directory type to be added, at which point whether the path gets localized via a direct "copy everything onto the VM" or a less heavyweight "mount path as a volume" could be a configuration/runtime/customization option.
Warning! What I said above is true unless you're expecting this mount to be read/write, which I would say will probably not be coming. The WDL language and Cromwell engine are pretty heavily based on the assumption that the values they move around are immutable, so altering Directories in place as part of a command would probably cause more problems than it solves...
Hi,
I had a similar issue trying to mount the database directory for VEP. I got around it with a cheap trick by calling the docker container from the task itself, like:
Tested locally on Cromwell 29.
Maybe someone can make use of it.
Hi,
Just an addition to my previous post: I ran into a problem when call-caching was activated and WDL was pulling results from a previous run via symlinks. Docker mounts the directory containing the symlinks, but it can't actually access the files because it can't follow the symlinks. To avoid this behavior, you have to mount the original path of the files into the docker container.
It's dirty, I know, but it works.
Tested locally on Cromwell 29.
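The symlink workaround described above can be sketched as follows: resolve the symlinked input back to its real path before constructing the -v mount. The directory and file names here are illustrative, not from the post:

```shell
# Call-caching workaround sketch: docker cannot follow a symlink whose
# target lies outside the mounted directory, so mount the target's
# original directory instead of the symlink's directory.
mkdir -p original_dir cached_dir
echo "result" > original_dir/output.vcf
ln -sf "$(pwd)/original_dir/output.vcf" cached_dir/output.vcf
REAL_PATH=$(readlink -f cached_dir/output.vcf)  # follows the symlink chain
MOUNT_DIR=$(dirname "$REAL_PATH")               # mount this path, not cached_dir
echo "$MOUNT_DIR"
```

With `MOUNT_DIR` in hand, the `-v "$MOUNT_DIR":"$MOUNT_DIR"` mount exposes the real files, and the symlinks inside the container resolve correctly.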
@EADG
Hi,
Thank you for sharing
-Sheila