(How to) Run the GATK4 Docker locally and take a look inside

shlee, Cambridge (Member, Broadie)
edited August 2017 in GATK 4 Beta

Document is in BETA. It may be incomplete and/or inaccurate. Post suggestions to the Comments section and be sure to read about updates also within the Comments section.

1. Install Docker on your system

Install Docker for your system from https://docs.docker.com/engine/installation/, e.g. for Mac, Windows or Linux servers. There is also a program called Docker Toolbox. I have it installed, but I don't think it is necessary for running Docker containers locally or on a server.

On my Mac, I just double-click on the Docker whale icon to start the application. To check that Docker is running, click on the icon that looks like a whale-container-ship in the Mac menu bar at top.


2. Check your Docker software installation

See the Docker version with docker --version.

$ docker --version
Docker version 17.06.0-ce, build 02c1d87
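
For scripted setups, the same check can be automated. The following is a minimal sketch rather than part of the original walkthrough: the function names docker_version_of and have_docker are made up for illustration, and the parsing assumes the docker --version output has the format shown above.

```shell
# Extract the version field from a "docker --version" line.
# Assumes the format shown above: Docker version 17.06.0-ce, build 02c1d87
docker_version_of() {
    # Field 3 is the version followed by a comma; strip the comma.
    printf '%s\n' "$1" | awk '{ sub(/,$/, "", $3); print $3 }'
}

# Hypothetical helper: succeed only if the docker client is on the PATH.
have_docker() {
    command -v docker >/dev/null 2>&1
}

if have_docker; then
    echo "client version: $(docker_version_of "$(docker --version)")"
    # docker info talks to the daemon, so it fails if Docker is not running.
    if ! docker info >/dev/null 2>&1; then
        echo "Docker is installed but the daemon is not reachable" >&2
    fi
else
    echo "docker not found on PATH" >&2
fi
```

Separating the client check from the daemon check helps tell "Docker is not installed" apart from "Docker is installed but not started".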

If you have trouble, you may need to run one or more of the following commands.

docker-machine restart default
docker-machine regenerate-certs
docker-machine env

3. Download a Docker image from Dockerhub

In Docker, an image is the template from which we launch containers. We pull images from Dockerhub (https://hub.docker.com/) using Git-like lingo. For example, the following command downloads a GATK4 Docker image.

docker pull broadinstitute/gatk:4.beta.3

The part after the colon is the tag, which here gives the version of the image we pull. You can see which images you have locally with docker image ls. Here we see I have two different versions of broadinstitute/gatk, 4.beta.3 and 4.beta.2.

$ docker image ls
REPOSITORY                            TAG                    IMAGE ID            CREATED             SIZE
broadinstitute/gatk                   4.beta.3               5c138c493794        2 weeks ago         2.87GB
broadinstitute/gatk                   4.beta.2               507406cb4d85        3 weeks ago         2.88GB
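
To make the repository:tag structure concrete, here is a small sketch that splits an image reference with plain shell parameter expansion. The helper names image_repository and image_tag are hypothetical, and the sketch assumes the reference has no registry host with a port number. When a reference carries no tag, Docker falls back to the tag latest, which the second helper mirrors.

```shell
# Print the repository part of an image reference (text before the tag).
# Assumes no registry host with a port number, e.g. localhost:5000/name.
image_repository() {
    printf '%s\n' "${1%%:*}"
}

# Print the tag of an image reference; Docker falls back to "latest"
# when a reference carries no tag.
image_tag() {
    case "$1" in
        *:*) printf '%s\n' "${1##*:}" ;;
        *)   printf 'latest\n' ;;
    esac
}

image_repository "broadinstitute/gatk:4.beta.3"   # broadinstitute/gatk
image_tag "broadinstitute/gatk:4.beta.3"          # 4.beta.3
image_tag "broadinstitute/gatk"                   # latest
```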

4. Inspect a Docker image by running a container

There are two ways to inspect an image. One is with docker inspect followed by the image ID, e.g. docker inspect 5c138c493794, which prints the image's metadata. The other is to launch a container off the image and root around within it much like you would a file system.

Launch a container using either the image's tag or its image ID. Whichever you use is what docker ps later shows in the IMAGE column for that container.

docker run -i -t 5c138c493794

or

docker run -i -t broadinstitute/gatk:4.beta.3

The bash prompt then opens in a location in the container preset by those who built the image.

root@<container_id>:/gatk#

We can check the contents of the current directory and the java version.

root@<container_id>:/gatk# ls -ltrh
total 148K
drwxr-xr-x  4 root root 4.0K Jul 26 15:49 docs
-rw-r--r--  1 root root  428 Jul 26 15:49 codecov.yml
-rwxr-xr-x  1 root root 4.5K Jul 26 15:49 build_docker.sh
-rw-r--r--  1 root root  21K Jul 26 15:49 build.gradle
-rw-r--r--  1 root root  33K Jul 26 15:49 README.md
-rw-r--r--  1 root root 1.5K Jul 26 15:49 LICENSE.TXT
-rw-r--r--  1 root root  690 Jul 26 15:49 Dockerfile
-rw-r--r--  1 root root  775 Jul 26 15:49 AUTHORS
drwxr-xr-x  1 root root 4.0K Jul 26 15:49 src
-rw-r--r--  1 root root   26 Jul 26 15:49 settings.gradle
drwxr-xr-x 10 root root 4.0K Jul 26 15:49 scripts
drwxr-xr-x  2 root root 4.0K Jul 26 15:49 resources_for_CI
-rwxr-xr-x  1 root root 5.2K Jul 26 15:49 gradlew
drwxr-xr-x  3 root root 4.0K Jul 26 15:49 gradle
-rwxr-xr-x  1 root root  19K Jul 26 15:49 gatk-launch
drwxr-xr-x  9 root root 4.0K Jul 26 15:53 build
-rw-r--r--  1 root root   40 Jul 26 15:55 run_unit_tests.sh
lrwxrwxrwx  1 root root   25 Jul 26 15:55 gatk.jar -> /gatk/build/libs/gatk.jar
-rw-r--r--  1 root root 1017 Jul 26 15:55 install_R_packages.R
root@<container_id>:/gatk# 
root@<container_id>:/gatk# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
root@<container_id>:/gatk# 
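
The same checks can also be run without opening an interactive shell, by passing the command after the image name. This is a minimal sketch, assuming the image has no entrypoint that would intercept the command, which is not confirmed here. The run wrapper and the DRY_RUN variable are illustrative names, not Docker features. By default the script only prints the commands.

```shell
IMAGE="broadinstitute/gatk:4.beta.3"

# Illustrative wrapper: print the command when DRY_RUN=1 (the default here),
# execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# --rm removes the container once the command exits, so these one-off
# runs do not pile up as stopped containers.
run docker run --rm "$IMAGE" ls -ltrh
run docker run --rm "$IMAGE" java -version
```

Run the script with DRY_RUN=0 in the environment to execute the commands for real.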

Typing exit leaves the container and also stops it. Docker keeps stopped containers automatically, and we can list all containers, stopped ones included, with docker ps -a.

$ docker ps -a
CONTAINER ID        IMAGE                          COMMAND             CREATED              STATUS                     PORTS               NAMES
28035a3b71f1        broadinstitute/gatk:4.beta.3   "bash"              About a minute ago   Exited (0) 8 seconds ago                       silly_davinci
f944f81ff6d7        5c138c493794                   "bash"              6 minutes ago        Exited (0) 4 minutes ago                       fervent_wing
62fb9991a939        5c138c493794                   "bash"              6 minutes ago        Exited (0) 6 minutes ago                       tender_mirzakhani
96d91017226e        5c138c493794                   "bash"              3 days ago           Exited (0) 2 days ago                          vigilant_montalcini

As you can see, I have multiple containers launched from the same image. Notice, however, each container has a unique ID (under CONTAINER ID) and name (under NAMES). Whatever changes I make within a container get saved to that container. We can remove containers with docker container rm using either the container ID or name.

$ docker container rm silly_davinci
$ docker ps -a
CONTAINER ID        IMAGE                      COMMAND             CREATED             STATUS                      PORTS               NAMES
f944f81ff6d7        5c138c493794               "bash"              11 minutes ago      Exited (0) 9 minutes ago                        fervent_wing
62fb9991a939        5c138c493794               "bash"              11 minutes ago      Exited (0) 11 minutes ago                       tender_mirzakhani
96d91017226e        5c138c493794               "bash"              3 days ago          Exited (0) 2 days ago                           vigilant_montalcini
$ docker container rm f944f81ff6d7
$ docker ps -a
CONTAINER ID        IMAGE                      COMMAND             CREATED             STATUS                      PORTS               NAMES
62fb9991a939        5c138c493794               "bash"              12 minutes ago      Exited (0) 12 minutes ago                       tender_mirzakhani
96d91017226e        5c138c493794               "bash"              3 days ago          Exited (0) 2 days ago                           vigilant_montalcini
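
Removing stopped containers one at a time gets tedious. As a sketch, the filter below picks the IDs of exited containers out of a docker ps -a listing so they can be removed in one go. The function name exited_ids is hypothetical, and the parsing assumes the column layout shown above.

```shell
# Read "docker ps -a" output on stdin and print the CONTAINER ID of every
# row whose STATUS reads "Exited (...)". The header row is skipped.
exited_ids() {
    awk 'NR > 1 && /Exited \(/ { print $1 }'
}

# Typical use (commented out because it deletes containers):
#   docker ps -a | exited_ids | xargs docker container rm
# Docker can also do the filtering itself:
#   docker ps -a -q --filter status=exited
```

Docker also provides docker container prune, which removes all stopped containers after a confirmation prompt.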

We can restart one of these stopped containers with docker start.

docker start 96d91017226e

It may take a minute for a container to start up. We can see the running containers with docker container ls.

$ docker container ls
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
96d91017226e        5c138c493794        "bash"              3 days ago          Up About a minute                       vigilant_montalcini

Finally, we can reattach to the running container.

docker attach vigilant_montalcini

On my local Mac, there is a glitch and I must press enter twice to show the docker container's bash prompt. You can also use the container ID instead of the name in the command. To exit out of a running container without stopping it, press Ctrl+P followed by Ctrl+Q.
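
Starting and attaching can also be combined, since docker start accepts the -a (attach) and -i (interactive) options. The helper below is a sketch with a made-up name, reenter. The DOCKER variable is an illustrative convenience for previewing the command, not a Docker feature.

```shell
# Hypothetical helper: restart a stopped container and attach to it in one step.
# -a attaches the container's output, -i keeps STDIN open for the bash prompt.
# DOCKER can be overridden, e.g. DOCKER=echo prints the arguments instead.
reenter() {
    "${DOCKER:-docker}" start -a -i "$1"
}

DOCKER=echo reenter vigilant_montalcini   # prints: start -a -i vigilant_montalcini
```

Calling reenter vigilant_montalcini without the DOCKER override runs the real docker start.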

5. Copy files from local system to the running container

There are two ways to do this: from within the container and from outside the container. I only know how to copy files from outside the container. The container can be stopped or running.

docker cp file_you_want_to_copy <container_id>:<file_path_to_target_directory>

For example,

docker cp tumor.seg 96d91017226e:/gatk

This copies the file tumor.seg into the /gatk directory of container 96d91017226e.
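
When several files need to go into the same container directory, the docker cp call can be wrapped in a loop. This is a sketch only: copy_into is a hypothetical helper, normal.seg is a made-up second file name, and the DOCKER variable is an illustrative way to preview the commands.

```shell
# Hypothetical helper: copy one or more local files into a container directory.
# Usage: copy_into <container> <target_directory> <file>...
# DOCKER can be overridden, e.g. DOCKER=echo prints the arguments instead.
copy_into() {
    container="$1"
    dest="$2"
    shift 2
    for f in "$@"; do
        "${DOCKER:-docker}" cp "$f" "$container:$dest" || return 1
    done
}

# Preview the commands that would run; normal.seg is a made-up example file.
DOCKER=echo copy_into 96d91017226e /gatk tumor.seg normal.seg
```

The loop stops at the first failed copy, so a missing file does not go unnoticed.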

6. Save a modified container as an image and upload to Dockerhub

If you will modify a container to save, then remember that environment variables set in, e.g. .bashrc, are not reliable in Docker containers, because bash reads .bashrc only for interactive shells. However, symlinks work well and you should create these in, e.g. /usr/bin, with ln -s path/to/item short_cut_name.

First, log into your Dockerhub account with docker login. If you don't have one, create one at https://hub.docker.com. My account is called spacecade7. For the container you have modified and wish to save a snapshot image of, use the following command.

docker commit 96d91017226e spacecade7/mygatk:versioning_tag1

Here, the string that follows commit is the container ID. The last part points to my Dockerhub account followed by what I would like to call the image and an image version tag. This saves the image locally.

To save the image to Dockerhub, use docker push spacecade7/mygatk:versioning_tag1. The image should appear in your Dockerhub account.
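
The commit and push steps can be chained so that the push only happens if the commit succeeded. The following is a sketch under the naming used above. snapshot_and_push is a hypothetical helper, and the DOCKER variable is an illustrative way to preview the commands without touching Docker.

```shell
# Hypothetical helper: snapshot a container as <account>/<name>:<tag>, then push it.
# Usage: snapshot_and_push <container> <account> <name> <tag>
# DOCKER can be overridden, e.g. DOCKER=echo prints the arguments instead.
snapshot_and_push() {
    container="$1"
    image="$2/$3:$4"    # account/name:tag
    "${DOCKER:-docker}" commit "$container" "$image" &&
        "${DOCKER:-docker}" push "$image"
}

# Preview the two commands for the example above.
DOCKER=echo snapshot_and_push 96d91017226e spacecade7 mygatk versioning_tag1
```

The push still requires a prior docker login, as described above.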



  • EADG, Kiel (Member)
    edited September 2017

    Hi @shlee,

    Nice tutorial! Two short suggestions from my experience working with Docker/GATK:

    First, instead of copying single files or directories to the container, you can mount a directory from the host inside the container with the -v option of docker run:
    run -v, --volume=[host-src:]container-dest[:<options>]
    See the docker run manual page for more information.
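
To illustrate the host-src:container-dest[:options] syntax, here is a small sketch that assembles the -v argument. The helper name mount_arg is hypothetical and the paths are only examples. The check for an absolute host path is there because Docker treats a non-absolute host-src as a named volume rather than a host directory. The ro option mounts the directory read-only.

```shell
# Build a -v argument of the form host-src:container-dest[:options].
# A host-src that is not an absolute path would be treated by Docker
# as a named volume, so this sketch rejects it.
mount_arg() {
    host_dir="$1"
    container_dir="$2"
    options="${3:-}"
    case "$host_dir" in
        /*) ;;
        *)  echo "host path must be absolute: $host_dir" >&2; return 1 ;;
    esac
    if [ -n "$options" ]; then
        printf '%s:%s:%s\n' "$host_dir" "$container_dir" "$options"
    else
        printf '%s:%s\n' "$host_dir" "$container_dir"
    fi
}

# Example paths only; prints /media/data/analysis:/gatk/my_data:ro
mount_arg /media/data/analysis /gatk/my_data ro
```

The result would then be used as docker run -v "$(mount_arg /media/data/analysis /gatk/my_data ro)" -i -t broadinstitute/gatk:4.beta.3.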

    For security reasons (mostly), you should not work with root privileges all the time. To change this, you can add a new user to the container from inside it, and then save the image on DockerHub or locally as described.

    To start the container with this user, add:
    --user userName
    to your run command.

    Greetings EADG

  • shlee, Cambridge (Member, Broadie)

    Thank you @EADG for the compliment and the additional information! The community will appreciate your instructions on mounting a local directory to the container. I was hoping someone would add this.

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    I've come back to this doc a few times to remind myself how to do this so - THANK YOU!
    My typical use case is to figure out which versions the tools are. One command I found handy is 'cat Dockerfile'.
    When I did this for the genomes in the cloud docker, I got this output which was exactly what I needed:
    LABEL GOTC_GATK34_VER=3.4-g3c929b0
    LABEL GOTC_GATK35_VER=3.5-0-g36282e4
    LABEL GOTC_GATK36_VER=3.6-44-ge7d1cd2
    LABEL GOTC_GATK4_VER=4.beta.1
    LABEL GOTC_BWA_VER=0.7.15.r1140
    LABEL GOTC_TABIX_VER=0.2.5_r1005
    Just passing along in case others find it helpful!

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)
    edited September 2017

    Interesting, version info is not provided if you run 'cat Dockerfile' in this GATK image.

  • shlee, Cambridge (Member, Broadie)

    Thanks @Tiffany_at_Broad, I'll request that we be able to get versioning with the command you shared.

  • moxu (Member)
    edited May 2018

    Very good docker tutorial! Thanks, @shlee !

  • lcarvalho, Brazil (Member)

    Hello, I already installed Docker and the tests were OK. I'm trying to run BaseRecalibrator in Docker, but I fail to link the dbSNP file as --known-sites. The problem is that I already used "docker run -v options" with my input files and the reference genome. Unfortunately, the dbSNP file is too big (more than 10 GB), so I cannot link it to Docker using the -v option. This is a required file, so I cannot run without it.

  • NicolasK, Germany (Member)
    Maybe my answer is too late, as you posted some time ago.
    Try to link the folder with your dbSNP file.
    In my case, I copied all the files I needed to the folder I linked.
    Here is the command I used to link the folder:

    docker run -v /media/data/analysis:/gatk/my_data -it 9e737a9f562c