(How to) Run the GATK4 Docker locally and take a look inside
This document is in BETA. It may be incomplete and/or inaccurate. Post suggestions in the Comments section, and check the Comments section for updates as well.
1. Install Docker on your system
Install Docker for your system from https://docs.docker.com/engine/installation/, e.g. for Mac, Windows or Linux servers. There is also a program called Docker Toolbox. I have it installed, but I don't think it is necessary for running Docker containers locally or on a server.
On my Mac, I just double-click on the Docker whale icon to start the application. Check that Docker is running in the Mac menu bar at top by clicking on the icon that looks like a whale-container-ship.
2. Check your Docker software installation
See the Docker version with
$ docker --version
Docker version 17.06.0-ce, build 02c1d87
If you have trouble, you may need to run a command such as the following, which restarts the default Docker Machine VM (relevant if you run Docker through Docker Toolbox).
docker-machine restart default
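When something does not work, it can help to first tell apart two cases: the docker client is not installed, or the client is installed but the Docker daemon is not running. The following is a minimal sketch of such a check, not an official diagnostic. The function name check_docker and its return codes are made up for illustration; it relies only on `command -v` and `docker info`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: distinguish "CLI missing" from "daemon not reachable".
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH"
    return 1
  fi
  # 'docker info' contacts the daemon and fails if it is not running.
  if ! docker info >/dev/null 2>&1; then
    echo "docker CLI found, but the daemon is not reachable"
    return 2
  fi
  echo "docker CLI found and daemon reachable"
  return 0
}

# Record the result without aborting the script on failure.
docker_status=0
check_docker || docker_status=$?
```

A status of 2 on a Mac usually just means the Docker application has not been started yet.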
3. Download a Docker image from Dockerhub
In Docker, an image is the original from which we launch containers. We pull images from Dockerhub (https://hub.docker.com/), using Git-like lingo. For example, the following command downloads a GATK4 Docker image.
docker pull broadinstitute/gatk:4.beta.3
The part after the colon is the tag, which here gives the version of the image we pull. You can see which images you have locally with docker image ls. Here we see I have two different versions of broadinstitute/gatk, 4.beta.3 and 4.beta.2.
$ docker image ls
REPOSITORY            TAG        IMAGE ID       CREATED       SIZE
broadinstitute/gatk   4.beta.3   5c138c493794   2 weeks ago   2.87GB
broadinstitute/gatk   4.beta.2   507406cb4d85   3 weeks ago   2.88GB
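To make the repository:tag naming concrete, here is a small sketch that splits an image reference into its two parts using plain shell parameter expansion. It assumes the reference carries an explicit tag; when the tag is left out, Docker falls back to the tag latest. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
image="broadinstitute/gatk:4.beta.3"

# Everything before the last colon is the repository.
repository="${image%:*}"
# Everything after the last colon is the tag.
tag="${image##*:}"

echo "repository: $repository"
echo "tag:        $tag"
```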
4. Inspect a Docker image by running a container
There are two ways to inspect an image. One is with docker inspect 5c138c493794. The other is to launch a container off the image and root around within it much like you would a file system.
The broadinstitute/gatk image is built automatically from a script documented at https://github.com/broadinstitute/gatk/blob/master/scripts/docker/. For tools that the script installs, see https://github.com/broadinstitute/gatk/blob/master/scripts/docker/gatkbase/Dockerfile.
Launch a container using either the image's tag or its image ID. Whichever of the two you use is what Docker then shows as the image name for that container, e.g. in the IMAGE column of docker ps -a.
docker run -i -t 5c138c493794
docker run -i -t broadinstitute/gatk:4.beta.3
The bash prompt then opens in a location in the container (here /gatk) preset by those who built the image.
We can check the contents of the current directory and the java version.
root@<container_id>:/gatk# ls -ltrh
total 148K
drwxr-xr-x  4 root root 4.0K Jul 26 15:49 docs
-rw-r--r--  1 root root  428 Jul 26 15:49 codecov.yml
-rwxr-xr-x  1 root root 4.5K Jul 26 15:49 build_docker.sh
-rw-r--r--  1 root root  21K Jul 26 15:49 build.gradle
-rw-r--r--  1 root root  33K Jul 26 15:49 README.md
-rw-r--r--  1 root root 1.5K Jul 26 15:49 LICENSE.TXT
-rw-r--r--  1 root root  690 Jul 26 15:49 Dockerfile
-rw-r--r--  1 root root  775 Jul 26 15:49 AUTHORS
drwxr-xr-x  1 root root 4.0K Jul 26 15:49 src
-rw-r--r--  1 root root   26 Jul 26 15:49 settings.gradle
drwxr-xr-x 10 root root 4.0K Jul 26 15:49 scripts
drwxr-xr-x  2 root root 4.0K Jul 26 15:49 resources_for_CI
-rwxr-xr-x  1 root root 5.2K Jul 26 15:49 gradlew
drwxr-xr-x  3 root root 4.0K Jul 26 15:49 gradle
-rwxr-xr-x  1 root root  19K Jul 26 15:49 gatk-launch
drwxr-xr-x  9 root root 4.0K Jul 26 15:53 build
-rw-r--r--  1 root root   40 Jul 26 15:55 run_unit_tests.sh
lrwxrwxrwx  1 root root   25 Jul 26 15:55 gatk.jar -> /gatk/build/libs/gatk.jar
-rw-r--r--  1 root root 1017 Jul 26 15:55 install_R_packages.R
root@<container_id>:/gatk# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
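For a quick check like the Java version, it is also possible to pass a command after the image name instead of opening an interactive shell. The sketch below wraps this in a function; the name run_once is hypothetical, and it assumes the image defines no entrypoint that would intercept the command. The --rm flag asks Docker to delete the container once the command exits, so no stopped container is left behind.

```shell
#!/usr/bin/env bash
image="broadinstitute/gatk:4.beta.3"

# Hypothetical wrapper: run a single command in a throwaway container.
run_once() {
  # --rm removes the container as soon as the command finishes.
  docker run --rm "$image" "$@"
}

# Example call (commented out so that running this file starts no container):
# run_once java -version
```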
Typing exit leaves the container and also stops it from running. Docker saves stopped container instances automatically, and we can list all of them with docker ps -a.
$ docker ps -a
CONTAINER ID   IMAGE                          COMMAND   CREATED              STATUS                     PORTS   NAMES
28035a3b71f1   broadinstitute/gatk:4.beta.3   "bash"    About a minute ago   Exited (0) 8 seconds ago           silly_davinci
f944f81ff6d7   5c138c493794                   "bash"    6 minutes ago        Exited (0) 4 minutes ago           fervent_wing
62fb9991a939   5c138c493794                   "bash"    6 minutes ago        Exited (0) 6 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794                   "bash"    3 days ago           Exited (0) 2 days ago              vigilant_montalcini
As you can see, I have multiple containers launched from the same image. Notice, however, that each container has a unique ID (under CONTAINER ID) and name (under NAMES). Whatever changes I make within a container get saved to that container only. We can remove containers with docker container rm, using either the container ID or the name.
$ docker container rm silly_davinci
silly_davinci
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
f944f81ff6d7   5c138c493794   "bash"    11 minutes ago   Exited (0) 9 minutes ago            fervent_wing
62fb9991a939   5c138c493794   "bash"    11 minutes ago   Exited (0) 11 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini

$ docker container rm f944f81ff6d7
f944f81ff6d7
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
62fb9991a939   5c138c493794   "bash"    12 minutes ago   Exited (0) 12 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini
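Removing stopped containers one at a time gets tedious when many were launched from the same image. The following sketch, under the assumption that you really want to discard every exited container of one image, collects their IDs with docker ps filters and passes them to docker container rm. The function name remove_exited_from_image is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every exited container launched from one image.
remove_exited_from_image() {
  local image="$1"
  local ids
  # -q prints only container IDs; the two filters keep exited
  # containers whose ancestor is the given image (tag or image ID).
  ids=$(docker ps -a -q --filter "status=exited" --filter "ancestor=${image}")
  if [ -z "$ids" ]; then
    echo "no exited containers found for ${image}"
    return 0
  fi
  # Word splitting is intended here: one container ID per argument.
  docker container rm $ids
}

# Example call (commented out so that running this file removes nothing):
# remove_exited_from_image 5c138c493794
```

Any changes saved in those containers are lost when they are removed, so list them with docker ps -a first. To discard all stopped containers regardless of image, docker container prune does the same after asking for confirmation.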
We can start one of these stopped containers again with
docker start 96d91017226e
It may take a minute for a container to start up. We can see the running containers with docker container ls.
$ docker container ls
CONTAINER ID   IMAGE          COMMAND   CREATED      STATUS              PORTS   NAMES
96d91017226e   5c138c493794   "bash"    3 days ago   Up About a minute           vigilant_montalcini
Finally, we can reattach to the running container.
docker attach vigilant_montalcini
On my local Mac, there is a glitch and I must press enter twice to show the Docker container's bash prompt. You can also use the container ID instead of the name in the command. To exit out of a running container without stopping it, use Ctrl+P followed by Ctrl+Q, which is Docker's default detach key sequence.
5. Copy files from local system to the running container
There are two ways to do this, from within the container and from outside the container. I only know how to copy files from outside the container. The container can be stopped or running.
docker cp file_you_want_to_copy <container_id>:<file_path_to_target_directory>
docker cp tumor.seg 96d91017226e:/gatk
This copies the file tumor.seg into the /gatk directory of the container.
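The docker cp command also works in the opposite direction when the container path is given first, which is handy for retrieving results. The sketch below wraps both directions; the function names copy_in and copy_out and the local file name tumor_copy.seg are made up for illustration.

```shell
#!/usr/bin/env bash
container="96d91017226e"

# Copy a local file into a directory inside the container.
copy_in() {
  docker cp "$1" "${container}:$2"
}

# Copy a file from the container back to the local system.
copy_out() {
  docker cp "${container}:$1" "$2"
}

# Example calls (commented out so that running this file copies nothing):
# copy_in tumor.seg /gatk
# copy_out /gatk/tumor.seg ./tumor_copy.seg
```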
6. Save a modified container as an image and upload to Dockerhub
If you modify a container in order to save it, remember that environment variables set in, e.g., bashrc may not take effect in Docker containers, since only interactive bash shells read that file. However, symlinks work well, and you should create these in, e.g., /usr/bin with ln -s path/to/item short_cut_name.
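As an illustration of the symlink approach, the sketch below creates a stand-in executable and a shortcut to it. To keep it safe to run anywhere, it works in a temporary directory rather than /usr/bin; the tool name mytool and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Work in a scratch directory instead of /usr/bin.
workdir=$(mktemp -d)
mkdir -p "$workdir/opt/mytool" "$workdir/bin"

# A stand-in executable; in a real container this would be an installed tool.
cat > "$workdir/opt/mytool/run_mytool.sh" <<'EOF'
#!/bin/sh
echo "mytool ran"
EOF
chmod +x "$workdir/opt/mytool/run_mytool.sh"

# Same pattern as: ln -s path/to/item short_cut_name
ln -s "$workdir/opt/mytool/run_mytool.sh" "$workdir/bin/mytool"

# Calling the shortcut runs the original script.
output=$("$workdir/bin/mytool")
echo "$output"

rm -rf "$workdir"
```

Inside a container, a shortcut placed in a directory that is already on the PATH, such as /usr/bin, can be called by its short name without relying on any shell startup file.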
First, log into your Dockerhub account with docker login. If you don't have one, create one at https://hub.docker.com. My account is called spacecade7. For the container you have modified and wish to save a snapshot image of, use the following command.
docker commit 96d91017226e spacecade7/mygatk:versioning_tag1
Here, the string that follows commit is the container ID. The last part gives my Dockerhub account, followed by the name I would like to give the image and an image version tag. This saves the image locally.
To save the image to Dockerhub, use docker push spacecade7/mygatk:versioning_tag1. The image should appear in your Dockerhub account.
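The commit and push steps can be sketched together as follows. The script assembles the image reference from its three parts so that the same string is used for both commands. The values are the ones from this post; the function name commit_and_push is hypothetical, and the call assumes you have already run docker login.

```shell
#!/usr/bin/env bash
account="spacecade7"            # Dockerhub account name
image_name="mygatk"             # what to call the image
version_tag="versioning_tag1"   # image version tag
container="96d91017226e"        # ID of the modified container

# Assemble <account>/<image_name>:<version_tag>.
image_ref="${account}/${image_name}:${version_tag}"
echo "$image_ref"

# Hypothetical wrapper: snapshot the container locally, then upload the image.
commit_and_push() {
  docker commit "$container" "$image_ref" && docker push "$image_ref"
}

# Example call (commented out so that running this file uploads nothing):
# commit_and_push
```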