(howto) Get started with GATK4 beta

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
edited June 2017 in Tutorials

Download the software

The GATK4 beta command-line tools are provided as a single executable jar file. You can download a zipped package containing the jar file from this GitHub link (GATK4 Download page coming soon). Once you unzip the package, you will find four files inside the resulting directory:


where x is the minor release version in the jar file names.

Now you may ask, why are there two jars? As the names suggest, gatk-package-4.beta.x-spark.jar is the jar for running Spark tools on a Spark cluster, while gatk-package-4.beta.x-local.jar is the jar that is used for everything else (including running Spark tools "locally", i.e., on a regular server or cluster).

So does that mean you have to specify which one you want to run each time? Nope! See the gatk-launch file in there? That's an executable wrapper script that you invoke and that will choose the appropriate jar for you based on the rest of your command line. You can still invoke a specific jar if you want, but using gatk-launch is easier, and it will also take care of setting some parameters that you would otherwise have to specify manually. We'll talk about that in a minute.

Install it

There is no installation necessary in the traditional sense, since the precompiled jar files should work on any POSIX platform (NOT Microsoft Windows!) equipped with the appropriate version of Java (see below). You'll simply need to open the downloaded package and place the folder containing the jar files in a convenient directory on your hard drive (or server). The jars themselves cannot simply be added to your PATH, but you can make the gatk-launch wrapper script available that way by adding the directory that contains it. Please look up instructions depending on the terminal shell you use; in bash the typical syntax is export PATH=$PATH:/path/to/gatk where /path/to/gatk is the directory containing the gatk-launch executable (PATH entries are directories, not individual files). Note that the jars must remain in the same directory as gatk-launch for it to work.
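As a rough sketch of the PATH setup in a POSIX shell: the snippet below appends the folder that holds gatk-launch to PATH, skipping the step if it is already listed. The location in GATK_DIR and the helper name add_to_path are hypothetical, chosen only for illustration; substitute the directory where you actually unzipped the package.

```shell
# Hypothetical install location; replace with wherever you unzipped the package.
GATK_DIR="$HOME/tools/gatk-4.beta.x"

# add_to_path DIR: append DIR to PATH unless it is already listed.
# (Illustrative helper, not part of GATK.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

add_to_path "$GATK_DIR"

# If the directory is on PATH and gatk-launch is executable, this prints its
# location; otherwise it reports that the wrapper was not found.
command -v gatk-launch || echo "gatk-launch not found via PATH"
```

To make the change persist across sessions, the same lines would typically go in your shell startup file (for bash, ~/.bashrc or ~/.bash_profile).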

Important note about Java version

For the tools to run properly, you must have Java 8 / JDK or JRE 1.8 installed. To check your java version, open your terminal application and run the following command:

java -version

If the output looks something like java version "1.8.x_y", you are good to go. If not, you may need to change your version. You can download a suitable upgrade either from Oracle or from OpenJDK. To be clear, OpenJDK is now fully supported.
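If you want to script this check, something along these lines may help. It is only a sketch: the helper name is_java8 is made up for this example, and it simply looks for a version string starting with 1.8 in the first line that java -version prints (which goes to stderr, hence the redirect).

```shell
# is_java8 VERSION_LINE: succeed if the line reports a 1.8 (Java 8) release.
# (Illustrative helper, not part of GATK.)
is_java8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so merge it into stdout before capturing.
    version_line=$(java -version 2>&1 | head -n 1)
    if is_java8 "$version_line"; then
        echo "OK: $version_line"
    else
        echo "Java 8 required, found: $version_line"
    fi
else
    echo "java not found on PATH"
fi
```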

Test that it works

To test that you can run GATK tools, run the following command in your terminal application from the directory containing gatk-launch (if you have added that directory to your PATH, you can drop the ./ prefix and run it from anywhere):

./gatk-launch --help

This will output a summary of the GATK4 invocation syntax, options for listing tools and invoking a specific tool's help documentation, and main Spark options.

Use GATK tools

Tools are invoked as follows:

./gatk-launch ToolName -OPTION1 value1 -OPTION2 value2 

If you have previously used older GATK versions, you'll notice that ToolName is no longer passed with -T and that it is now positional: the tool name must always be the first thing you write after the ./gatk-launch part (or the jar file if you're invoking the jar directly).
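To make the positional tool name concrete, here is a hedged sketch using HaplotypeCaller as the example tool. The tool choice, the file names, and the -R/-I/-O options are assumptions brought in for illustration and are not taken from this post; check the tool's own help output (./gatk-launch HaplotypeCaller --help) for the options your version accepts. The run_gatk wrapper is hypothetical and only prints the command when gatk-launch is not in the current directory.

```shell
# Placeholder file names; substitute your own reference, input and output.
REF=reference.fasta
BAM=input.bam
VCF=output.vcf

# Older GATK style, for comparison (tool name passed with -T):
#   java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R "$REF" -I "$BAM" -o "$VCF"

# run_gatk ARGS...: call ./gatk-launch if it is present and executable,
# otherwise print the command that would have been run (dry run).
# (Illustrative helper, not part of GATK.)
run_gatk() {
    if [ -x ./gatk-launch ]; then
        ./gatk-launch "$@"
    else
        echo "would run: ./gatk-launch $*"
    fi
}

# GATK4 style: the tool name is the first argument after gatk-launch.
run_gatk HaplotypeCaller -R "$REF" -I "$BAM" -O "$VCF"
```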

Available tools are all listed in the Tool Documentation section, which is versioned; on the website, use the orange dropdown menu button to switch between versions. This provides a complete list of tools with usage recommendations, options, and example commands.

Docker image

Docker images for GATK4 releases can be found at https://hub.docker.com/r/broadinstitute/gatk/
