(howto) Get started with GATK4 beta
Download the software
The GATK4 beta command-line tools are distributed as precompiled executable jar files. You can download a zipped package containing them from GitHub (a dedicated GATK4 download page is coming soon). Once you unzip the package, you will find four files inside the resulting directory:
gatk-launch
gatk-package-4.beta.x-local.jar
gatk-package-4.beta.x-spark.jar
README.md
In the jar file names, x stands for the minor release version.
Now you may ask, why are there two jars? As the names suggest, gatk-package-4.beta.x-spark.jar is the jar for running Spark tools on a Spark cluster, while gatk-package-4.beta.x-local.jar is the jar used for everything else (including running Spark tools "locally", i.e. on a regular server or cluster).
So does that mean you have to specify which one you want to run each time? Nope! See the gatk-launch file in there? That's an executable wrapper script: you invoke it, and it chooses the appropriate jar for you based on the rest of your command line. You can still invoke a specific jar if you want, but using gatk-launch is easier, and it will also take care of setting some parameters that you would otherwise have to specify manually. We'll talk about that in a minute.
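To make the jar selection a bit more concrete, here is a simplified, hypothetical sketch of the kind of decision gatk-launch makes. It is not the actual script (which handles more cases and more Spark runners); it assumes, for illustration only, that a run on a Spark cluster is signalled by --sparkRunner SPARK on the command line, and that both jars sit in the directory named by GATK_DIR. The function name pick_jar is made up, and x is kept as the placeholder for your minor version.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified sketch of jar selection; NOT the real gatk-launch logic.
# Assumes both jars live in $GATK_DIR (default: current directory) and that
# "--sparkRunner SPARK" marks a run on an existing Spark cluster.
pick_jar() {
  local dir="${GATK_DIR:-.}"
  local prev=""
  local arg
  for arg in "$@"; do
    # Cluster run detected: use the Spark jar.
    if [ "$prev" = "--sparkRunner" ] && [ "$arg" = "SPARK" ]; then
      echo "$dir/gatk-package-4.beta.x-spark.jar"
      return 0
    fi
    prev="$arg"
  done
  # Everything else, including Spark tools run locally: use the local jar.
  echo "$dir/gatk-package-4.beta.x-local.jar"
}
```

For example, pick_jar PrintReadsSpark -I in.bam -O out.bam -- --sparkRunner SPARK prints ./gatk-package-4.beta.x-spark.jar, while the same command without --sparkRunner SPARK prints ./gatk-package-4.beta.x-local.jar.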
There is no installation necessary in the traditional sense, since the precompiled jar files should work on any POSIX platform (NOT Microsoft Windows!) equipped with the appropriate version of Java (see below). You'll simply need to unzip the downloaded package and place the folder containing the jar files in a convenient directory on your hard drive (or server). Although the jars themselves cannot simply be added to your PATH, the gatk-launch wrapper script can be. Please look up instructions for the terminal shell you use; in bash the typical syntax is
export PATH=$PATH:/path/to/gatk
where /path/to/gatk is the directory that contains the gatk-launch executable (PATH entries are directories, not individual files). Note that the jars must remain in the same directory as gatk-launch for it to work.
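Here is a minimal sketch of that setup in bash, assuming you unzipped the package to $HOME/gatk (a hypothetical location; substitute your own). It appends the directory to PATH and then checks that the shell can actually find the wrapper. To make the change permanent, you would put the export line in your shell's startup file (for bash, usually ~/.bashrc).

```shell
#!/usr/bin/env bash
# Hypothetical install location; replace with wherever you unzipped the package.
GATK_HOME="$HOME/gatk"

# PATH entries are directories, so append the folder that contains gatk-launch.
export PATH="$PATH:$GATK_HOME"

# Report where the shell finds the wrapper, or warn if it cannot.
if command -v gatk-launch >/dev/null 2>&1; then
  echo "gatk-launch found at: $(command -v gatk-launch)"
else
  echo "gatk-launch not found; check GATK_HOME and that the script is executable" >&2
fi
```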
Important note about Java version
For the tools to run properly, you must have Java 8 / JDK or JRE 1.8 installed. To check your Java version, open your terminal application and run the following command:
java -version
If the output looks something like java version "1.8.x_y", you are good to go. If not, you may need to change your version. You can download a suitable upgrade either from Oracle or from OpenJDK. To be clear, OpenJDK is now fully supported.
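If you want to automate this check, for example at the top of a pipeline script, one minimal sketch is to match the version string. Note that java -version writes to standard error, hence the 2>&1. The helper name is_java8 is hypothetical, and the pattern assumes the 1.8 version format shown above (OpenJDK prints openjdk version "1.8..." instead of java version "1.8...", which the pattern also accepts).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given `java -version` text reports 1.8.
# Matches both 'java version "1.8...' (Oracle) and 'openjdk version "1.8...' (OpenJDK).
is_java8() {
  printf '%s\n' "$1" | grep -q 'version "1\.8\.'
}

# `java -version` prints to stderr, so redirect it to capture the text.
if command -v java >/dev/null 2>&1; then
  if is_java8 "$(java -version 2>&1)"; then
    echo "Java 8 detected"
  else
    echo "Java found, but not version 1.8" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```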
Test that it works
To test that you can run GATK tools, run the following command in your terminal application (we assume that you have added gatk-launch to your PATH):
gatk-launch --help
This will output a summary of the GATK4 invocation syntax, options for listing tools and invoking a specific tool's help documentation, and main Spark options.
Use GATK tools
Tools are invoked as follows:
./gatk-launch ToolName -OPTION1 value1 -OPTION2 value2
If you have previously used older GATK versions, you'll notice that ToolName is no longer passed with -T and that it is now positional: the tool name must always be the first thing you write after the ./gatk-launch part (or after the jar file, if you're invoking the jar directly).
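If you are porting old scripts, the change can be illustrated with a hypothetical helper, to_gatk4_args, that takes a GATK3-style argument list and moves the value of -T to the front. This is only a sketch: it does not translate any other arguments, and their names may differ between GATK versions, so check each tool's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a GATK3-style argument list ("-T ToolName ...")
# into the GATK4 positional form ("ToolName ..."), one argument per output line.
# Other arguments are passed through untouched; their names may need manual updates.
to_gatk4_args() {
  local tool="" prev="" arg skip=0
  # First pass: find the value that follows -T.
  for arg in "$@"; do
    if [ "$prev" = "-T" ]; then
      tool="$arg"
    fi
    prev="$arg"
  done
  # The tool name must come first.
  printf '%s\n' "$tool"
  # Second pass: emit everything except "-T" and its value.
  for arg in "$@"; do
    if [ "$skip" -eq 1 ]; then
      skip=0
      continue
    fi
    if [ "$arg" = "-T" ]; then
      skip=1
      continue
    fi
    printf '%s\n' "$arg"
  done
}
```

For example, to_gatk4_args -T HaplotypeCaller -R ref.fasta -I in.bam prints HaplotypeCaller, -R, ref.fasta, -I and in.bam, one per line.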
Available tools are all listed in the Tool Documentation section, which is versioned; on the website, use the orange dropdown menu button to switch between versions. The Tool Documentation provides a complete list of tools with usage recommendations, options, and example commands.
Docker images for GATK4 releases can be found at https://hub.docker.com/r/broadinstitute/gatk/
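A minimal sketch for working with those images, assuming Docker is installed. The helper names gatk_image and gatk_shell are hypothetical, and the "latest" default tag is an assumption for illustration; for reproducible work, pick a specific release tag from the Docker Hub page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the image reference for a given tag,
# falling back to "latest" when no tag (or an empty one) is given.
gatk_image() {
  echo "broadinstitute/gatk:${1:-latest}"
}

# Hypothetical helper: start an interactive, throwaway container (--rm deletes
# it on exit). What runs inside depends on the image's default command.
gatk_shell() {
  docker run -it --rm "$(gatk_image "$1")"
}
```

You would then run docker pull "$(gatk_image)" once to download the image, followed by gatk_shell to start a container. To see your data files inside the container, you would also need to mount a host directory with docker run's -v option.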