Instructions for GATK workshops
Preparing for the workshop
To follow these instructions and attend the workshop, you will need a basic understanding of the following terms and command-line operations. If you are unfamiliar with any of them, consult a more experienced colleague or your system administrator, if you have one. Many good online tutorials also cover these concepts.
- Basic Unix environment commands
- Binary / Executable
- Adding a binary to your path (optional)
- Command-line shell, terminal or console
- Software library

The sections below cover:
- Todo list for the impatient
- Platform requirements (hardware and environment)
- Software packages to install
- Workshop materials (data, worksheets and slides)
1. Todo list for the impatient
- Set up GATK4 Docker as described here. The Docker version used in the workshop is indicated in a worksheet labeled "Run GATK in a Docker Container" and can be found in the workshop directory's exercises folder. See Section 4 to access the workshop directory.
- Download and install IGV genome browser.
- Download the workshop materials (see Section 4 for instructions).
2. Platform requirements
See Quickstart for general GATK software requirements.
Important note about MS Windows: We try to support participants running on Windows systems, and we find most of the workshop exercises run well using Docker on Windows. However, we often encounter technical issues that are specific to Windows, and some of these issues currently have no solution. For that reason, we cannot guarantee full success with Windows, and we encourage you to make arrangements to use a Linux system for the workshop.
The analyses we run in workshops are designed to run quickly and on small datasets, so they can run on single-processor machines and should not require more than 4 GB of RAM. For file storage, plan on a minimum of 10 GB of free space.
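The hardware figures above can be checked ahead of time with a short script. This is a minimal sketch, not an official GATK tool: the function names are made up for illustration, the thresholds are taken from the paragraph above, and the RAM lookup relies on `os.sysconf`, which is unavailable on Windows (the check is skipped there).

```python
import os
import shutil

GIB = 1024 ** 3
MIN_RAM_BYTES = 4 * GIB    # RAM figure quoted in the text above
MIN_DISK_BYTES = 10 * GIB  # storage figure quoted in the text above


def find_problems(ram_bytes, free_disk_bytes):
    """Return a list of human-readable problems; an empty list means OK.

    ram_bytes may be None when the amount of RAM could not be detected,
    in which case the RAM check is skipped.
    """
    problems = []
    if ram_bytes is not None and ram_bytes < MIN_RAM_BYTES:
        problems.append(f"RAM: {ram_bytes / GIB:.1f} GB found, 4 GB suggested")
    if free_disk_bytes < MIN_DISK_BYTES:
        problems.append(f"Disk: {free_disk_bytes / GIB:.1f} GB free, 10 GB needed")
    return problems


def detect_ram_bytes():
    """Best-effort total physical memory; returns None where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    # Check free space on the volume that holds the home directory.
    free = shutil.disk_usage(os.path.expanduser("~")).free
    issues = find_problems(detect_ram_bytes(), free)
    print("\n".join(issues) if issues else "Machine meets the workshop minimums.")
```

Treat the result as a rough guide: operating systems often report slightly less than the nominal amount of installed memory, so a 4 GB laptop may trigger the RAM message even though it is adequate.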
We use Docker to ensure that all workshop participants are working with the same environment. This greatly reduces time wasted dealing with environment differences or dependency-related issues. Participants who choose to work with a different setup will be responsible for adapting instructions accordingly.
Be sure to install and configure the Docker environment correctly before the workshop by following this procedure, including pulling the GATK4 Docker image. It is a very large file and may take a long time to download, so this must be done in advance.
Running on remote servers is not recommended as we will use desktop software such as IGV. Participants who choose to run on a remote server will be responsible for setting up network mounts or transferring files so that they can work with desktop software.
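As an illustration of the pull step, the sketch below builds and runs a `docker pull` command. It assumes the image lives in the `broadinstitute/gatk` repository on Docker Hub; the tag is deliberately left as a parameter because the correct version is the one given in the "Run GATK in a Docker Container" worksheet. The function names are hypothetical.

```python
import shutil
import subprocess

GATK_IMAGE = "broadinstitute/gatk"  # assumed Docker Hub repository name


def build_pull_command(tag):
    """Return the argument list for pulling one specific GATK image tag."""
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError("tag must be a non-empty string without whitespace")
    return ["docker", "pull", f"{GATK_IMAGE}:{tag}"]


def pull_image(tag):
    """Pull the image, failing early if Docker is not on the PATH."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker executable not found on PATH")
    # check=True raises CalledProcessError if the pull fails.
    subprocess.run(build_pull_command(tag), check=True)
```

Calling `pull_image` with the tag from the worksheet downloads the image once; later `docker run` commands that use the same tag reuse the local copy.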
3. Software packages to install
- Genome Analysis Toolkit (GATK) and Picard
- IGV genome browser
- SublimeText or other text editor
Genome Analysis Toolkit (GATK) and Picard
Hopefully, if you're reading this, you're already acquainted with the purpose of the GATK. As described in more detail in the Quickstart guide, you can either download the GATK package and run it directly in the "traditional" way, or you can run it from within a Docker container. In our workshops, we use Docker, so you will need to follow this procedure to install Docker and get the GATK container image installed appropriately. This may seem a bit more complicated up front but it eliminates the majority of problems we see people struggle with.
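To give a sense of what running GATK from within a container looks like, here is a hedged sketch that assembles a `docker run` command with a host folder mounted into the container. The image name `broadinstitute/gatk`, the mount point `/gatk/my_data`, and the function name are assumptions for illustration; follow the linked procedure and the workshop worksheet for the exact command and version.

```python
import os


def build_run_command(tag, host_dir, container_dir="/gatk/my_data"):
    """Return the argument list for an interactive GATK container session.

    host_dir is mounted at container_dir, so files written there by GATK
    remain visible to desktop software on the host.
    """
    # Docker expects an absolute host path for a bind mount.
    host_path = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "docker", "run",
        "-it",                                 # interactive terminal
        "-v", f"{host_path}:{container_dir}",  # bind-mount the host folder
        f"broadinstitute/gatk:{tag}",
    ]


if __name__ == "__main__":
    # "TAG_FROM_WORKSHEET" and the folder name are placeholders.
    print(" ".join(build_run_command("TAG_FROM_WORKSHEET", "~/gatk_workshop")))
```

Mounting a host folder is what lets a desktop program such as IGV open the files produced inside the container.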
IGV genome browser
The Integrative Genomics Viewer is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use and can be downloaded for free (though registration is required) from this website. We encourage you to read through IGV's very helpful user guide, which includes many detailed tutorials that will help you use the program most effectively.
SublimeText or other text editor
WDL (Workflow Description Language) can be written with any text editor, but for this workshop we will be using SublimeText. It is a simple but effective program, and you can download it here. It also supports syntax highlighting for WDL, which you can install by following the instructions here.
4. Workshop materials (data, worksheets and slides)
We provide a bundle containing test datasets, worksheets with the instructions for the hands-on exercises, and all slide decks presented in the workshop. You can find GATK workshop bundles organized by YYMM (year-month) in the GATK Workshops directory. If you are registered for an upcoming workshop where you will be using your own laptop, you MUST download the bundle before coming to the workshop. If we update the bundle ahead of the workshop, you will receive a notification with a reminder to download the new version.
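If you need to work out which bundle is the most recent, the YYMM naming makes that easy to do programmatically. The sketch below assumes each bundle folder name is exactly four digits (two for the year in the 2000s, two for the month); the folder names in the example are invented and the function names are hypothetical.

```python
import re


def parse_yymm(name):
    """Return (year, month) for a four-digit YYMM name, or None if invalid."""
    match = re.fullmatch(r"(\d{2})(\d{2})", name)
    if match is None:
        return None
    year, month = 2000 + int(match.group(1)), int(match.group(2))
    return (year, month) if 1 <= month <= 12 else None


def latest_bundle(names):
    """Return the most recent YYMM name, ignoring entries that do not parse."""
    dated = [(parse_yymm(n), n) for n in names]
    dated = [(date, n) for date, n in dated if date is not None]
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    # Invented folder listing, for illustration only.
    print(latest_bundle(["1711", "1809", "1903", "README"]))
```

Entries that are not valid YYMM names, such as a README file, are skipped rather than treated as errors.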
For those attending pipelining-only workshops, the workshop bundle will differ. Please check your email for instructions on where to find these materials.