GATK workshops

edited February 27 in GATK4 User Guide

What's in a GATK workshop?

The term "workshop" is used all over the place to describe very different things. In the GATK world, a workshop is a multi-day course that includes both lectures and hands-on exercises, interleaved to provide a well-balanced learning experience.

Our standard 4-day workshop, described below, covers basic genomics, all currently supported Best Practices pipelines, and pipelining with WDL/Cromwell/FireCloud. Other formats may be available upon request.

Course outline

The workshop focuses on the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. Participants will learn why each step is essential to the variant discovery process, what operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of their dataset. In the course of this workshop, we highlight key functionalities such as the GVCF workflow for joint discovery of germline short variants in cohorts, somatic short variant discovery using Mutect2, and copy number variation discovery using GATK-CNV. We also practice using pipelining tools to assemble and execute GATK workflows.

Hands-on sessions

In the hands-on sessions focused on analysis, we walk participants through exercises that teach them how to manipulate the standard data formats involved in variant discovery and how to apply GATK tools appropriately to common use cases and data types. In the course of these exercises, we demonstrate useful tips and tricks for interacting with GATK and Picard tools, dealing with problems, and using third-party tools such as IGV and RStudio.

In the optional hands-on sessions on pipelining, we walk participants through exercises that teach them to write workflow scripts using WDL, the Broad's new Workflow Description Language, and to execute these workflows locally with Cromwell as well as through FireCloud, our publicly available, secure cloud-based analysis service.

Target audience

This workshop is aimed at a mixed audience: people who are new to variant discovery or to GATK and are seeking an introductory course on the tools, as well as existing GATK users seeking to improve their understanding of and proficiency with the tools. Participants should already be familiar with the basic terms and concepts of genetics and genomics. Basic familiarity with the command-line environment is required.

Environment

Participants will be expected to bring their own laptops with the required software preinstalled (detailed instructions here) unless the workshop host provides a computer lab or cloud-based platform. Supported operating systems are Mac and Unix/Linux; MS Windows is NOT supported.

Limitations

Please note that this workshop is focused on human data analysis. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed for analyzing non-human data, but we will not go into much detail on those points.


Typical workshop schedule

The schedule is given here for a 9am-4pm timeframe, but it can be adapted to local needs (e.g. 10am-5pm).

Day 1: Introduction to Genomic Analysis

Morning (9am – 12pm)

  • 9:00 Opening remarks
  • 9:15 Introduction to Sequence data / pre-processing workflow
  • 9:45 Introduction to Germline variant discovery Best Practices workflows
  • 10:15 Coffee/tea break
  • 10:45 Introduction to Somatic variant discovery Best Practices workflows
  • 11:15 Introduction to pipelining with WDL + Cromwell + FireCloud
  • 11:45 Closing question time

Lunch Time (12pm – 1pm)

Afternoon (1pm – 4pm)

  • 13:00 Mapping
  • 13:25 Marking Duplicates
  • 13:50 Base recalibration (BQSR)
  • 14:15 Coffee/tea break
  • 14:45 Hands-on data exploration

Day 2: Germline short variant discovery

Morning (9am – 12pm)

  • 9:00 Recap of germline variant discovery Best Practices
  • 9:15 HaplotypeCaller
  • 9:45 Joint-calling with GenomicsDB + GenotypeGVCFs
  • 10:15 Coffee/tea break
  • 10:45 Hands-on joint-calling

Lunch Time (12pm – 1pm)

Afternoon (1pm – 4pm)

  • 13:00 Filtering with VQSR
  • 13:30 Genotype Refinement
  • 14:00 Callset Evaluation
  • 14:15 Coffee/tea break
  • 14:45 Hands-on filtering approaches

Day 3: Somatic variant discovery

Morning (9am – 12pm)

  • 9:00 Recap of somatic variant discovery Best Practices
  • 9:15 Somatic SNVs and indels with Mutect2
  • 10:00 Coffee/tea break
  • 10:30 Hands-on Mutect2

Lunch Time (12pm – 1pm)

Afternoon (1pm – 4pm)

  • 13:00 Somatic CNVs with GATK CNV
  • 13:30 Hands-on GATK CNV
  • 14:45 Coffee/tea break
  • 15:15 Preview of upcoming methods: germline CNV and SV
  • 15:45 Open question time

Day 4: Pipelining

Morning (9am – 12pm)

  • 9:00 Recap of WDL/Cromwell basics
  • 9:15 Hands-on WDL/Cromwell basics
  • 10:15 Coffee/tea break
  • 10:45 Self-paced WDL exercises

Lunch Time (12pm – 1pm)

Afternoon (1pm – 4pm)

  • 13:00 Recap of FireCloud basics
  • 13:15 Hands-on FireCloud Part 1
  • 14:15 Coffee/tea break
  • 14:45 Hands-on FireCloud Part 2

Hosting requirements

1. Expenses

a. The host must pay for all expenses incurred by the training crew during the period defined below that fall under the following categories: air travel (economy class for flight durations under 4 hours, and economy plus/extra legroom or equivalent for flight durations over 4 hours) and any associated booking and baggage fees; ground transportation (including taxi, Uber, Lyft or rental car as necessary); accommodation (an individual room for each trainer); subsistence/meals (either itemized or per diem); and communications (cellphone roaming fees, in-flight Wi-Fi, etc.). Speaker fees for the trainers are welcome but not required.

b. The period covered for expense purposes starts on the day preceding the first day of training for time zones within 8 hours (inclusive) from Boston, MA, and two days preceding the first day of training for time zones beyond 8 hours (exclusive) from Boston, MA. It ends on the day following the last day of training. In practice this means a 4-day workshop will require 5 nights of accommodation for a nearby location and 6 nights of accommodation for a distant location. Meals and communications charges incurred during travel are considered to fall in the covered period even if there is a gap between the travel days and the covered period.

c. Trainers may book travel itineraries that allow for regional sightseeing if the cost is equivalent, within a reasonable margin, to that of the basic home-to-workshop itinerary (some trainers may travel from home locations other than Boston). If any funding rules may constrain these options, this must be communicated before the workshop scheduling is finalized.

d. The host is encouraged to book accommodation on behalf of the trainers, but this is not required.

e. The host will be invoiced for trainer expenses by the Broad Institute. If individual expense claims are required by funding rules, this must be communicated before the workshop scheduling is finalized.

f. The host is encouraged to make the workshop free of charge for participants, but may charge participants a reasonable fee to recoup expenses. If the fee is expected to be higher than $100 USD, this must be communicated before the workshop scheduling is finalized.
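The rule defining the covered expense period reduces to simple date arithmetic: the period starts one day before the first training day (two days for time zones more than 8 hours from Boston) and ends the day after the last training day. The Python sketch below is illustrative only; the function `covered_period` and its parameters are hypothetical, and it assumes the time-zone difference from Boston is supplied in hours.

```python
from datetime import date, timedelta
from typing import Tuple


def covered_period(first_training_day: date,
                   training_days: int,
                   tz_offset_hours_from_boston: float) -> Tuple[date, date, int]:
    """Hypothetical helper: return (start, end, nights) of the covered period.

    Assumes 'within 8 hours (inclusive)' means |offset| <= 8.
    """
    # One lead day for nearby time zones, two for distant ones.
    lead_days = 1 if abs(tz_offset_hours_from_boston) <= 8 else 2
    start = first_training_day - timedelta(days=lead_days)
    last_training_day = first_training_day + timedelta(days=training_days - 1)
    # The period ends on the day following the last day of training.
    end = last_training_day + timedelta(days=1)
    # Nights of accommodation = days between arrival and departure.
    nights = (end - start).days
    return start, end, nights


# A 4-day workshop 6 hours from Boston, and one 13 hours away.
print(covered_period(date(2018, 3, 5), 4, 6))
print(covered_period(date(2018, 3, 5), 4, 13))
```

With these inputs the nearby case yields 5 nights and the distant case 6 nights, matching the figures given for a 4-day workshop.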

2. Registration and participation

a. The host is responsible for organizing registration and must provide an online registration page that can be linked from the Broad Institute team’s events calendar.

b. A minimum of 30 participants is required, with a maximum of 40 participants.

c. The host must open the workshop to all comers, but may prioritize participants from their own institution if demand is higher than capacity.

d. If registrations overflow the workshop cap, a waitlist must be implemented and all efforts should be made to ensure that the maximum number of participants is achieved.

e. The host is encouraged to advertise the workshop widely. The Broad Institute may also advertise the workshop through an events calendar, blog posts and social media.

3. Venue and IT setup

a. The host is responsible for providing an appropriate venue; typical minimum is a classroom setting with projection capabilities. Some auditoriums may be acceptable depending on configuration; this should be discussed in advance of the workshop scheduling being finalized.

b. Use of a computer lab with workstations that can be pre-configured is encouraged but not required (workshops can be done with the participants’ own laptops). If a computer lab will be used, the host will be responsible for configuring the workstations based on the instructions provided by the training crew in advance of the workshop. If participants are asked to come with laptops, the host will be responsible for communicating the pre-workshop preparation instructions provided by the training crew in advance of the workshop.

c. Operating systems that are supported explicitly include MacOSX and most flavors of Linux. MS Windows is NOT explicitly supported but may be acceptable through the use of Docker containers. If this is a concern, it should be discussed in advance of the workshop scheduling being finalized.

4. Refreshments, meals and social activities

a. The host must arrange for coffee/tea and small snacks to be served during mid-session breaks. Catered lunches are encouraged but not required.

b. The host is encouraged to arrange at least one social activity (e.g. pub outing) to promote networking among participants. Our exit surveys suggest participants value such activities very highly. The arrangement can be very informal; the host is not expected to pay for any costs.

c. The host is welcome to arrange an activity involving the training crew and a local research group to provide the opportunity for the local group to consult with the trainers on specific projects and to foster potential collaborations.

5. Recordings and collateral materials

a. The host is welcome to record workshop sessions for both internal and public use. If the recordings will be posted publicly (e.g. on YouTube) this should be discussed in advance so that the Broad Institute team can provide appropriate guidance and permissions regarding e.g. descriptions, credit attributions and use of logos.

b. The host may circulate copies of the materials provided by the training crew (presentation slides, handouts, test data etc.) beyond the workshop cohort provided the circulated materials are accompanied by appropriate statements of attribution and any web links provided by the Broad Institute team.

Post edited by Geraldine_VdAuwera on