Get free credits worth $250 for running GATK4 pipelines in the cloud!
With the GATK4 release just around the corner, we wanted to make it easy for everyone to try out the new pipelines without going through a whole lot of setup. So we're setting them all up in ready-to-run workspaces on FireCloud, which is a secure, freely accessible, open-source analysis portal we built on Google Cloud (think Galaxy but more scalable). The pipelines are preconfigured according to our Best Practices, so it'll be just a matter of a few clicks to run any pipeline you like on the preloaded example datasets -- or, with a few more (simple) steps, to run them on your own data. All this without ever touching a command line, unless you're the CLI-over-GUI type, in which case you're welcome to use the FireCloud APIs via Swagger or the FISS Python bindings to do all this programmatically.
But that's not all -- we're super excited to announce that we're giving out free credits for running the pipelines! Normally you would have to pay Google for the compute and storage costs -- we make the portal and tools available for free, but Google runs the machines, and they charge you for what you use. However, if you apply ASAP, you can get $250 worth of credits for free! That should be more than enough to test out the new pipelines; with that amount of credits you should be able to get real work done toward your research. And you can run any pipelines you want as long as they're written in WDL, so you can run other tools besides GATK.
The FireCloud free credits program starts January 9th, 2018, when GATK4 is released and the new pipelines are made available in FireCloud. We have secured funding to give out $250 worth of credits each to 1,000 people. Credits will be allocated on a first-come, first-served basis, so the sooner you sign up, the more likely you are to receive credits.
To take advantage of this unique opportunity, all you need to do is register for an account on the FireCloud portal (which is itself always free and open to all) and sign up for the free credits program. Read on below for details about signing up and which pipelines will be featured.
Instructions for signing up
Follow these three steps:
1. Register for an account in FireCloud. This requires a Google identity, but don't worry -- if you don't have a Gmail account to use, you can link your current email address to a Google identity by following these instructions.
2. Apply here with the email address you used to register for FireCloud, so we can assign you the free credits. You will receive an email when the free credits are available in FireCloud.
3. When you log into FireCloud on or after January 9, you will see a banner inviting you to start your free trial. Simply click the "Start" button in the banner to get started.
Full terms and conditions will be posted in the FireCloud documentation and will be repeated to you before you commit to anything.
Note that we reserve the right to accept or reject applications for credits at our discretion. This is intended to protect against any attempts to abuse the credits program, since our goal here is to give as many people as possible the opportunity to try out the pipelines.
Do the credits expire? What happens next?
You will have two months (60 days) to use the free credits from the date you click the “Start” button in FireCloud. We encourage you to begin by checking out the GATK4 workflows in the preloaded workspaces, which will be linked individually on the GATK4 launch page and featured prominently within FireCloud itself. Documentation for the pipelines will include cost estimates for each of the preloaded examples so you can gauge how far your credits will go depending on what you choose to run.
There is no obligation to continue using FireCloud after your free credits expire, and you will be presented with options to save any work you did during that time.
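To get a rough feel for how far the credits might stretch, here is a minimal Python sketch. It uses only the two figures stated in this post ($250 of credits and a 60-day window). The function names and the per-run cost in the usage note are hypothetical placeholders, not actual pipeline estimates; the real numbers will come from the cost estimates in the pipeline documentation.

```python
from datetime import date, timedelta

CREDIT_USD = 250.0  # credit amount stated in this post
TRIAL_DAYS = 60     # trial length stated in this post


def trial_end(start: date) -> date:
    """Return the date the credits expire, counting 60 days from the start date."""
    return start + timedelta(days=TRIAL_DAYS)


def runs_affordable(cost_per_run_usd: float, budget_usd: float = CREDIT_USD) -> int:
    """Return how many complete runs fit in the budget at a given cost per run.

    cost_per_run_usd is an assumption supplied by the caller, for example a
    figure taken from the documented cost estimate of a preloaded example.
    """
    if cost_per_run_usd <= 0:
        raise ValueError("cost_per_run_usd must be positive")
    return int(budget_usd // cost_per_run_usd)
```

For example, if you clicked "Start" on the launch day, `trial_end(date(2018, 1, 9))` gives March 10, 2018, and a made-up cost of $5 per run would allow `runs_affordable(5.0)` = 50 runs. Storage charges are not modeled here, so treat the result as an upper bound.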
Which GATK4 pipelines will be featured?
We are preparing individual workspaces for each of the major use cases that will be supported in GATK 4.0:
- Germline short variants: the HaplotypeCaller / joint calling workflows for germline SNPs and indels
- Somatic short variants: the Mutect2 workflow for somatic SNPs and indels
- Somatic copy number: the GATK4 CNV and ACNV workflows for somatic CNV and allelic CNV discovery (including panel of normals (PON) creation)
We foresee additional workflows being released in the first quarter of 2018 to cover germline copy number variation (GATK4 gCNV), pathogen sequence detection (PathSeq) and deep learning applied to short variant filtering (GATK4 CNN). These will be added to FireCloud when the corresponding GATK software is released.
Each of these pipelines will be set up in a dedicated workspace in FireCloud, along with complementary workflows (mapping and pre-processing plus commonly requested format conversions) and appropriate publicly accessible example data for running pipeline tests. All workflow scripts (WDL) will be viewable and exportable, with inputs and outputs fully configured according to our Best Practices recommendations.
Getting started with GATK4 in FireCloud
Instructions will be provided for getting started quickly and painlessly, including short videos showing step by step how to run the workflows and how to bring in your own data. We're still putting the finishing touches on the Quick Start guide, which will go live shortly before the GATK4 release on January 9, but in the meantime you can check out this video for a preview.
We hope you will take advantage of these credits to take GATK4 out for a spin! If you have any questions about the free credits, please comment in the discussion thread below.
Who is paying for the credits? What does Broad get out of it?
Our friends at Google Cloud Platform are generously footing the bill for this credits program. We at the Broad Institute are not getting any share of any revenue that GCP may generate as a result of this program. In other words, if you continue using Google Cloud for your work on your own dime after you have exhausted your credits, we will not get a cut of the money you pay to Google.
For us (the GATK team), the FireCloud portal and cloud-based platforms in general present an unparalleled opportunity to make our tools available in a format that is much easier to support, since it removes a lot of the complexity of dealing with many different local infrastructures. The more people use this kind of platform to run our pipelines, the easier it becomes for us to help ensure that the pipelines are running smoothly and correctly for everyone. We are very aware that for many of you, moving your work to the cloud is a big logistical and cultural shift, so we hope that this program will grease the wheels and make it easier for you to try the cloud (and GATK4 itself) on for size. If you find it doesn't suit you, you'll still be able to go back to the traditional method of downloading the software and deploying it on your own infrastructure.