I'm new to GATK. Where do I start?
If this is your first rodeo, you're probably asking yourself:
What can GATK do for me?
Identify variants in a bunch of sample sequences, with great sensitivity and specificity.
No but really, how do I know what to do?
For each step in the Best Practices, there is a tutorial that details how to run the tools involved, with example commands. The idea is to daisy-chain all thosee tutorials in the order that they're referenced in the Best Practices doc into a pipeline.
Oh, you mean I can just copy/paste all the tutorial commands as they are?
Not quite, because there are a few things that need to be tweaked. For example, the tutorials use the
-L/--intervalsargument to restrict analysis for demo purposes, but depending on your data and experimental design, you may need to remove it (e.g. for WGS) or adapt it (for WEx). Hopefully it's explained clearly enough in the tutorials.
Why don't you just provide one script that runs all the tools?
It's really hard to build and maintain a one-size-fits-all pipeline solution. Really really hard. And not nearly as much fun as developing new analysis methods. We do provide a pipelining program called Queue that has the advantage of understanding GATK argument syntax natively, but you still have to actually write scripts yourself in Scala to use it. Sorry. Maybe one day we will be able to offer GATK analysis on the Cloud. But not today.
What if I want to know what a command line argument does or change a parameter?
First, check out the basic GATK command syntax FAQ if it's your first time using GATK, then consult the relevant Tool Documentation page. Keep in mind that some arguments are "engine parameters" that are shared by many tools, and are listed in a separate document. Also, you can always use the search box to find an argument description really quickly.
The documentation seems chaotic. Is there any logic to how it's organized?
Sort of. (And, ouch. Tough crowd.) The main category names should be obvious enough (if not, see the "Documentation Categories" tab). Within categories, everything is just in alphabetical order. In future, we're going to try to provide more use-case based structure, but for now this is what we have. The best way to find practical information is to either go from the Best Practices doc (which provide links to all FAQs, method articles and tutorials directly related to a given step), or use the search box and search-by-tag functions (see the "Search tab"). Be sure to also check out the Presentations section, which provides workshop materials and videos that explain a lot of the motivation and methods behind the Best Practices.
Does GATK include other tools beside the ones in the Best Practices?
Oh sure, there's a whole bunch of them, all listed in the Tool Documentation section, categorized by type of analysis. But be aware that anything that's not part of the Best Practices is most likely either a tool that was written for a one-off analysis years ago, an experimental feature that we're still not sure is actually useful, or an accessory utility that can be used in many different ways and takes expert inside knowledge to use properly. All these may be buggy, insufficiently documented, or both. We provide support for them as well as humanly possible but ultimately, you use them at your own risk.
Why do the answers to these questions keep getting longer and longer?
I don't know what you're talking about.
What else should I know before I start?
You should probably browse the titles of the Frequently Asked Questions -- there will be at least a handful you'll want to read, but it's hard for us to predict which ones.