Geraldine_VdAuwera
Administrator, GSA Official Member
This document provides technical details and recommendations on how the parallelism options offered by the GATK can be used to yield optimal performance results.
As explained in the primer on parallelism for the GATK, there are two main kinds of parallelism that can be applied to the GATK: multi-threading and scatter-gather (using Queue).
There are two options for multi-threading with the GATK, controlled by the arguments -nt and -nct, respectively, which can be combined:
-nt / --num_threads
controls the number of data threads sent to the processor
-nct / --num_cpu_threads_per_data_thread
controls the number of CPU threads allocated to each data thread
For more information on how these multi-threading options work, please read the primer on parallelism for the GATK.
Each data thread needs to be given the full amount of memory you’d normally give a single run. So if a tool normally requires 2 Gb of memory, running it with -nt 4 will use 8 Gb of memory. In contrast, CPU threads share the memory allocated to their “mother” data thread, so you don’t need to allocate extra memory based on the number of CPU threads you use.
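
The memory rule above can be sketched as a small calculation. This is only an illustrative sketch, not part of the GATK: the function names, the jar path, and the file names are hypothetical placeholders, and it assumes the memory is set through Java's -Xmx option with a whole number of gigabytes.

```python
def required_memory_gb(single_run_gb: int, nt: int = 1, nct: int = 1) -> int:
    """Memory for a multithreaded run.

    Each data thread (-nt) needs the full single-run amount;
    CPU threads (-nct) share the memory of their data thread.
    """
    if nt < 1 or nct < 1:
        raise ValueError("nt and nct must be at least 1")
    return single_run_gb * nt  # nct deliberately does not change the total


def build_command(tool: str, single_run_gb: int, nt: int = 1, nct: int = 1) -> list:
    """Assemble a GATK command line as a list of arguments (placeholder paths)."""
    heap_gb = required_memory_gb(single_run_gb, nt, nct)
    cmd = [
        "java", f"-Xmx{heap_gb}g",
        "-jar", "GenomeAnalysisTK.jar",  # placeholder path to the GATK jar
        "-T", tool,
        "-R", "reference.fasta",         # placeholder reference file
        "-I", "input.bam",               # placeholder input file
    ]
    if nt > 1:
        cmd += ["-nt", str(nt)]
    if nct > 1:
        cmd += ["-nct", str(nct)]
    return cmd


print(required_memory_gb(2, nt=4))  # 8, matching the example in the text
print(" ".join(build_command("UnifiedGenotyper", 2, nt=4, nct=3)))
```

The sketch does not check whether the chosen tool actually supports -nt or -nct; that depends on the tool.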
-nct with versions 2.2 and 2.3
Because of the way the -nct option was originally implemented, in versions 2.2 and 2.3, there is one CPU thread that is reserved by the system to “manage” the rest. So if you use -nct, you’ll only really start seeing a speedup with -nct 3 (which yields two effective "working" threads) and above. This limitation has been resolved in the implementation that will be available in versions 2.4 and up.
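
The effect of the reserved thread can be illustrated with a short sketch. The function name and the version tuple are hypothetical, and treating -nct 1 as one working thread with no manager is our assumption; the text only states the behaviour for -nct values above 1.

```python
def effective_working_threads(nct: int, gatk_version: tuple) -> int:
    """Estimate how many CPU threads do actual work for a given -nct value.

    gatk_version is a (major, minor) tuple, e.g. (2, 3).
    """
    if nct < 1:
        raise ValueError("nct must be at least 1")
    reserves_manager = gatk_version in [(2, 2), (2, 3)]
    if reserves_manager and nct > 1:
        return nct - 1  # one thread only manages the others
    return nct  # assumption: no manager thread when nct is 1, or in 2.4 and up


for nct in (1, 2, 3, 4):
    print(nct, effective_working_threads(nct, (2, 3)), effective_working_threads(nct, (2, 4)))
```

Under this model, -nct 2 in versions 2.2 and 2.3 gives one working thread, which is why a speedup only appears from -nct 3.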
For more details on scatter-gather, see the primer on parallelism for the GATK and the Queue documentation.
Please note that not all tools support all parallelization modes. The parallelization modes that are available for each tool depend partly on the type of traversal that the tool uses to walk through the data, and partly on the nature of the analyses it performs.
| Tool | Full name | Type of traversal | NT | NCT | SG |
|---|---|---|---|---|---|
| RTC | RealignerTargetCreator | RodWalker | + | - | - |
| IR | IndelRealigner | ReadWalker | - | - | + |
| BR | BaseRecalibrator | LocusWalker | - | + | + |
| PR | PrintReads | ReadWalker | - | + | - |
| RR | ReduceReads | ReadWalker | - | - | + |
| UG | UnifiedGenotyper | LocusWalker | + | + | + |
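
The table above can also be expressed as a lookup, which is handy for sanity-checking a pipeline script before launching jobs. This is a sketch only: the dictionary and the function are hypothetical helpers that transcribe the table, not anything provided by the GATK.

```python
# Supported parallelization modes per tool, transcribed from the table above.
SUPPORTED_MODES = {
    "RealignerTargetCreator": {"NT"},
    "IndelRealigner": {"SG"},
    "BaseRecalibrator": {"NCT", "SG"},
    "PrintReads": {"NCT"},
    "ReduceReads": {"SG"},
    "UnifiedGenotyper": {"NT", "NCT", "SG"},
}


def check_threading(tool: str, nt: int = 1, nct: int = 1) -> list:
    """Return a list of problems with the requested -nt / -nct values."""
    modes = SUPPORTED_MODES[tool]
    problems = []
    if nt > 1 and "NT" not in modes:
        problems.append(f"{tool} does not support -nt")
    if nct > 1 and "NCT" not in modes:
        problems.append(f"{tool} does not support -nct")
    return problems


print(check_threading("UnifiedGenotyper", nt=8, nct=3))  # []
print(check_threading("IndelRealigner", nt=4))
```
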
The table below summarizes configurations that we typically use for our own projects (one per tool, except we give three alternate possibilities for the UnifiedGenotyper). The different values allocated for each tool reflect not only the technical capabilities of these tools (which options are supported), but also our empirical observations of what provides the best tradeoffs between performance gains and commitment of resources. Please note however that this is meant only as a guide, and that we cannot give you any guarantee that these configurations are the best for your own setup. You will probably have to experiment with the settings to find the configuration that is right for you.
| Tool | RTC | IR | BR | PR | RR | UG |
|---|---|---|---|---|---|---|
| Available modes | NT | SG | NCT,SG | NCT | SG | NT,NCT,SG |
| Cluster nodes | 1 | 4 | 4 | 1 | 4 | 4 / 4 / 4 |
| CPU threads (-nct) | 1 | 1 | 8 | 4-8 | 1 | 3 / 6 / 24 |
| Data threads (-nt) | 24 | 1 | 1 | 1 | 1 | 8 / 4 / 1 |
| Memory (Gb) | 48 | 4 | 4 | 4 | 4 | 32 / 16 / 4 |
Where NT is data multithreading, NCT is CPU multithreading and SG is scatter-gather using Queue. For more details on scatter-gather, see the primer on parallelism for the GATK and the Queue documentation.
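
As a quick check on the arithmetic behind the three UnifiedGenotyper alternatives, the sketch below multiplies out the values from the table. The interpretation is ours, not a statement from the article: it assumes the total thread count of a run is -nt times -nct, and the names are hypothetical.

```python
# (nct, nt, memory_gb) for the three UnifiedGenotyper alternatives in the table
UG_CONFIGS = [(3, 8, 32), (6, 4, 16), (24, 1, 4)]


def summarize(nct: int, nt: int, memory_gb: int) -> dict:
    """Derive total threads and memory per data thread for one configuration."""
    return {
        "total_threads": nt * nct,                  # assumption: threads multiply
        "memory_per_data_thread_gb": memory_gb / nt,
    }


for config in UG_CONFIGS:
    print(config, summarize(*config))
```

Under that assumption, all three alternatives come to 24 threads and 4 Gb per data thread, which is consistent with the rule that memory scales with -nt but not with -nct.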
Geraldine Van der Auwera, PhD
Comments
Hi,
I have access to a cluster with 8 nodes, each node with 64 Gb RAM and 8 cores. I'm trying to process 30 samples using UnifiedGenotyper. Each sample consists of an exome of around 62e6 bases and has an average coverage around 60x. Can you give me any advice on nt/nct configuration in order to optimize the performance of the execution, using and not using Queue?
Thanks, Xavier
Hi Xavier,
We don't have the resources right now to give case-by-case advice on configurations, so you'll need to experiment with your setup based on the general guidelines in the article. You may want to discuss it with the people who manage your cluster, as they may also have some helpful insights. Good luck!
Geraldine Van der Auwera, PhD