How can I use parallelism to make GATK tools run faster?

Geraldine_VdAuwera (Administrator, GATK Developer)
edited April 2013 in FAQs

This document provides technical details and recommendations on how the parallelism options offered by the GATK can be used to yield optimal performance results.


As explained in the primer on parallelism for the GATK, there are two main kinds of parallelism that can be applied to the GATK: multi-threading and scatter-gather (using Queue).

Multi-threading options

There are two options for multi-threading with the GATK, controlled by the arguments -nt and -nct, respectively, which can be combined:

  • -nt / --num_threads
    controls the number of data threads sent to the processor
  • -nct / --num_cpu_threads_per_data_thread
    controls the number of CPU threads allocated to each data thread

For more information on how these multi-threading options work, please read the primer on parallelism for the GATK.
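
As a rough illustration of how the two arguments combine on a command line, here is a minimal Python sketch that assembles a GATK invocation. The helper name `gatk_command`, the file names and the choice to omit an argument when its value is 1 are all assumptions made for illustration; only -nt, -nct, -T and the GenomeAnalysisTK.jar invocation style come from this page, and -R / -I should be checked against your GATK version.

```python
def gatk_command(tool, reference, bam, nt=1, nct=1, jar="GenomeAnalysisTK.jar"):
    """Assemble a GATK command line with the two multi-threading arguments.

    Hypothetical helper: it only builds the argument list, it does not run GATK.
    """
    if nt < 1 or nct < 1:
        raise ValueError("nt and nct must both be at least 1")
    cmd = ["java", "-jar", jar, "-T", tool, "-R", reference, "-I", bam]
    if nt > 1:
        cmd += ["-nt", str(nt)]    # number of data threads
    if nct > 1:
        cmd += ["-nct", str(nct)]  # CPU threads per data thread
    return cmd


# Placeholder file names; 4 data threads with 2 CPU threads each.
print(" ".join(gatk_command("UnifiedGenotyper", "ref.fasta", "sample.bam", nt=4, nct=2)))
```

Whether a given tool accepts both arguments depends on the tool, as described in the applicability section of this document.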

Memory considerations for multi-threading

Each data thread needs to be given the full amount of memory you’d normally give a single run. So if a tool normally requires 2 GB of memory and you run it with -nt 4, the multi-threaded run will use 8 GB of memory. In contrast, CPU threads share the memory allocated to their “mother” data thread, so you don’t need to allocate extra memory based on the number of CPU threads you use.
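
The arithmetic behind this rule can be sketched in a few lines of Python. The function name `total_memory_gb` is hypothetical, and the sketch assumes memory scales exactly linearly with the number of data threads, which is a simplification of the guideline above.

```python
def total_memory_gb(per_run_gb, nt=1, nct=1):
    """Estimate the memory needed for a multi-threaded run.

    Each data thread (-nt) needs the full single-run amount.
    CPU threads (-nct) share their data thread's memory, so nct
    is accepted only to make that explicit and does not change the result.
    """
    if nt < 1 or nct < 1:
        raise ValueError("nt and nct must both be at least 1")
    return per_run_gb * nt


print(total_memory_gb(2, nt=4))         # 8
print(total_memory_gb(2, nt=4, nct=6))  # 8, CPU threads add nothing
```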

Additional consideration when using -nct with versions 2.2 and 2.3

Because of the way the -nct option was originally implemented, in versions 2.2 and 2.3, there is one CPU thread that is reserved by the system to “manage” the rest. So if you use -nct, you’ll only really start seeing a speedup with -nct 3 (which yields two effective "working" threads) and above. This limitation has been resolved in the implementation that will be available in versions 2.4 and up.
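
To make the bookkeeping concrete, the following Python sketch computes the number of effective working threads. The function name and the version tuple format are hypothetical, and the treatment of -nct 1 (no manager thread, one working thread) is an assumption, since the text above only describes what happens once -nct is used.

```python
def effective_cpu_threads(nct, version):
    """Return the number of CPU threads doing actual work.

    version is a (major, minor) tuple such as (2, 3).
    In 2.2 and 2.3 one CPU thread is reserved to manage the others.
    """
    if nct < 1:
        raise ValueError("nct must be at least 1")
    if nct > 1 and version in ((2, 2), (2, 3)):
        return nct - 1
    return nct


print(effective_cpu_threads(3, (2, 3)))  # 2
print(effective_cpu_threads(3, (2, 4)))  # 3
```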


Scatter-gather

For more details on scatter-gather, see the primer on parallelism for the GATK and the Queue documentation.

Applicability of parallelism to the major GATK tools

Please note that not all tools support all parallelization modes. The parallelization modes that are available for each tool depend partly on the type of traversal that the tool uses to walk through the data, and partly on the nature of the analyses it performs.

Tool  Full name               Type of traversal  NT  NCT  SG
RTC   RealignerTargetCreator  RodWalker          +   -    -
IR    IndelRealigner          ReadWalker         -   -    +
BR    BaseRecalibrator        LocusWalker        -   +    +
PR    PrintReads              ReadWalker         -   +    -
RR    ReduceReads             ReadWalker         -   -    +
UG    UnifiedGenotyper        LocusWalker        +   +    +
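
If you script your runs, a table like this can be turned into a simple pre-flight check. The Python sketch below transcribes the table into a dictionary; the function name `unsupported_modes` and its parameters are hypothetical, and the check only reflects the six tools listed here.

```python
# Parallelism modes per tool, transcribed from the table above.
SUPPORTED_MODES = {
    "RealignerTargetCreator": {"NT"},
    "IndelRealigner": {"SG"},
    "BaseRecalibrator": {"NCT", "SG"},
    "PrintReads": {"NCT"},
    "ReduceReads": {"SG"},
    "UnifiedGenotyper": {"NT", "NCT", "SG"},
}


def unsupported_modes(tool, nt=1, nct=1, scatter_gather=False):
    """Return the requested modes that the given tool does not support."""
    requested = set()
    if nt > 1:
        requested.add("NT")
    if nct > 1:
        requested.add("NCT")
    if scatter_gather:
        requested.add("SG")
    return sorted(requested - SUPPORTED_MODES[tool])


print(unsupported_modes("IndelRealigner", nt=4))           # ['NT']
print(unsupported_modes("UnifiedGenotyper", nt=8, nct=3))  # []
```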

Recommended configurations

The table below summarizes configurations that we typically use for our own projects (one per tool, except we give three alternate possibilities for the UnifiedGenotyper). The different values allocated for each tool reflect not only the technical capabilities of these tools (which options are supported), but also our empirical observations of what provides the best tradeoffs between performance gains and commitment of resources. Please note however that this is meant only as a guide, and that we cannot give you any guarantee that these configurations are the best for your own setup. You will probably have to experiment with the settings to find the configuration that is right for you.

Tool                RTC  IR  BR      PR   RR  UG
Available modes     NT   SG  NCT,SG  NCT  SG  NT,NCT,SG
Cluster nodes       1    4   4       1    4   4 / 4 / 4
CPU threads (-nct)  1    1   8       4-8  1   3 / 6 / 24
Data threads (-nt)  24   1   1       1    1   8 / 4 / 1
Memory (GB)         48   4   4       4    4   32 / 16 / 4

Where NT is data multithreading, NCT is CPU multithreading and SG is scatter-gather using Queue. For more details on scatter-gather, see the primer on parallelism for the GATK and the Queue documentation.
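
As a sanity check on these numbers, the Python sketch below transcribes the table and derives two quantities from it: the total number of threads per job (data threads times CPU threads) and the memory per data thread. It assumes that the six columns correspond to RTC, IR, BR, PR, RR and UG, in the same order as the applicability table, and it uses the lower bound of the 4-8 range given for the fourth column. The `Config` type and function names are hypothetical.

```python
from collections import namedtuple

Config = namedtuple("Config", "tool nodes nct nt memory_gb")

# Values transcribed from the table above; the tool labels are an assumption
# based on the column order of the applicability table.
RECOMMENDED = [
    Config("RTC", 1, 1, 24, 48),
    Config("IR", 4, 1, 1, 4),
    Config("BR", 4, 8, 1, 4),
    Config("PR", 1, 4, 1, 4),   # table gives 4-8 CPU threads; lower bound used
    Config("RR", 4, 1, 1, 4),
    Config("UG", 4, 3, 8, 32),  # three alternative UG configurations
    Config("UG", 4, 6, 4, 16),
    Config("UG", 4, 24, 1, 4),
]


def total_threads(cfg):
    """Data threads multiplied by CPU threads per data thread."""
    return cfg.nt * cfg.nct


def memory_per_data_thread(cfg):
    """Memory divided evenly across data threads, in GB."""
    return cfg.memory_gb / cfg.nt


for cfg in RECOMMENDED:
    print(cfg.tool, total_threads(cfg), memory_per_data_thread(cfg))
```

Under these assumptions, the three UnifiedGenotyper alternatives all come to 24 threads and 4 GB per data thread, which fits the memory rule described earlier.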

Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    Questions and comments up to August 2014 have been moved to an archival thread here:

  • jacobhsu (Hong Kong, Member)

    Sorry, I have to post here to make this clearer. I guess I'm a bit confused. Does the -nt parameter act the same as the number of nodes (machines)? From the information above, did you get these balanced results using 24 nodes (machines) for the RTC tool?

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    @jacobhsu That's correct.

  • tommycarstensen (United Kingdom, Member)

    Any recommended configurations for HaplotypeCaller, CombineGVCFs and GenotypeGVCFs?

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    Not really, to be honest. I've tried to get the engineers to outline some recommendations but they are very reluctant to spit out any numbers. I will try again (it's not stalking if it's part of your job) but in the meantime I would say that trial and error (and lots of systematic testing) is your best bet.

  • intipedroso (Member)

    I am running SplitNCigarReads with --num_threads 1 --num_cpu_threads_per_data_thread 1. I wanted to use 1 CPU and no more. However, as you can see in the line below, it sometimes uses 40 CPUs or more. Why does this happen, and how can I actually restrict CPU usage to 1?

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
     34701 ipedroso  20   0 38,799g 1,104g  12648 S  4037  0,2   7:58.99 java -jar /home/ipedroso/APP/GenomeAnalysisTK.jar --num_threads 1 --num_cpu_threads_per_data_thread 1 -T SplitNCigarReads

    Thanks in advance

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    @intipedroso This is well outside my scope, but I think I read somewhere that the JVM itself will utilize additional cores even if the application does not request them, so you may need to figure out how to constrain CPU usage by the JVM.

  • Kurt (Member)

    This comment from the Picard FAQ may be useful (I've never had any interest in playing with it myself). You should be able to pass it when invoking Java (e.g. java -XX:ParallelGCThreads=1 -jar, with the JVM option placed before -jar). I used to see this with some Picard programs when I looked at these things a few years back (it seemed to me that usage would spike when writing to file, but I could be wrong).

    Q: Why does a Picard program use so many threads?
    A: This can be caused by Java's GC method when running on 64-bit Java. By default the JVM switches to 'server' settings on 64-bit, which automatically enables parallel GC and will use as many cores as it can get its hands on. The approach we decided on to get round this was to define the number of threads we would allow Java for GC.

    An alternative approach is to turn off parallel GC (it is a boolean option, so note the '-' that indicates it is turned off). We found this to be sub-optimal, as the process has to stop completely when GC occurs, and it takes much longer because (from what I can tell) a full GC sweep is the only type performed, which in many cases is not required (parallel GC employs ~7 different types of GC). See here for further details of the tuneable parameters.
