
RealignerTargetCreator appears to take more time when multithreaded using the -nt flag

Clare, Member
edited August 2012 in Ask the GATK team

Hi all,

We're doing some analysis on quite big data and time is an issue, so I did a bit of scaling testing on a subset of the data before beginning. The results were unexpected.

When I run GATK RealignerTargetCreator with -nt 8 and give it 8 cores to work with, it actually takes about 2.5 times LONGER than if I just run it single-threaded. I don't mean that the user or CPU time goes up: the real wall-clock time goes up. In the -nt 8 case, the 8 cores would have been on a single node of our cluster with shared memory.

I tried testing on two different kinds of subsets of the data and both performed worse when multithreaded. I first tried restricting the input data by genomic region, i.e. just analysing chr22. When multithreading didn't seem to be working as expected in this test, I thought that maybe GATK was trying to parallelise over genomic regions, so I instead tried testing on a single lane of input data (a 9.6 GB BAM file spread over the whole genome). This also ran more slowly when multithreaded.
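
A scaling test like the one described above could be scripted roughly as follows. This is only a sketch: the jar location, `sample.bam`, `reference.fasta`, the output file names, and the helper names `build_command`, `wall_time`, and `scaling_test` are all placeholders chosen for illustration, not details from this thread.

```python
import subprocess
import time


def build_command(nt, bam="sample.bam", ref="reference.fasta", region="chr22"):
    """Build a RealignerTargetCreator command line for a given -nt value.

    All file names here are hypothetical placeholders.
    """
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "RealignerTargetCreator",
        "-R", ref,
        "-I", bam,
        "-L", region,
        "-o", "targets_nt%d.intervals" % nt,
        "-nt", str(nt),
    ]


def wall_time(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def scaling_test(thread_counts, make_cmd=build_command, timer=wall_time):
    """Time one run per thread count and return {nt: seconds}."""
    return {nt: timer(make_cmd(nt)) for nt in thread_counts}
```

Calling `scaling_test([1, 2, 6, 8])` would run the tool once per setting and return the wall-clock seconds for each. Bear in mind that repeated runs over the same BAM can be skewed by the operating system's file cache, so the order of the runs may matter.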

So my question is: should I use -nt 8 in my real analysis even though it was a bad option in testing? Is it possible that multithreading will be bad for small amounts of data, but good in the large-data case? Or, does this indicate that I'm doing something wrong when trying to run RealignerTargetCreator multithreaded?

I really would like to use the fastest option for the real data as it will be very big. Any help much appreciated.

  • I should have said, this is GATK 1.6-7

  • Mark_DePristo, Broad Institute, Member

    This can happen when you are IO-limited: by specifying -nt 8 you are just causing the machine's IO system to thrash. On our infrastructure, which is Isilon-backed and so very high throughput, -nt 8 is about the most we can use before seeing diminishing returns. Also, RealignerTargetCreator is an extremely inexpensive operation apart from the IO, so it's not easy to get a boost from -nt. Have you tried -nt 2 or another value? If you are really ambitious, you can copy the BAM to local disk or into a RAM cache and run multithreaded against that. It's much more efficient.

    Also, we have an outline for a more efficient implementation of -nt that will manage IO and CPU parallelism separately, but that's months away.

  • Thanks Mark!

    The system we're on actually has very fast IO too, it's designed for life sciences. I did also originally try -nt 6 and it was the worst of the three options, slightly worse than -nt 8.

    However, since making my post I've tried again with a much larger dataset (a different sample though unfortunately) and this time the multithreaded (8 core) run was a lot faster. Actually, the single-thread run hasn't finished yet, but the 8-core run has finished the ~400GB bam file between yesterday and today. So this is great, but I can't explain my earlier test results which I'm pretty certain of. Maybe it really is low coverage that makes multithreading inefficient?

  • Mark_DePristo, Broad Institute, Member

    That's good. The current multi-threading is inefficient enough that it is only effective when the IO costs are small relative to the compute costs. If for some reason the IO is slower, using -nt can thrash the IO and make everything worse.
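
The suggestion earlier in the thread, copying the BAM to local storage and running multithreaded against the copy, might be sketched like this. The helper name `stage_locally` and the `.bai` index naming are assumptions made for illustration. On many Linux systems, passing a RAM-backed directory such as `/dev/shm` as `scratch_dir` would approximate the RAM cache variant, provided the file fits in memory.

```python
import os
import shutil
import tempfile


def stage_locally(bam_path, scratch_dir=None):
    """Copy a BAM, and its index if one is found, into a fresh local directory.

    Returns the path of the copied BAM. scratch_dir=None uses the system
    default temporary directory.
    """
    dest_dir = tempfile.mkdtemp(prefix="bam_stage_", dir=scratch_dir)
    local_bam = shutil.copy2(bam_path, dest_dir)
    # Assumed index naming: sample.bam.bai or sample.bai next to the BAM.
    candidates = (bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai")
    for index in candidates:
        if os.path.exists(index):
            shutil.copy2(index, dest_dir)
    return local_bam
```

The returned path can then be passed to the tool's -I argument in place of the original file. Whether the copy pays for itself depends on how long the copy takes relative to the run, so it is worth timing both.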
