
RealignerTargetCreator appears to take more time when multithreaded using the -nt flag

Clare Posts: 5 Member
edited August 2012 in Ask the team

Hi all,

We're doing some analysis on quite big data and time is an issue, so I did a bit of scaling testing on a subset of the data before beginning. The results were unexpected.

When I run GATK RealignerTargetCreator with -nt 8 and give it 8 cores to work with, it actually takes about 2.5 times LONGER than if I just run it single-threaded. I don't mean that the user or CPU time goes up: the real (wall-clock) time goes up. In the -nt 8 case, the 8 cores were on a single node of our cluster with shared memory.

I tried testing on two different kinds of subsets of the data, and both performed worse when multithreaded. I first tried restricting the input data by genomic region, i.e. just analysing chr22. When multithreading didn't seem to be working as expected in this test, I thought that maybe GATK was trying to parallelise over genomic regions, so I instead tried testing on a single lane of input data (a 9.6G BAM file spread over the whole genome). This also ran more slowly when multithreaded.

So my question is: should I use -nt 8 in my real analysis even though it was a bad option in testing? Is it possible that multithreading will be bad for small amounts of data, but good in the large-data case? Or, does this indicate that I'm doing something wrong when trying to run RealignerTargetCreator multithreaded?

I really would like to use the fastest option for the real data as it will be very big. Any help much appreciated.

Thanks, Clare
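
For anyone repeating this kind of scaling test, here is a minimal Python sketch of a wall-clock sweep over -nt values. It is only an illustration, not what was actually run in this thread: the jar name, reference, BAM and output paths are placeholders, and the helper names (`build_command`, `wall_time`, `sweep`, `speedups`) are made up for this example.

```python
import subprocess
import time


def build_command(threads, jar="GenomeAnalysisTK.jar", reference="ref.fasta",
                  bam="subset.bam", output="targets.intervals"):
    """Build a RealignerTargetCreator command line. All paths are placeholders."""
    return [
        "java", "-jar", jar,
        "-T", "RealignerTargetCreator",
        "-R", reference,
        "-I", bam,
        "-o", output,
        "-nt", str(threads),
    ]


def wall_time(command):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def sweep(thread_counts, make_command=build_command):
    """Time one run per thread count and return {threads: seconds}."""
    return {n: wall_time(make_command(n)) for n in thread_counts}


def speedups(timings, baseline=1):
    """Speedup relative to the baseline run; values below 1 mean slower."""
    base = timings[baseline]
    return {n: base / t for n, t in timings.items()}
```

Calling `speedups(sweep([1, 2, 4, 8]))` would give one ratio per thread count. A run that takes 2.5 times longer than single-threaded shows up as a speedup of 0.4. Repeating each run a few times is worthwhile, since file-system caching can make the second pass over the same BAM look faster than the first.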


Answers

  • Clare Posts: 5 Member

    I should have said, this is GATK 1.6-7

  • Mark_DePristo Posts: 153 Administrator, GSA Member

    This can happen when you are IO limited: by specifying -nt 8 you are just causing the machine's IO system to thrash. We note that in our infrastructure -- which is Isilon backed and so very high throughput -- -nt 8 is about the max we can use without seeing diminishing returns. Also, RealignerTargetCreator is an extremely inexpensive operation outside of the IO, so it's not easy to get a boost from -nt. Have you tried -nt 2 or another value? If you are really ambitious you can actually copy the BAM locally or into a ramcache and run multi-threaded against that. It's much more efficient.

    Also, we have an outline for a more efficient implementation of -nt that will manage IO and CPU parallelism separately, but that's months away.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • Clare Posts: 5 Member

    Thanks Mark!

    The system we're on actually has very fast IO too, it's designed for life sciences. I did also originally try -nt 6 and it was the worst of the three options, slightly worse than -nt 8.

    However, since making my post I've tried again with a much larger dataset (a different sample though unfortunately) and this time the multithreaded (8 core) run was a lot faster. Actually, the single-thread run hasn't finished yet, but the 8-core run has finished the ~400GB bam file between yesterday and today. So this is great, but I can't explain my earlier test results which I'm pretty certain of. Maybe it really is low coverage that makes multithreading inefficient?

  • Mark_DePristo Posts: 153 Administrator, GSA Member

    That's good. The current multi-threading is not so much inefficient as only effective when the IO costs are small relative to the compute costs. If for some reason the IO is slower, you can thrash the IO when using -nt and make everything worse.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
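
The suggestion above to copy the BAM locally before running multi-threaded could be sketched roughly as follows. This is an assumption-laden illustration, not a GATK feature: `stage_locally` is a hypothetical helper, the scratch location is whatever fast local disk or RAM-backed directory your node provides, and the index is assumed to sit next to the BAM as either `sample.bam.bai` or `sample.bai`.

```python
import shutil
import tempfile
from pathlib import Path


def stage_locally(bam_path, scratch_dir=None):
    """Copy a BAM and its index (if found) to local scratch; return the new BAM path."""
    bam = Path(bam_path)
    if scratch_dir:
        scratch = Path(scratch_dir)
    else:
        # No scratch directory given: fall back to a fresh temporary directory.
        scratch = Path(tempfile.mkdtemp(prefix="bam_stage_"))
    scratch.mkdir(parents=True, exist_ok=True)

    # copy2 preserves modification times, so the copied index does not
    # end up looking older or newer relative to the BAM than the original.
    local_bam = scratch / bam.name
    shutil.copy2(bam, local_bam)

    # Copy whichever index naming convention is present next to the BAM.
    for index in (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")):
        if index.exists():
            shutil.copy2(index, scratch / index.name)
    return local_bam
```

The returned path would then be passed to -I in place of the shared-storage path. Whether this pays off depends on whether the copy itself is cheaper than the IO contention it avoids, so it is worth timing both.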
