ApplyBQSRSpark Hyperthreading slowdown

djwhiteepcc Member
edited February 4 in Ask the GATK team
Hi there!

We have been benchmarking GATK4 Spark tools across a couple of our infrastructures, as part of BioExcel, and have noticed an interesting issue. When running ApplyBQSRSpark on the same system with and without hyperthreading, we find that the hyperthreaded run is slower (i.e. 72 hyperthreads take longer to run than 36 non-hyperthreaded cores). This is the opposite of what we expected, and of what we see with BaseRecalibratorSpark.

We have put these preliminary results on a blog post:

Figure 2 shows this best. In general, hyperthreaded execution takes roughly twice as long. Can anyone think of a reason why this might be? It's hard to fully understand the source code given that the pair of us working on this the most are neither from a genomics nor a Java background!

We also have a question about what is actually being executed with ApplyBQSRSpark. Looking at the SparkUI, we see that three stages are performed: stages 1 and 2, and then a final merge of the output file. We have no idea what actually happens in stages 1 and 2. Could anyone shed light on this? Does stage 1 distribute the data and stage 2 apply the recalibration, or is it something else?

The version of GATK we used is quite old compared to the latest release, but we had to fix a version as we had no time left to re-run our benchmarks on newer versions. It would still be interesting to understand the behaviour, even if it turns out to be specific to the old version of the code.

Your help is greatly appreciated.


GATK4 v.
Cirrus: Each node - 36 cores, 72 hyperthreading, 256GB memory, Lustre file system
Spark v2.3.1


  • steve1 Member

    Hey just curious, do you see this with other programs or just this one? Do you know which CPU models you are using?

  • djwhiteepcc Member
    Hey Steve,

    As far as we can tell, just ApplyBQSRSpark. BaseRecalibratorSpark executes faster with hyperthreading enabled, as shown in the blog post. It just seems flipped for ApplyBQSRSpark. We can't work out if it is a GATK thing, a Spark thing, or an IO thing, so we wondered if anyone had either seen the same behaviour or had an idea of what the cause might be.

    Each standard Cirrus compute node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon (Broadwell) processors. More info can be found here:


  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hi @djwhiteepcc

    Thank you for pointing out this benchmarking issue. I have let the GATK development team know.

    Of course, it would be important to retest with the most recent version of the code. Do you think you will be able to do that?

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin


    The development team recommended upgrading to the most recent version of GATK to see if you can replicate this error.
