What does each data thread stand for in HaplotypeCaller?
I'm using multi-threading for HaplotypeCaller by setting the nct option.
However, I found that the speedup isn't proportional to the increase in the number of data threads.
I tried nct values of 8, 12, 16, and 24 on my machine and got speedups of 4.1x, 4.2x, 4.2x, and 4.2x respectively. It seems that there is an upper bound on the performance gain from enabling multi-threading for HaplotypeCaller.
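To put numbers on that plateau, here is a rough sanity check on the measurements above. It is only a sketch: it assumes Amdahl's law, S = 1 / ((1 - p) + p / n), is a reasonable model here, which is my assumption and not something I know about HaplotypeCaller's internals. The function names are made up for illustration. It computes the parallel efficiency (speedup divided by thread count) and the parallel fraction p that each measurement would imply.

```python
# Measured speedups from the runs above: nct value -> speedup over one thread.
measured = {8: 4.1, 12: 4.2, 16: 4.2, 24: 4.2}


def parallel_efficiency(threads, speedup):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / threads


def implied_parallel_fraction(threads, speedup):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for p.

    Rearranging gives p = (1 - 1/S) * n / (n - 1).
    Assumes Amdahl's law applies, which may not hold for this workload.
    """
    return (1 - 1 / speedup) * threads / (threads - 1)


def summarize(measurements):
    """Return (threads, efficiency, implied p) tuples sorted by thread count."""
    return [
        (n, parallel_efficiency(n, s), implied_parallel_fraction(n, s))
        for n, s in sorted(measurements.items())
    ]


if __name__ == "__main__":
    for n, eff, p in summarize(measured):
        print(f"nct={n:2d}  efficiency={eff:.3f}  implied parallel fraction={p:.3f}")
```

If a single fixed parallel fraction explained the data, the implied p would be roughly the same in every row. With my numbers it drops as nct goes up, so I suspect the limit is not just a fixed serial portion but something about how the work is split across threads.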
I'm wondering what each data thread stands for in HaplotypeCaller. PairHMM is used to calculate the likelihood array in each active region. Is each read-haplotype pair in a region treated as one data thread and mapped to a CPU thread? Or is the whole calculation for each region treated as one data thread?