Running HaplotypeCaller with -nct

I am trying to speed up HaplotypeCaller using the -nct flag. GATK correctly detects that my machine has 16 processors, and I told HaplotypeCaller to use 16 threads, i.e. -nct 16. However, processing the same file still takes roughly the same amount of time (approx. 3600 sec). I also tried -nct 8 and -nct 4. None of these settings seems to make the run finish any faster.

Are there any suggestions or other ways I could achieve an appreciable speedup? Thank you for any insight.

    Thank you, Geraldine. I will investigate all of the options mentioned above, especially the scatter-gather approach.
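For anyone else reading, here is a rough sketch of how I understand scatter-gather: instead of relying on -nct, run one HaplotypeCaller job per interval with -L and merge the per-interval VCFs afterwards. This assumes GATK 3 style command lines; the file names (ref.fasta, sample.bam, GenomeAnalysisTK.jar), the interval list, and the helper function are placeholders I made up. The script only builds and prints the commands, it does not run or merge anything.

```python
import shlex


def build_scatter_commands(reference, bam, intervals, jar="GenomeAnalysisTK.jar"):
    """Build one HaplotypeCaller command per interval (hypothetical helper).

    Each command is returned as an argument list; nothing is executed here.
    """
    commands = []
    for interval in intervals:
        # One output VCF per interval; these still need to be merged later.
        output_vcf = f"calls.{interval}.vcf"
        commands.append([
            "java", "-jar", jar,
            "-T", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", interval,      # restrict this job to a single interval
            "-o", output_vcf,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder inputs; substitute your own reference, BAM, and contig names.
    for cmd in build_scatter_commands("ref.fasta", "sample.bam", ["chr1", "chr2", "chr3"]):
        print(shlex.join(cmd))
```

Each printed command could then be run as a separate process or cluster job, with the resulting per-interval VCFs combined in interval order at the end.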

