
Number of cores utilized by HaplotypeCaller in GVCF mode


I am running HaplotypeCaller (v4.0.1.2, not the Spark version) on some WGS samples on an SGE (Sun Grid Engine) cluster. When I submit a job to my cluster, I request 1 core (on an 8-core processor with 1 thread per core). I am aware that with the native (non-Spark) HaplotypeCaller I cannot specify the number of cores it should use for parallelization, and can only use --native-pair-hmm-threads (default 4) to speed up that step.

Does HaplotypeCaller use cores according to availability? That is, if I assign 1 core to the job, will it still try to use other cores on that processor?

Kindly let me know if you need any more information for clarity.


  • Sheila (Broad Institute) Member, Broadie, Moderator


    If you only assign 1 core to the job, only 1 core will be used, regardless of how many cores are available.

    However, can you post your command? You may also find this dictionary entry on Spark useful.


  • prasundutta87 (Edinburgh) Member
    edited February 23


    We have a shared Oracle Grid Engine-based cluster computing facility at our university to which we submit jobs; an example submission script is this:


    # Grid Engine options
    #$ -N gvcf_maker_30x
    #$ -cwd
    #$ -M
    #$ -m bea
    #$ -t 1:37
    #$ -pe sharedmem 8
    #$ -l h_vmem=2G
    #$ -l h_rt=480:00:0

    # Initialise the modules framework
    . /etc/profile.d/
    module load java/jdk/1.8.0

    # Using GATK Version:
    animal=$(head -$SGE_TASK_ID 30x_animals_list.txt | tail -1)

    java -Xmx4g -jar gatk-package- HaplotypeCaller -R GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa --native-pair-hmm-threads 8 -I "$animal"_sorted_markduped_readgroup.bam -ERC GVCF -O "$animal".g.vcf

    The problems:

    1) When I set --native-pair-hmm-threads 8, HaplotypeCaller still uses only 2 cores instead of all 8 assigned cores. On our system each core has 1 thread (not 2), and my understanding was that if I tell HaplotypeCaller to use 8 threads for its pairHMM algorithm, it would use all 8 cores. It is not doing that.

    2) If I assign my qsub script 1 core but use --native-pair-hmm-threads 16, HaplotypeCaller spills onto other cores of the processor, jeopardising other jobs sharing it. I expected it to still limit itself to 1 core, which was not the case.

    Is there an explanation for these two cases? Please correct me if my understanding of how HaplotypeCaller parallelizes is wrong somewhere.

  • Sheila (Broad Institute) Member, Broadie, Moderator
    edited March 1


    Sorry for the delay. I am asking someone on the team for help and will get back to you soon.


    EDIT: This issue may also interest you.

  • prasundutta87 (Edinburgh) Member

    No problem @Sheila, and thanks for sharing the link.

  • LouisB (Broad Institute) Member, Broadie, Dev

    Hi @prasundutta87. It sounds like your cluster may not support pairHMM multithreading. The multithreading requires more specific software setup on the cluster than just running the pairHMM with single-threaded hardware acceleration. Do you see any output like the following?

    13:01:45.691 WARN  NativeLibraryLoader - Unable to find native library: native/libgkl_pairhmm_omp.dylib
    13:01:45.691 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported

    If so, pairHMM parallelization isn't happening and it is falling back to a single thread. You can force it to either run with parallelization or fail by adding "--pairHMM AVX_LOGLESS_CACHING_OMP". That's a good way to test whether the parallel pairHMM works on your system.
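    As a sketch of how that test could look on the command from earlier in the thread (the jar name stays truncated as in the original post, and in GATK 4.0.x the long-form spelling of the flag is `--pair-hmm-implementation`; check `--help` for your exact version):

    ```
    # Hypothetical test run: force the OpenMP pairHMM so the tool fails loudly
    # if OMP support is missing, instead of silently falling back to 1 thread.
    java -Xmx4g -jar gatk-package- HaplotypeCaller \
        -R GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa \
        -I "$animal"_sorted_markduped_readgroup.bam \
        -ERC GVCF \
        --native-pair-hmm-threads 8 \
        --pair-hmm-implementation AVX_LOGLESS_CACHING_OMP \
        -O "$animal".g.vcf
    ```

    If the OMP library cannot be loaded, this should error out at startup rather than quietly running single-threaded.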

    I'm not certain about the second core-utilization issue. Assuming that pairHMM parallelization is failing, I'm not sure exactly what it is. Java does multithreaded garbage collection, which can account for additional threads when nothing else is being parallelized, and it will automatically use multiple cores if they are available. You can restrict the number of GC threads with -XX:ParallelGCThreads=n, or switch to the single-threaded serial collector entirely with -XX:+UseSerialGC, although I don't recommend the latter. Ultimately, though, it is really the cluster manager's responsibility to restrict access to specific cores.
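    As a concrete sketch of capping the JVM's own threading (standard HotSpot flags; the jar name stays truncated as in the original command):

    ```
    # Cap JVM garbage-collection worker threads so the JVM itself stays on
    # one core. -XX:+UseSerialGC would switch to the single-threaded
    # collector instead, at some cost in GC pause behaviour.
    java -Xmx4g -XX:ParallelGCThreads=1 -jar gatk-package- HaplotypeCaller \
        -R GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa \
        -I "$animal"_sorted_markduped_readgroup.bam \
        -ERC GVCF -O "$animal".g.vcf
    ```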

  • prasundutta87 (Edinburgh) Member


    I have this output before the GVCF actually starts being written:

    WARNING: We recommend that you use a minimum of 4 GB of virtual memory when running Java 1.8.0_74 on Eddie. Please see the following for details:
    00:10:00.392 INFO NativeLibraryLoader - Loading from jar:file:/exports/eddie3_homes_local/s0928794/tools/gatk-package-!/com/intel/gkl/native/
    00:10:00.631 INFO HaplotypeCaller - ------------------------------------------------------------
    00:10:00.632 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.2
    00:10:00.632 INFO HaplotypeCaller - For support and documentation go to
    00:10:00.632 INFO HaplotypeCaller - Executing as on Linux v3.10.0-327.36.3.el7.x86_64 amd64
    00:10:00.632 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_74-b02
    00:10:00.633 INFO HaplotypeCaller - Start Date/Time: 20 February 2018 00:10:00 GMT
    00:10:00.633 INFO HaplotypeCaller - ------------------------------------------------------------
    00:10:00.633 INFO HaplotypeCaller - ------------------------------------------------------------
    00:10:00.634 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
    00:10:00.634 INFO HaplotypeCaller - Picard Version: 2.17.2
    00:10:00.634 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    00:10:00.634 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    00:10:00.634 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    00:10:00.634 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    00:10:00.634 INFO HaplotypeCaller - Deflater: IntelDeflater
    00:10:00.634 INFO HaplotypeCaller - Inflater: IntelInflater
    00:10:00.634 INFO HaplotypeCaller - GCS max retries/reopens: 20
    00:10:00.634 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from
    00:10:00.634 INFO HaplotypeCaller - Initializing engine
    00:10:14.256 INFO HaplotypeCaller - Done initializing engine
    00:10:18.515 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
    00:10:18.515 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    00:10:19.323 INFO NativeLibraryLoader - Loading from jar:file:/exports/eddie3_homes_local/s0928794/tools/gatk-package-!/com/intel/gkl/native/
    00:10:19.325 INFO NativeLibraryLoader - Loading from jar:file:/exports/eddie3_homes_local/s0928794/tools/gatk-package-!/com/intel/gkl/native/
    00:10:19.380 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    00:10:19.381 INFO IntelPairHmm - Available threads: 16
    00:10:19.381 INFO IntelPairHmm - Requested threads: 8
    00:10:19.381 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation

  • LouisB (Broad Institute) Member, Broadie, Dev

    Hmm, that definitely looks like the threading should be working. I wouldn't expect it to saturate 8 cores, because the pairHMM accounts for only a fraction of the total runtime, so there are diminishing returns as you add more and more threads; but I would expect to see more than 2 cores used.

    If you want to use your cluster efficiently, a good idea would be to test different thread counts and see whether they give you a useful speedup. I expect you would get much better cluster utilization from many separate processes with low parallelization than from a few with high parallelization.

    I would try running on your system with threads = 1, 2, 3, 4, 8 on the same BAM and seeing what the runtime is. Then you can choose how you shard things based on that result. I would expect a reasonable speedup from 1 -> 2 -> 4, but you'll probably see diminishing returns quite quickly.
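    That benchmark could be sketched as a shell loop over the existing command (the output file `hmm_timings.txt` and the per-thread GVCF names are illustrative, and the jar name stays truncated as in the original post):

    ```
    # Time HaplotypeCaller on one BAM at increasing pairHMM thread counts,
    # appending each wall-clock measurement to a timings file.
    for t in 1 2 3 4 8; do
        echo "threads=$t" >> hmm_timings.txt
        { time java -Xmx4g -jar gatk-package- HaplotypeCaller \
              -R GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa \
              -I "$animal"_sorted_markduped_readgroup.bam \
              -ERC GVCF --native-pair-hmm-threads "$t" \
              -O "$animal".t"$t".g.vcf ; } 2>> hmm_timings.txt
    done
    ```

    For a quicker comparison you could run this over a single chromosome with -L rather than the whole genome.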

  • prasundutta87 (Edinburgh) Member

    Thanks @LouisB..I will try this at my end.
