Speed up HaplotypeCaller on IBM POWER8 systems

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
edited September 2016 in Announcements

We all know that HaplotypeCaller and Mutect2 analyses can take a long time. IBM now provides a native implementation of the PairHMM algorithm that takes advantage of the hardware available in its POWER8 systems. The optimized native library is currently available on POWER8 for the following Linux distributions: Ubuntu 15.10, Ubuntu 16.04, Red Hat Enterprise Linux 7.1 and Red Hat Enterprise Linux 7.2.

To take advantage of the optimized library, you need to do the following:

  • Download the shared library corresponding to your Linux distribution from here
  • Set your Java library path to the directory that contains libVectorLoglessPairHMM.so using -Djava.library.path
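
Before launching a long run, it can help to confirm that the directory you pass to -Djava.library.path really contains the shared library. The following is a minimal sketch, not part of the IBM package: check_pairhmm_lib is a hypothetical helper name, and /path/to/PairHMM_P8_Ubuntu is the same placeholder path used in the examples in this post.

```shell
# Hypothetical helper (not shipped with the library): succeed only if
# libVectorLoglessPairHMM.so is present in the given directory.
check_pairhmm_lib() {
    lib="$1/libVectorLoglessPairHMM.so"
    if [ -f "$lib" ]; then
        echo "found: $lib"
    else
        echo "missing: $lib" >&2
        return 1
    fi
}

# Placeholder path; replace it with the directory you downloaded the library to.
check_pairhmm_lib /path/to/PairHMM_P8_Ubuntu \
    || echo "fix the path before passing it to -Djava.library.path" >&2
```

The check only tests that the file exists; it does not verify that the library matches your distribution.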

Here is an example for running HaplotypeCaller on a P8 system with Ubuntu:

export PHMM_N_THREADS=$Num
java -Xmx32g -Djava.library.path=/path/to/PairHMM_P8_Ubuntu -jar $GATK_PATH/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R $REFERENCE -I $INPUT_BAM --dbsnp $SNP_VCF \
-stand_emit_conf 10 -stand_call_conf 50 \
-o $OUTPUT_VCF

Here is an example for running Mutect2 on a P8 system with Ubuntu:

export PHMM_N_THREADS=$Num
java -Djava.library.path=/path/to/PairHMM_P8_Ubuntu -jar GenomeAnalysisTK.jar \
-T MuTect2 \
-R $REFERENCE -L $GENOME_INTERVALS_FILE \
-I:tumor $TUMOR_BAM -I:normal $NORMAL_BAM \
--cosmic $COSMIC_VCF --dbsnp $SNP_VCF \
-o $OUT_VCF

The latest version of the library uses the same floating-point precision as Java on POWER8, so it produces the same results as a run without the library. By combining multithreading with SIMD vectorization, it also accelerates HaplotypeCaller and Mutect2 more than the previous version did, especially in single-thread mode (no -nct option specified).

Simultaneous multithreading (SMT) is a processor technology that allows multiple instruction streams (threads) to run concurrently on the same physical processor, improving overall throughput. From the point of view of the operating system, each hardware thread is treated as an independent logical processor. POWER8 supports SMT8, SMT4, SMT2 and ST modes, in which each physical processor presents 8, 4, 2 and 1 logical processors, respectively. By default, this PairHMM library uses a number of threads equal to 37% of the available logical processors. The number of threads can be tuned by setting the environment variable PHMM_N_THREADS, as shown in the examples above.
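
To make the default concrete, here is a rough sketch of how a thread count could be derived from the number of logical processors. It is not taken from the library: phmm_threads is a hypothetical helper, and rounding down with a minimum of one thread is an assumption, since the post does not say how the library rounds. The 50% used at the end is an arbitrary illustration of overriding the default.

```shell
# Hypothetical helper: thread count as a percentage of the logical processors.
# 37 is the default percentage quoted in the post; the rounding (integer floor,
# at least 1 thread) is an assumption, not documented library behavior.
phmm_threads() (
    logical_cpus=$1
    percent=${2:-37}
    n=$(( logical_cpus * percent / 100 ))
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
)

# Count the logical processors the operating system sees on this machine.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Override the default by asking for 50% of them (arbitrary example value).
PHMM_N_THREADS=$(phmm_threads "$cpus" 50)
export PHMM_N_THREADS
echo "PHMM_N_THREADS=$PHMM_N_THREADS"
```

Under these assumptions, a system that exposes 160 logical processors (for example, 20 cores in SMT8 mode, a hypothetical configuration) would get 59 threads at the 37% default and 80 threads at 50%.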

The library can accelerate the GATK tools HaplotypeCaller, Mutect2 and UnifiedGenotyper. Depending on the test case, it can accelerate HaplotypeCaller up to 1.9x and Mutect2 up to 9.26x. For example, if the PairHMM computation consumes about half of the HaplotypeCaller runtime in single-thread mode, a 1.88x speed-up can be expected.
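
One way to read the 1.88x figure is through Amdahl's law, assuming that only the PairHMM portion of the run gets faster: the overall speed-up is 1 / ((1 - p) + p / s), where p is the fraction of runtime spent in PairHMM and s is the speed-up of PairHMM itself. The sketch below uses a hypothetical helper, overall_speedup, and a PairHMM speed-up of 16x, a value chosen here only because it reproduces the quoted figure; the authors do not state it.

```shell
# Hypothetical helper: Amdahl's law.
# usage: overall_speedup FRACTION_ACCELERATED KERNEL_SPEEDUP
overall_speedup() {
    awk -v p="$1" -v s="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / s) }'
}

# Half of the runtime in PairHMM, PairHMM itself assumed to run 16x faster.
overall_speedup 0.5 16    # prints 1.88
```

With p = 0.5 the overall speed-up can never exceed 2x, however fast PairHMM becomes, which is why the gain depends so strongly on the test case.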

The source code is available at https://github.com/t-ogasawara/gatk/tree/vectorPairHMMForPower8.
If you have any questions or issues (aside from downloading the file), please contact Yinhue Cheng at IBM (ycheng@us.ibm.com) or Takeshi Ogasawara at IBM Japan (TAKESHI@jp.ibm.com).

Disclaimer: Please note that these libraries are not an official IBM product. You use them entirely at your own risk, and neither IBM nor the authors assume any liability whatsoever, nor do they assume responsibility for maintenance. Please report comments and corrections to ycheng@us.ibm.com.

Comments

  • TechnicalVault (Cambridge, UK; Member)

    Hasn't --pair_hmm_implementation VECTOR_LOGLESS_CACHING been the default mode of operation since 3.2, or is it the default only on Intel systems?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    You're correct that VECTOR_LOGLESS_CACHING is the default for any system that will support it since 3.2. I think we're just being very explicit in the command line here for the sake of clarity.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    The IBM POWER8 native libraries for Ubuntu and RHEL have been updated (at the Google Drive link above). The authors provided the following information:

    These libraries provide better performance than the previous version. They use double precision, so the results will be the same as those of the Java-only HaplotypeCaller. They also have some tunable parameters that can be used to further improve performance based on the system's SMT status.

    Following is a description from Ogasawara-san:
    "This new version of the library accelerates HaplotypeCaller using the same floating precision as Java on POWER8. Also, exploiting Simultaneous Multithreading (SMT) along with the vector instructions, it can accelerate HaplotypeCaller more than the previous version, in particular, in the single-thread mode (no -nct option is specified). For example, the PairHMM computation, which the library accelerates, consumes about a half of the HaplotypeCaller runtime in the single-thread mode, 1.88x speed-up will be expected. The library uses 37% of the available processors by default. The number of threads can be tuned by setting the environment variable PHMM_N_THREADS. The source code is available at https://github.com/t-ogasawara/gatk/tree/vectorPairHMMForPower8."

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Updated the text to provide an example of using this library to accelerate MuTect2 runs.

  • Note that if you are using the IBM Java Virtual Machine v8 for GATK 3.7, there is a bug in the default Just In Time compiler that causes a NullPointerException at org.broadinstitute.gatk.utils.MannWhitneyU.permutationTest(MannWhitneyU.java:562). This can be avoided by explicitly excluding that class from the JIT optimization:

    java -Xjit:exclude={org/broadinstitute/gatk/utils/MannWhitneyU.*} -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I in.bam -o out.vcf