
Are there runtime measurements of HaplotypeCaller with the PairHMM FPGA accelerator?


I would like to know whether there are runtime measurements of HaplotypeCaller with the "--pairHMM EXPERIMENTAL_FPGA_LOGLESS_CACHING" option compared against other options such as "--pairHMM FASTEST_AVAILABLE"
(which will not even try to use the FPGA, as can be seen in the source code of PairHMM.java, line 81 onward).

Or has someone experimented with this and made the results available somewhere?

The only results available (or at least the only ones I could find) seem to come from synthetic benchmarks, not from benchmarks run from HaplotypeCaller itself in a realistic setting.

Has anyone run the FPGA version against the fastest available CPU version (e.g., AVX)?
I do not have access to the cards supported by the current GKL FPGA implementation; otherwise I would have taken these measurements myself.

While the performance of the FPGA accelerators looks really good on paper, I am interested in real test cases.
Has anyone run the accelerator on a job similar to the one in the "GATK Tutorial :: Germline SNPs & Indels" or (even better) on bigger workloads or datasets?

Thank you very much.


  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @rwk

    I will need to reach out to our dev team to find an answer to this. It might take a couple of days to get back to you on this.

  • rwk (Member)
    @bhanuGandham thank you very much for inquiring about this.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited June 2019

    Hi @rwk

    We do not recommend using this option, as we have not tested it and do not support it.

    PS: Check out Terra for end-to-end GATK pipelining solutions, and let us know which additional pipelines would make GATK easier for you to use! For more details on whether this is the right fit for you, check out our blog page.

  • rwk (Member)
    That's too bad. Didn't anyone at least try it?

    From the Intel paper: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-accelerating-genomics-opencl-fpgas.pdf

    They say that:
    1) The FPGA version achieves much higher performance than the CPU version.
    2) They integrated it into the GATK pipeline through the GKL.

    However, I cannot find any "in-pipeline" benchmarks of this solution.

    Judging from how the Intel FPGA accelerator is called through the GKL (which is open source), it seems it cannot achieve the "peak performance" listed in the Intel paper. So I was wondering how an FPGA accelerator would perform in a real test case.

    I ran some benchmarks by rewriting the GKL to simulate only the data transfer to and from an FPGA, without any computation: each batch is sent to the FPGA by DMA and the results are sent back by DMA (hap*read*32 bits of data). This was done on an Amazon F1 FPGA instance, with DMA transfers to the FPGA card's on-board DDR4 and back.

    The time required just to transfer the data to an accelerator card and get the results back (at this granularity, i.e., per "batch") is very close to the time needed for a full AVX + OpenMP computation.
    (At least with simple test cases such as those in the GATK Best Practices HaplotypeCaller workshop.)

    This means that with the current way the jobs are issued (i.e., per "batch") it would be very difficult to get good results with any FPGA accelerator for the Pair-HMM algorithm (against AVX + OpenMP).
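
The transfer argument above can be put into a rough back-of-the-envelope model. The sketch below is only an illustration: the function names, the fixed per-transfer latency, and the bandwidth figure are hypothetical and are not measurements from the F1 experiment. Only the result size (haplotypes x reads x 32 bits) comes from the description above.

```python
def result_bytes(num_haplotypes: int, num_reads: int, bytes_per_result: int = 4) -> int:
    """Size of the result matrix for one batch: one 32-bit value per (haplotype, read) pair."""
    return num_haplotypes * num_reads * bytes_per_result


def transfer_time_s(payload_bytes: int, bandwidth_bytes_per_s: float, fixed_latency_s: float) -> float:
    """Simple cost model for one DMA transfer: fixed setup latency plus payload / bandwidth."""
    return fixed_latency_s + payload_bytes / bandwidth_bytes_per_s


def offload_is_worthwhile(cpu_time_s: float, fpga_compute_s: float,
                          to_device_s: float, from_device_s: float) -> bool:
    """Offloading a batch only pays off if transfers plus FPGA compute beat the CPU time."""
    return to_device_s + fpga_compute_s + from_device_s < cpu_time_s


# Hypothetical batch: 20 haplotypes x 200 reads gives 16000 bytes of results.
payload = result_bytes(20, 200)

# Assumed numbers for illustration only: 1 GB/s effective bandwidth, 50 us fixed latency.
back = transfer_time_s(payload, bandwidth_bytes_per_s=1e9, fixed_latency_s=50e-6)

# Even with zero FPGA compute time, the transfers alone can exceed a small CPU time.
print(offload_is_worthwhile(cpu_time_s=100e-6, fpga_compute_s=0.0,
                            to_device_s=60e-6, from_device_s=back))
```

In this model the fixed latency dominates for small batches, which is why the per-"batch" granularity matters more than the raw compute speed of the accelerator.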

    I just wanted to know whether anyone else has observed this kind of result, e.g., by running the Intel FPGA accelerator, or whether I am mistaken somewhere.

    I know you do not support or recommend using the FPGA accelerators, but since GATK includes support for launching them, I was wondering whether anyone has run some tests and what kind of performance the accelerators could achieve.

    Thank you anyway.
  • rwk (Member)
    I just found this information in the Intel paper:
    "Upon integration with the GATK Best Practices pipeline, the overall pipeline speed-up was 1.2x compared to the Intel AVX technology implementation."

    So the actual speed-up is 1.2x compared to the AVX implementation, which is almost 1-to-1.

    This is only a single benchmark. Still, it suggests that the FPGA and AVX versions perform almost the same because of the current granularity (the accelerator is called per "batch"), which seems to be confirmed by the 1.2x factor they obtained.
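
One way to read the 1.2x figure is through Amdahl's law. The sketch below is an assumption-laden illustration: the paper quote gives only the overall speed-up, so the fraction of pipeline runtime spent in PairHMM (0.5 in the example) is hypothetical, and the function names are made up for this post.

```python
def overall_speedup(pairhmm_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speed-up when only the PairHMM fraction is accelerated."""
    return 1.0 / ((1.0 - pairhmm_fraction) + pairhmm_fraction / kernel_speedup)


def implied_kernel_speedup(pairhmm_fraction: float, observed_overall: float) -> float:
    """Invert Amdahl's law: effective PairHMM speed-up implied by an overall speed-up."""
    remaining = 1.0 / observed_overall - (1.0 - pairhmm_fraction)
    if remaining <= 0:
        raise ValueError("observed speed-up is not reachable with this PairHMM fraction")
    return pairhmm_fraction / remaining


# Hypothetical: if PairHMM were 50% of the AVX pipeline runtime, an overall 1.2x
# would correspond to an effective PairHMM speed-up (transfers included) of about 1.5x.
print(implied_kernel_speedup(0.5, 1.2))
```

Whatever the real fraction is, an overall 1.2x requires PairHMM to account for at least 1 - 1/1.2, roughly 17%, of the baseline runtime.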

    Thank you for your time.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)


    Thank you for posting this information, it will help the community.
