Are there runtime measurements of the HaplotypeCaller with the PairHMM FPGA accelerator?

rwk Member
Hello.

I would like to know if there are runtime measurements of the HaplotypeCaller with the "--pairHMM EXPERIMENTAL_FPGA_LOGLESS_CACHING" option compared against other options such as "--pairHMM FASTEST_AVAILABLE"
(which will not even try to use the FPGA, as can be seen in the source code of PairHMM.java, line 81+)?

Or has someone experimented with this and made the results available somewhere?

The only results available (or at least that I could find) seem to come from synthetic benchmarks, not from benchmarks run from the HaplotypeCaller itself in a real-life situation.

Did anyone run the FPGA version against the fastest available (e.g., AVX) version?
I do not have access to the cards supported by the current GKL FPGA implementation; otherwise I would have done these measurements myself.

While the performance of the FPGA accelerators looks really nice on paper, I am interested in real test cases.
Has anyone run the accelerator on a job similar to the one in the "GATK Tutorial :: Germline SNPs & Indels", or (even better) on bigger workloads/datasets?

Thank you very much.
Regards.
Rick

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @rwk

    I will need to reach out to our dev team to find an answer to this. It might take a couple of days to get back to you on this.

  • rwk Member
    @bhanuGandham thank you very much for looking into this.
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited June 24

    Hi @rwk

    We do not recommend using this option, as we have not tested it and do not support it.

    PS: Check out Terra for end-to-end GATK pipelining solutions, and let us know what more pipelines we can add to make using GATK easier for you! For more details on whether it is the right fit for you, check out our blog page.

  • rwk Member
    That's too bad. Did no one at least try it?

    From the Intel paper: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-accelerating-genomics-opencl-fpgas.pdf

    They state:
    1) They achieve much higher performance with the FPGA version than with the CPU version.
    2) They integrated it into the GATK pipeline through the GKL.

    However, I cannot find any "in-pipeline" benchmarks of this solution.

    From the way the Intel FPGA accelerator is called through the GKL (which is open source), it seems it cannot achieve the "peak performance" listed in the Intel paper. So I was wondering how an FPGA accelerator would perform in a real test case.

    I did some benchmarks by rewriting the GKL to only simulate the data transfer to and from an FPGA (DMA-transferring batches to the card's on-board DDR4 and DMA-transferring the results back, i.e., hap * read * 32 bits of data), without any computation, using an Amazon F1 FPGA instance.

    The time required just to transfer the data to an accelerator card at this granularity (i.e., per "batch") and to get the results back is very close to the time needed for the full AVX + OpenMP computation (at least for simple test cases such as those in the GATK Best Practices HaplotypeCaller workshop).

    This means that, with the current way jobs are issued (i.e., per "batch"), it would be very difficult for any FPGA accelerator for the Pair-HMM algorithm to beat AVX + OpenMP.
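    The transfer-bound argument above can be sketched with a toy cost model. All bandwidth and latency constants below are illustrative assumptions, not measurements from the GKL or an F1 instance:

    ```python
    # Toy model of per-batch FPGA offload cost for Pair-HMM (illustrative only).
    # The bandwidth and latency constants are assumptions, not measurements.

    def round_trip_seconds(n_haplotypes, n_reads,
                           bytes_per_cell=4,          # one 32-bit likelihood per hap/read pair
                           pcie_bandwidth=8e9,        # assumed ~8 GB/s effective PCIe bandwidth
                           per_batch_latency=40e-6):  # assumed DMA setup + driver overhead per batch
        """Time to ship one batch's results back, ignoring the compute itself."""
        result_bytes = n_haplotypes * n_reads * bytes_per_cell
        return per_batch_latency + result_bytes / pcie_bandwidth

    # For a small batch of the size HaplotypeCaller might issue, the fixed
    # per-batch latency dominates the wire time under these assumptions:
    small = round_trip_seconds(n_haplotypes=10, n_reads=100)
    latency_fraction = 40e-6 / small
    ```

    Under these assumed numbers the fixed per-batch overhead accounts for nearly all of the round-trip time, which is why per-batch offload struggles regardless of how fast the FPGA kernel itself is.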

    I just wanted to know if anyone else has observed this kind of result, e.g., by running the Intel FPGA accelerator, or whether I am mistaken somewhere.

    I know you do not support and do not recommend using the FPGA accelerators, but since there is support for launching them in GATK, I was wondering if anyone has run some tests and what kind of performance the accelerators can achieve.

    Thank you anyway.
    Regards,
    Rick
  • rwk Member
    I just found the information in the Intel paper:
    "Upon integration with the GATK Best Practices pipeline, the overall pipeline speed-up was 1.2x compared to the Intel AVX technology implementation."

    The actual speed-up is 1.2x compared to the AVX implementation, i.e., almost 1-to-1.

    This is only a single benchmark, but it suggests the FPGA and AVX versions perform almost the same due to the current granularity (the accelerator is called per "batch"), which the 1.2x factor they report seems to confirm.
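    A 1.2x pipeline figure is also consistent with Amdahl's law whenever Pair-HMM is only a fraction of total pipeline runtime. A quick sanity check (the 30% fraction and 2.25x kernel speedup below are assumed, illustrative values; the Intel paper does not state them):

    ```python
    # Amdahl's law: overall speedup when only a fraction f of the runtime
    # is accelerated by a factor s.
    def overall_speedup(f, s):
        return 1.0 / ((1.0 - f) + f / s)

    # If Pair-HMM were (say) 30% of the pipeline (assumed value), even an
    # infinitely fast accelerator could not exceed ~1.43x overall:
    ceiling = overall_speedup(0.3, float("inf"))   # 1 / 0.7, about 1.43

    # A modest 2.25x kernel speedup at f = 0.3 already yields the reported 1.2x:
    pipeline = overall_speedup(0.3, 2.25)          # exactly 1.2
    ```

    So the 1.2x result alone cannot distinguish a transfer-bound accelerator from a fast kernel working on a small fraction of the pipeline; in-pipeline kernel timings would be needed to tell them apart.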

    Thank you for your time.
    Regards,
    Rick
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    @rwk

    Thank you for posting this information; it will help the community.
