Conference Talk (CPPCON) 2014 Sep 8: The Gamgee library for genomics data processing and analysis

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited October 2014 in Archive

Mauricio Carneiro presented this talk at CPPCON (C++ conference) in Bellevue, WA on September 8, 2014. His slide deck and a link to the video are available at this link if you're viewing this post in the forum, or below if you are viewing the presentation page already.


Our group has defined the standards for DNA and RNA sequencing data processing and analysis for disease research and clinical applications. Over the last 5 years we have published our tools in the GATK (Genome Analysis Toolkit), which is written entirely in Java. With the scaling of next-generation sequencing and the immense amount of data that needs to be processed, we hit a performance wall and found ourselves limited by the language in our ability to make optimizations and rewrite the algorithms in a way that would conform better to modern hardware.

Enter Gamgee, a free and open source C++14 library that offers much of the functionality of the GATK framework with the performance necessary to scale to the hundreds of petabytes of today's complex disease projects. We will show how the tools developed using the Gamgee library replaced legacy Java GATK tools in the production pipeline of the Broad Institute. We will also talk about how the algorithms have changed to take advantage of native libraries and modern hardware features such as SSE/AVX and GPUs.
