Why was Java chosen as the language for picard-tools (way back at the beginning)?

I'm interested in hearing some of the software development history of GATK. I find that when I know some of the history of a tool's development, I can understand it better and work more efficiently with it. I would like to hear what factors were at play during the early days.



  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @deena_b

    Take a look at the blogs we publish about GATK tools and the story behind them here: https://software.broadinstitute.org/gatk/blog

  • davidben (Boston; Member, Broadie, Dev)

    @deena_b One big reason is that the GATK was initially much more of a software development project and less of a research project. We (I'm talking about ~2010, which is a few years before I joined) hired people from programming backgrounds who then learned a smattering of statistics and population genetics as necessary. Java was a natural choice for this group. Over time we have reversed this so that most of our hires are scientists who, while already experienced with programming, usually need to be brought up to speed with good software engineering practices. Python might be a more natural choice if we were starting over, although Java does have very significant advantages for deployment since the dependencies are bundled into the jar. Some GATK old-timers told me about experiments with Python in the early days that led to huge headaches with package management.

    So now we continue to write the GATK in Java, although most machine learning code is in Python under the hood with a Java wrapper. Our non-GATK tools are all Python.

    About six years ago there was a big effort to move to C++, which didn't work mainly because we had one C++ fanatic, and a few people who thought it was okay.
