GATK by the numbers
For my 10,000th posting on this forum, I thought I'd pull together a few numbers.
First, yes, I did just say this is my 10,000th post. That breaks down to 464 new discussions (documentation articles and blog posts, including this one) and 9,536 comments in various threads. I've also posted over 1,000 tweets as @gatk_dev on behalf of the development team. When people ask what I do, I can say with a straight face that I'm a scientist and I tweet for a living. It's awesome.
But hey, here are some more important numbers.
- Forum and website (since 2012): 35,000 registered users; 3,000 active participants; 6,000 discussions; 20,000 comments; 50,000 page views weekly; 8,000,000 page views total.
- Codebase: 23 versions; 59 contributors; 14,000 commits; 500,000 lines.
- Usage: 5,000,000 CPU days; 800,000,000 jobs; 30,000 distinct users.
Forum and website (since 2012)
The forum community includes just shy of 35,000 registered users. Among these, an active subset of about 3,000 have posted over 6,000 discussions and over 20,000 comments. That's not counting mine, and soon I'll have to start subtracting @Sheila's too, since she has taken over my day-to-day forum duties and is racking up quite a post count herself.
Between the forum and the documentation, we typically get about 50,000 page views per week (showing a neat Monday-Friday hill pattern -- good on you for having a life on the weekends, people!) totaling over 8,000,000 page views since the launch of the website in 2012.
Codebase
Now let's talk about development activity. Looking at just the "classic" GATK codebase (not GATK4), there have been 23 released versions of GATK (1.x through 3.x, not counting point releases such as 3.4-46). We've had 59 contributors (mostly internal but some external) who made over 14,000 code commits. The number of lines of code is a reaaaally controversial metric, but if you must know, we estimate that the GATK3 codebase has about 500,000 lines of code, excluding license text lines but including code comments, which are important and should totally count.
Usage
The GATK has been run for at least 5,000,000 days' worth of runtime (that's over 13,000 years -- and it's not just HaplotypeCaller being a bit slow) across 800,000,000 separate jobs by 30,000 distinct users, as reported by the GATK's Phone Home system. I believe that's not counting the Broad's own usage on 250,000 genomes and exomes -- and it's certainly not counting anyone running GATK offline, behind a firewall or with a NO_ET key.
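For anyone who wants to check the back-of-the-envelope math, here is a rough sketch in Python. It uses only the rounded figures quoted above; the variable names and the 365.25-day year are my own assumptions, and the per-job and per-user averages are derived illustrations rather than numbers reported by Phone Home.

```python
# Rounded usage figures quoted in the text above.
CPU_DAYS = 5_000_000
JOBS = 800_000_000
USERS = 30_000

# Assumption: an average year of 365.25 days.
DAYS_PER_YEAR = 365.25

# Total runtime expressed in years.
years = CPU_DAYS / DAYS_PER_YEAR

# Average runtime per job, in minutes (days -> hours -> minutes).
minutes_per_job = CPU_DAYS * 24 * 60 / JOBS

# Average number of jobs per distinct user.
jobs_per_user = JOBS / USERS

print(f"{years:,.0f} years of runtime")
print(f"{minutes_per_job:.0f} minutes per job on average")
print(f"{jobs_per_user:,.0f} jobs per user on average")
```

With these inputs the total comes out a bit under 13,700 years, which squares with the "over 13,000 years" claim, and the average job works out to roughly nine minutes. Since the inputs are rounded lower bounds, treat the averages as ballpark figures only.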
Now, as I mentioned in the GATK 3.6 version highlights, we removed Phone Home in the latest version, so we'll no longer be getting usage information from it or any future versions -- but I am holding out hope that enough people will be running older versions for a little while to get us to clocking a billion jobs. Because a billion, well, that's a cool number right there. But don't take this as an encouragement not to update to 3.6 as soon as possible!