Interactive map of the global GATK user community
This morning, we unveiled an interactive Google Map, based on anonymized IP addresses collected from the forum database, that shows how the GATK user community is distributed across the globe. Check out Boston/Cambridge!
For the record, this was originally inspired by the World Map of High-throughput Sequencers by James Hadfield (Cancer Research UK, Cambridge) and Nick Loman (University of Birmingham).
As several people have already expressed interest in how this map was put together, I thought I'd give a brief overview of the technical side below the fold. I'm happy to provide more details and/or code if anyone wants to do something similar.
Making the map
Problems and perspectives
The map represents data from ~25,000 registered users out of the ~36,000 total. This means we're missing a sizeable chunk of the community (roughly 11,000 users), because many IPs were either not in the free version of the geolocation database that I used, or had a location that was not associated with a name. Funnily enough, due to a bug in the first version of my script, the unnamed records were all getting assigned to a single pair of coordinates, so the map was showing over 8,000 GATK users way out in a remote part of Australia. Had me wondering whether the Garvan Institute was hiding a massive secret facility out there!
In a future iteration I think I can salvage the unnamed records by consolidating based on coordinates instead of City+Country name, which in hindsight was not a great design decision. Right now I'm also not correctly handling cases where the same city name exists in several states within the United States. As a born-and-bred European, I assumed that the City+Country name pair was unique, but now that I think about it, that doesn't hold true in the USA, does it... I don't think this affects a large number of records, but hey, we care about accuracy, so further refinements will be forthcoming!
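For anyone curious what the difference between the two consolidation strategies looks like in practice, here is a minimal Python sketch. It is not the script behind the map: the record layout, the sample rows, and the function names `by_city_country` and `by_coordinates` are all made up for illustration, and rounding coordinates to one decimal place (roughly 11 km in latitude) is just one plausible way to bin nearby users together.

```python
from collections import Counter

# Hypothetical geolocated records: (city, region, country, latitude, longitude).
# An empty city means the lookup returned coordinates but no place name.
records = [
    ("Cambridge", "MA", "US", 42.3736, -71.1097),
    ("Cambridge", "MA", "US", 42.3751, -71.1056),
    ("Springfield", "IL", "US", 39.7817, -89.6501),
    ("Springfield", "MA", "US", 42.1015, -72.5898),
    ("", "", "AU", -25.2744, 133.7751),
    ("", "", "US", 44.9778, -93.2650),
]


def by_city_country(rows):
    """Group on the City+Country name pair, skipping unnamed records."""
    counts = Counter()
    for city, _region, country, _lat, _lon in rows:
        if city:  # unnamed records cannot be keyed by name, so they are lost
            counts[(city, country)] += 1
    return counts


def by_coordinates(rows, precision=1):
    """Group on coordinates rounded to `precision` decimal places."""
    counts = Counter()
    for _city, _region, _country, lat, lon in rows:
        counts[(round(lat, precision), round(lon, precision))] += 1
    return counts


if __name__ == "__main__":
    # Name-based grouping merges the two Springfields and drops unnamed rows.
    print(by_city_country(records))
    # Coordinate-based grouping keeps every row and separates the Springfields.
    print(by_coordinates(records))
```

With these sample rows, the name-based grouping lumps Springfield, IL and Springfield, MA into one marker and silently loses the two unnamed records, while the coordinate-based grouping keeps all six records and places the two Springfields separately. The trade-off is that a coordinate bin has no natural label, so a display name would have to be picked some other way.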