
Goodbye note to the GATK community

shlee, Cambridge. Member, Broadie ✭✭✭✭✭
Edited April 2019 in Announcements


The other day I decided to disassemble the bathroom doorknob. The effort involved chipping away layers of paint and recruiting some muscle to remove the screws the chipping had revealed. When I levered out the latch system and took it apart, I noticed two things. First, parts of it had a beautiful copper color. Second, the internal spring was broken into three pieces. The broken spring explained the sticky latch you had to jiggle to keep the door closed, and why the door had started to pop open randomly as if possessed.

I made the fix with a new spring.

It may be a compulsion, this wanting to fix broken things. I think it stems from the same curiosity that makes you want to take things apart to understand how they work [1]. When I joined the GATK team some 3.5 years ago as a technical writer, this compulsion surfaced and drove the effort that resulted in the pieces I wrote. At the end of this post is a sampling of my most viewed articles.

I would like to thank you for allowing me to serve you these past years. I have learned much in the process. The knowledge I have gained in genomics comes not only from these writing projects but just as much from answering your questions on the forum. By holding myself to answering at least one forum question a day, I am proud to have earned over 250 likes and points aplenty toward the forum's five-star ranking.

My last day with the DSP Communications Team is today, April 1 [2]. Rest assured, my teammates and our wonderful methods developers will continue to take excellent care of you.

Looking back, 2018 was a busy year. Geraldine asked me to help out at the July workshop in Cambridge, UK, and also at the December workshop in Taiwan [3]. Each workshop brings with it a torrent of activity as we create and update materials. It is always insightful and rewarding to interact firsthand with researchers, to hear about their sticking points and to see their reactions to the tutorials I develop and write.

Since returning from the December workshop, I have been submarined, pouring my effort into finalizing the gCNV tutorial in time for my departure. I hope you find it useful. This tutorial has been the most challenging to develop so far, in that exploring the results called for more creative solutions than usual, as you will see in the tutorial's companion Jupyter Notebook reports here and here [4].

This month, before I start searching for a new job, I will spend some time visiting friends and family and remembering my Ph.D. advisor at his memorial. If you would like to lend your support, I would love to have your endorsement on LinkedIn [5]. If you need to get in touch with me, please ping me on GitHub in the broadinstitute/gatk repository. My handle is @sooheelee, and I will be checking in intermittently.

It has been a privilege.

Yours truly,

Soo Hee


[1] This curiosity should not be surprising in someone who once lived the life of a Ph.D. biochemist. And it should be expected from someone whose folks include a plant pathologist (Dad studied in North Dakota) and a WWII pilot turned aeronautical engineer (Mr. Cummings served in the Army Air Corps; he turns 95 this May, and I will be visiting him for his birthday). Each of my families tells me I'm molded from the same clay as my fathers.

[2] No, this is not an April Fools' joke.

[3] There were two Taiwan workshops in 2018. The video footage of the December 2018 Taiwan workshop is not posted anywhere else, so here is the link: https://drive.google.com/drive/folders/1-uMoz-ui5IteriKngee7Vic9AWAcnfcL.

[4] I have become a fan not only of pandas the software but also of the animal.

[5] Connect with me, and, if you feel like it, please endorse my skills in Genomics.

A sampling of my most popular articles, grouped by year and sorted by number of views:

| Year | Views | Article # | Title |
|------|-------|-----------|-------|
| 2015 | 26.1K | 6484 | (How to) Generate an unmapped BAM from FASTQ or aligned BAM |
| | 17.2K | 6483 | (How to) Map and clean up short read sequence data efficiently |
| 2016 | 17.9K | 6747 | (How to) Mark duplicates with MarkDuplicates or MarkDuplicatesWithMateCigar |
| | 8.2K | 7857 | Reference Genome Components |
| | 7.8K | 8017 | (How to) Map reads to a reference with alternate contigs like GRCh38 |
| | 4.9K | 7847 | Changing workflows around calling SNPs and indels |
| | 4.4K | 7156 | (howto) Perform local realignment around indels |
| | 3.5K | 7899 | Reference implementation: PairedEndSingleSampleWf pipeline |
| | 2.2K | 6926 | Spanning or overlapping deletions (* allele) |
| | 2.0K | 8180 | 9 Takeaways to help you get started with GRCh38 |
| | 1.9K | 7859 | (How to) Simulate reads using a reference genome ALT contig |
| | 1.1K | 7019 | Sam flags down a boat |
| 2017 | 22.7K | 9143* | (How to) Call somatic copy number variants using GATK4 CNV |
| | 2.9K | 9183* | (How to) Call somatic SNVs and indels using MuTect2 |
| | 2.2K | 10172 | (How to) Run the GATK4 Docker locally and take a look inside |
| | 1.7K | 10911 | Differences between GATK3 MuTect2 and GATK4 Mutect2 |
| | 1.1K | 10060 | (How to) Run FlagStatSpark on a cloud Spark cluster |
| 2018 | 18.5K | 11136 | (How to) Call somatic mutations using GATK4 Mutect2 |
| | 3.0K | 11682 | (How to part I) Sensitively detect copy ratio alterations and allelic segments |
| | 2.6K | 11127 | Somatic calling is NOT simply a difference between two callsets |
| | 2.0K | 11683 | (How to part II) Sensitively detect copy ratio alterations and allelic segments |
| | 938 | 12350 | (How to) Filter on genotype using VariantFiltration |
| | 740 | 11315 | Off-label workflow to simply call differences in two samples |
| | ~ | 23216 | (How to) Filter variants either with VQSR or by hard-filtering |
| 2019 | ~ | 11684 | (How to) Call common and rare germline copy number variants |
| | ~ | 11685 | (Notebook) Concordance of NA19017 chr20 gCNV calls |
| | ~ | 11686 | (Notebook) Correlate gCNV callset metrics and annotations |
| | ~ | 11687 | After gCNV calling considerations |

*Uses older versions of tools that have been replaced.
~Published in the last three months.
