Goodbye note to the GATK community
The other day I decided to disassemble the bathroom doorknob. Efforts included chipping away layers of paint and recruiting some muscle to remove the screws the chipping had revealed. When I levered out the latch system and took it apart, I noticed two things. First, parts of it had a beautiful copper color. Second, the internal spring was broken into three parts. The latter explained the sticky latch you had to jiggle to get the door to stay closed, and a door that had started to pop open randomly as if possessed.
I made the fix with a new spring.
It may be a compulsion to want to fix broken things. I think it stems from the same curiosity that makes you want to take things apart to understand how they work. When I joined the GATK team some 3.5 years ago as a technical writer, this compulsion surfaced and drove me to the effort that resulted in the pieces I wrote. Below, at the end of this post, is a sampling of my most viewed articles.
I would like to thank you for allowing me to serve you these past years. I have learned much in the process. The knowledge I have gained in genomics comes not only from these writing projects but also just as much from answering your questions on the forum. By holding myself to answering just one forum question a day, I am proud to have earned over 250 likes and points aplenty for the forum’s five-star ranking.
Looking back, 2018 was a busy year. Geraldine asked that I help out at the July Cambridge UK workshop and also at the December Taiwan workshop. Each workshop brings with it a torrent of activity creating and updating materials. It is always insightful and rewarding to interact firsthand with researchers, to hear about sticking points and to see reactions to the tutorials I develop and write.
Since returning from the December workshop, I have been submarined, pouring effort into finalizing the gCNV tutorial in time for my departure. I hope you find it useful. This tutorial has been the most challenging to develop so far in that exploring the results involved more creative solutions than usual, as you will see in the tutorial’s companion Jupyter Notebook reports here and here.
Before I start searching for a new job, this month I will spend some time visiting friends and family and remembering my Ph.D. advisor at his memorial. If you would like to lend your support, I would love to have your endorsement on LinkedIn. If you need to get in touch with me, please ping me on GitHub, in the broadinstitute/gatk repository. My handle is @sooheelee and I will be checking in intermittently.
It has been a privilege.
 This curiosity should not be surprising in someone who once walked the life of a Ph.D. biochemist. And it should be expected from someone whose folks include a plant pathologist (Dad studied in North Dakota) and a WWII pilot turned aeronautical engineer (Mr. Cummings served in the Army Air Corps; he is turning 95 this May and I will be seeing him for his birthday). Each of my families tells me I’m molded from the same clay as my fathers.
 No, this is not an April Fools' joke.
 There were two Taiwan workshops in 2018. The video footage of the December 2018 Taiwan workshop is not posted anywhere else, and so here is the link: https://drive.google.com/drive/folders/1-uMoz-ui5IteriKngee7Vic9AWAcnfcL.
 I have become a fan of pandas, both the software and the animal.
 Connect with me, and, if you feel like it, please endorse my skills in Genomics.
A sampling of my most popular articles grouped by year and sorted by number of views
| Year | Views | Article# and link | Title |
|---|---|---|---|
| 2015 | 26.1K | 6484 | (How to) Generate an unmapped BAM from FASTQ or aligned BAM |
| . | 17.2K | 6483 | (How to) Map and clean up short read sequence data efficiently |
| 2016 | 17.9K | 6747 | (How to) Mark duplicates with MarkDuplicates or MarkDuplicatesWithMateCigar |
| . | 8.2K | 7857 | Reference Genome Components |
| . | 7.8K | 8017 | (How to) Map reads to a reference with alternate contigs like GRCh38 |
| . | 4.9K | 7847 | Changing workflows around calling SNPs and indels |
| . | 4.4K | 7156 | (howto) Perform local realignment around indels |
| . | 3.5K | 7899 | Reference implementation: PairedEndSingleSampleWf pipeline |
| . | 2.2K | 6926 | Spanning or overlapping deletions (* allele) |
| . | 2.0K | 8180 | 9 Takeaways to help you get started with GRCh38 |
| . | 1.9K | 7859 | (How to) Simulate reads using a reference genome ALT contig |
| . | 1.1K | 7019 | Sam flags down a boat |
| 2017 | 22.7K | 9143* | (How to) Call somatic copy number variants using GATK4 CNV |
| . | 2.9K | 9183* | (How to) Call somatic SNVs and indels using MuTect2 |
| . | 2.2K | 10172 | (How to) Run the GATK4 Docker locally and take a look inside |
| . | 1.7K | 10911 | Differences between GATK3 MuTect2 and GATK4 Mutect2 |
| . | 1.1K | 10060 | (How to) Run FlagStatSpark on a cloud Spark cluster |
| 2018 | 18.5K | 11136 | (How to) Call somatic mutations using GATK4 Mutect2 |
| . | 3.0K | 11682 | (How to part I) Sensitively detect copy ratio alterations and allelic segments |
| . | 2.6K | 11127 | Somatic calling is NOT simply a difference between two callsets |
| . | 2.0K | 11683 | (How to part II) Sensitively detect copy ratio alterations and allelic segments |
| . | 938 | 12350 | (How to) Filter on genotype using VariantFiltration |
| . | 740 | 11315 | Off-label workflow to simply call differences in two samples |
| . | ~ | 23216 | (How to) Filter variants either with VQSR or by hard-filtering |
| 2019 | ~ | 11684 | (How to) Call common and rare germline copy number variants |
| . | ~ | 11685 | (Notebook) Concordance of NA19017 chr20 gCNV calls |
| . | ~ | 11686 | (Notebook) Correlate gCNV callset metrics and annotations |
| . | ~ | 11687 | After gCNV calling considerations |
*Uses older versions of tools that have been replaced.
~Published in the last three months.