From Python Magic to embedded IGV: A closer look at GATK tutorial notebooks

Geraldine_VdAuwera, Cambridge, MA

Earlier this week, I made a big deal about how we plan to develop all of our GATK tutorials as Jupyter Notebooks in Terra going forward. Today I'd like to offer you a concrete look at what we like about using notebooks for GATK tutorials.

I was planning to just walk you through a couple of notebooks in one of our workshop workspaces, but then decided to make a custom workspace and notebook to show you what I mean without the complexity of the full-length tutorials. It's part highlights, featuring a couple of my favorite tutorial scenarios from the workshops that are fairly simple yet quite effective, and part sneak preview of the newest version of the tutorials, which boast cool new features and will be unveiled at the next workshop (Cambridge in July). Oh, and part explainer on what exactly are Jupyter Notebooks anyway?

Overall you can consider this mini-tutorial a stepping stone to being able to use the workshop tutorial workspaces without needing to actually attend a workshop. The workspace docs and the notebook itself both have a lot of explanations about how things work and how to use them in your pursuit of deeper understanding of GATK. So I don't feel the need to go on and on about it here (for once). But I will mention, in case you're on the fence about whether to spend 5 whole minutes checking out the workspace (add 15 to 20 minutes to actually work through the full notebook), it involves running GATK commands, streaming files, and viewing data in IGV -- all without ever leaving the warm embrace of the notebook.

Actually I lied, I will go on a bit because there are two standout features that I want to call out explicitly. One is Python Magic, which allows us to run commands as if we were in the terminal, but from within the flow of the notebook itself. If you thought you could only run Python code in there, think again! You can run anything that you can install on the notebook runtime (which is just about anything). You can also use it to embed R code, which comes in handy in one of our filtering tutorials, because we love Python as a home base but make extensive use of the R library ggplot. (Or you can switch the entire notebook to an R kernel on the fly, but that leads to some nervousness about state, so I'd rather use the magic, personally.)
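To make the "terminal inside the notebook" idea a bit more concrete, here is a minimal sketch of what the magic roughly amounts to. In a notebook cell you type a shell command with a leading `!` and the notebook hands it to the shell for you. The plain-Python helper below (`run_shell` is a name made up for this illustration, not something from the tutorial notebooks) does approximately the same thing with the standard library, so you can see there is nothing mysterious going on. It assumes a Unix-like shell, and the GATK line in the comments assumes `gatk` is installed on the notebook runtime.

```python
import subprocess
from typing import List

# In a notebook cell, the magic version would look like this:
#   ! gatk --help        # run a command and show its output in the cell
#   files = !ls          # capture a command's output lines in a Python variable
#
# Outside a notebook, roughly the same effect in plain Python:

def run_shell(command: str) -> List[str]:
    """Run a command through the shell and return its stdout as a list of lines.

    Hypothetical helper for illustration only; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command string, as `!` does
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # fail loudly on errors
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_shell("echo hello from the shell"):
        print(line)
```

For the R side, the usual route (as far as I know) is the `%%R` cell magic provided by the rpy2 package, which has to be loaded first with `%load_ext rpy2.ipython`; treat that as a pointer rather than a description of exactly how our notebooks are set up.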

The other waffle-worthy feature is IGV integration: you can embed an interactive IGV window to view and explore your data directly from within the notebook. Until very recently we had to load files into desktop IGV, which involved a lot of copy-pasting of cloud storage file paths, and some context switching. With embedded IGV there's none of that. It's not as full-featured as the desktop version (and sometimes you may still prefer to use desktop IGV), but the notebook integration has practically all the functionality I ever use. And it's just so cool to have what amounts to embedded interactive figures right there with the rest of the commands and explanations. Seriously, I love the IGV integration so much, it's hard to put into words.
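If you're wondering what "embedding IGV" looks like from the notebook side, it mostly comes down to describing the browser as a small configuration: which reference genome, which locus to open on, and which tracks to load. The sketch below only builds that configuration as plain Python dictionaries, using the `genome`, `locus`, `tracks`, `name`, `url`, `indexURL` and `format` keys from the igv.js configuration format. The helper names and the `gs://` paths are hypothetical placeholders, and the final call that hands the configuration to the notebook widget is left as a comment because it depends on which IGV notebook package and version you have installed.

```python
from typing import Dict, List, Optional


def make_track(name: str, url: str,
               index_url: Optional[str] = None,
               fmt: Optional[str] = None) -> Dict[str, str]:
    """Describe one IGV track (hypothetical helper, illustration only)."""
    track = {"name": name, "url": url}
    if index_url is not None:
        track["indexURL"] = index_url  # index file, e.g. a .bai or .tbi
    if fmt is not None:
        track["format"] = fmt          # e.g. "bam" or "vcf"
    return track


def make_browser_config(genome: str, locus: str,
                        tracks: List[Dict[str, str]]) -> Dict[str, object]:
    """Bundle genome, starting locus and tracks into one configuration dict."""
    return {"genome": genome, "locus": locus, "tracks": list(tracks)}


# Placeholder bucket paths: substitute files from your own workspace.
config = make_browser_config(
    genome="hg38",
    locus="chr20:10,000,000-10,001,000",
    tracks=[
        make_track("Sample reads",
                   "gs://your-bucket/sample.bam",
                   index_url="gs://your-bucket/sample.bai",
                   fmt="bam"),
    ],
)

# In a notebook with an IGV widget package installed, the config would then be
# passed to that package's browser object, along the lines of:
#   import igv
#   browser = igv.Browser(config)
# (assumption: the exact call varies between package versions)
```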

All this to say, I heartily recommend you check out this mini-tutorial workspace, as it will give you a very concrete set of examples of how we're building out our tutorials and empower you to work through our workshop workspaces on your own. And as always we'd love to get feedback from all of you about the current crop of tutorials and what you'd like us to prioritize next.

Go to http://app.terra.bio and you'll be asked to log in with a Google identity. If you don't have one already, you can create one, and choose to either create a new Gmail account for it or associate your new Google identity with your existing email address. See this article for step-by-step instructions on how to register if needed. Once you've logged in, look for the big green banner at the top of the screen and click "Start trial" to take advantage of the free credits program. As a reminder, access to Terra is free but Google charges you for compute and storage; the credits (a $300 value) will allow you to try out the resources I'm describing here for free.

To clone a workspace, open it, expand the workspace action menu (three-dot icon, top right) and select the "Clone" option. In the cloning dialog, select the billing project we created for you with your free credits. The resulting workspace clone belongs to you. Have fun!
