Notebooks get a facelift
By Robert Title, Engineering Manager, Data Sciences Platform at the Broad Institute
We are excited to announce that we have substantially improved the way you interact with Jupyter Notebooks in FireCloud. We hope this will increase your productivity and empower you to collaborate more effectively. These changes are publicly available as of today, Sep 12, 2018, and can be accessed in the Notebooks tab of your FireCloud workspace. Read on for a more detailed description of what is changing, and why it is better.
First, some background. Earlier this year, we announced a beta preview of Jupyter Notebooks in FireCloud. This feature brought the ability to spin up a dedicated cluster (a compute environment based on Google Cloud Dataproc) in your billing project, on which you can run a Jupyter Notebook. Since then, we’ve released several important improvements to the functionality, including the ability to pause/resume clusters, auto-pause to save you money, support for Jupyter extensions, bug/security fixes, new kernels, library upgrades, and more. The overall user experience, however, has essentially remained the same -- until now.
A major limitation of the initial Notebooks Beta release was that it focused on cluster management and lacked utilities for managing the notebooks themselves. In the old system, a notebook was ultimately just a text file that was stored on your cluster running in the cloud. When you deleted your cluster, everything on it -- including notebooks -- was deleted as well. To prevent losing work, you had to download notebooks using the Jupyter UI and store them in some other place, such as a directory on your laptop; we've heard of people copying the contents of their notebooks to a Google Doc! This is error-prone and inconvenient. It also isn't aligned with FireCloud’s philosophy of openness and collaboration, since clusters are not shared with other members of the workspace, and therefore any notebooks that live on them are not shared either.
Now, instead of displaying clusters (which are only visible to you), the new Notebooks tab displays notebooks (which are visible to all members of the workspace). When you work in a notebook, any changes you make are automatically persisted back to the workspace. This enables some powerful new ways of using notebooks in FireCloud. For example, your team can collaborate to develop notebooks containing analysis code, results, and documentation. You can then share your workspace containing notebooks so other researchers can easily reproduce the analysis.
We’re going to post some documentation updates with more detailed instructions on how to use the new Notebooks management interface, though we think it might be intuitive enough that you won't need to read them! Here is a screenshot:
You can Create or Upload a notebook, which adds it to the workspace. You can also Rename/Duplicate/Delete existing notebooks in the workspace. These operations do not require starting a cluster at all: they simply perform file operations on the notebook files stored in the workspace bucket.
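To make the "just file operations" point concrete, here is a minimal Python sketch. It is not FireCloud's implementation: a local temporary folder stands in for the workspace bucket, and every function name is hypothetical. It only illustrates that creating, renaming, duplicating, and deleting a notebook involve no compute environment.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: a local folder stands in for the workspace bucket.
# All function names below are hypothetical, for illustration only.

EMPTY_NOTEBOOK = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'


def create_notebook(bucket, name):
    """Write an empty notebook file into the folder."""
    path = bucket / f"{name}.ipynb"
    path.write_text(EMPTY_NOTEBOOK)
    return path


def rename_notebook(bucket, old, new):
    """Rename a notebook file in place."""
    target = bucket / f"{new}.ipynb"
    (bucket / f"{old}.ipynb").rename(target)
    return target


def duplicate_notebook(bucket, name, copy_name):
    """Copy a notebook file under a new name."""
    target = bucket / f"{copy_name}.ipynb"
    shutil.copyfile(bucket / f"{name}.ipynb", target)
    return target


def delete_notebook(bucket, name):
    """Remove a notebook file."""
    (bucket / f"{name}.ipynb").unlink()


def list_notebooks(bucket):
    """Return notebook names (without extension) in sorted order."""
    return sorted(p.stem for p in bucket.glob("*.ipynb"))


def demo():
    """Run each operation once and return the remaining notebook names."""
    with tempfile.TemporaryDirectory() as tmp:
        bucket = Path(tmp)
        create_notebook(bucket, "analysis")
        rename_notebook(bucket, "analysis", "qc")
        duplicate_notebook(bucket, "qc", "qc-copy")
        create_notebook(bucket, "scratch")
        delete_notebook(bucket, "scratch")
        return list_notebooks(bucket)
```

Calling demo() leaves two notebooks in the stand-in folder, the renamed one and its copy, without any cluster having been started.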
To actually open a notebook and execute code, you need to associate the notebook with a cluster. You can create a new cluster with a couple of clicks or choose from a list of existing clusters, and yes, you can associate multiple notebooks with the same cluster. Once the association is made, the notebook is copied to the cluster and can be opened with Jupyter. We also handle saving any changes you make back to your workspace as you work. There is no more need to upload/download notebook files using the Jupyter UI, although you are still free to do that if you wish. If you do upload a file using the Jupyter UI, we'll save it back to your workspace for you.
Here is a diagram illustrating the above flow using an example workspace containing three notebooks and two clusters.
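The same flow can be modeled in a few lines of Python. This is a toy sketch, not how Leonardo works internally: the class names, notebook names, and the choice of which notebook goes on which cluster are all assumptions made for illustration. It captures three ideas from the description above: associating copies a notebook onto a cluster, several notebooks can share one cluster, and saved changes are written back to the workspace.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: hypothetical classes that model the described flow only.


@dataclass
class Workspace:
    # Maps notebook name to notebook content; visible to all workspace members.
    notebooks: Dict[str, str] = field(default_factory=dict)


@dataclass
class Cluster:
    name: str
    workspace: Workspace
    # Copies of notebooks that have been associated with this cluster.
    local_files: Dict[str, str] = field(default_factory=dict)

    def associate(self, notebook):
        """Copy a notebook from the workspace onto this cluster."""
        self.local_files[notebook] = self.workspace.notebooks[notebook]

    def save(self, notebook, content):
        """Update the cluster copy and persist the change to the workspace."""
        self.local_files[notebook] = content
        self.workspace.notebooks[notebook] = content


def demo():
    """Three notebooks, two clusters; the first cluster hosts two notebooks."""
    ws = Workspace({"a.ipynb": "v1", "b.ipynb": "v1", "c.ipynb": "v1"})
    c1 = Cluster("cluster-1", ws)
    c2 = Cluster("cluster-2", ws)
    c1.associate("a.ipynb")
    c1.associate("b.ipynb")
    c2.associate("c.ipynb")
    c1.save("a.ipynb", "v2")  # the edit lands in the workspace as well
    return ws, c1, c2
```

In this model, deleting a cluster would lose only its local copies; the workspace keeps the latest saved version of each notebook.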
Following the initial release, here are a few follow-on UI improvements that we’d like to make in the short term:
HTML preview of notebooks
We’d like to add the ability to preview a notebook stored in the workspace, rendered as HTML, without needing to launch a cluster. This will allow workspace readers to look at notebooks even if they don’t have can-compute permissions.
JupyterLab & terminal
In addition to Jupyter Notebooks, we would like to provide access to JupyterLab in FireCloud. We’d also like to make it easier to access Jupyter’s in-browser bash terminal.
Additional cluster management options
There are some cluster management features that are not exposed in FireCloud, including configurable Jupyter extensions, auto-pause configuration, and environment customization. We would like FireCloud to make use of these features.
Furthermore, in the longer term we are looking at the following themes to improve our product. The dates are very rough estimates, but they give some sense of how we are prioritizing each theme relative to the others. For context, Leonardo is the service that provides notebook functionality to FireCloud, and it is where most of the development effort is focused.
More Analysis Tools
We believe we can provide users with other analysis tools besides Jupyter using the Leonardo infrastructure. The next tool we support will most likely be RStudio in Q4 2018, followed by IGV Desktop in Q1 2019.
Bring your own Docker (Q4 2018)
We have some capabilities to customize the notebook environment via a user-provided bash script. We’d also like to support custom Docker images as a more powerful way for users to control their notebook environment.
Hail 0.2 support (Q4 2018)
We currently install Hail 0.1 on Leo-created clusters. We would like to upgrade to Hail 0.2 and deprecate 0.1.
Data access (Q1 2019)
Today in a notebook you can access Google Cloud Storage or BigQuery data using standard libraries in Python or R. For Python users, we also provide the FireCloud client library, which can be used to access FireCloud objects such as workspaces and the workspace data model. We’d like to improve our client library offering by providing an R version and making the Python library more user-friendly.
Collaboration (Q2 2019)
We now have notebook persistence in the workspace, but we don’t yet have more sophisticated collaboration tools such as collaborative editing (think Google Docs) or version control. There is some exciting development from the Jupyter team on this front, which we’d like to make use of in the future.
We hope that the above changes will be beneficial to your work in FireCloud. If you have any further questions or comments, our team closely monitors notebook-related posts on the FireCloud Forum. Happy notebooking!