
Quick Start Part 5: Workspace > Notebooks

KateN · Cambridge, MA · Member, Broadie, Moderator, admin
edited September 2018 in Quick Start Guide

This feature is currently in Beta to give you a chance to try it while we continue developing it. Tell us what you like or struggle with on the forum.

From the Notebooks tab of your workspace you can launch an interactive analysis environment based on Jupyter (formerly IPython) notebooks, Spark, and Hail. Jupyter notebooks are an increasingly popular way to create reproducible bioinformatics analyses. They combine familiar and powerful programming languages, like R and Python, with the ability to create and share documents containing code, results, and narrative text.

Jupyter integrates well with Spark to provide large-scale data processing and analysis, and provides an excellent environment in which to run leading genomic analysis software such as Hail.
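As a hedged sketch of what a first Hail cell might look like once a cluster is running (this is not an official quickstart snippet; it assumes Hail is preinstalled on the cluster, as described above, and uses a small simulated dataset rather than real data):

```python
def smoke_test_hail():
    """Run a minimal Hail check on the notebook cluster.

    The import is deferred because Hail is only available on the cluster,
    not on an arbitrary local machine.
    """
    import hail as hl

    hl.init()  # attach Hail to the notebook's Spark context
    # Simulate a small genotype matrix table and count its dimensions.
    mt = hl.balding_nichols_model(n_populations=3, n_samples=50, n_variants=100)
    return mt.count()  # (n_variants, n_samples)
```

Calling `smoke_test_hail()` in a notebook cell is a quick way to confirm that Hail and Spark are wired up before loading real data.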

When you create or upload a notebook in the Notebooks tab, a new notebook file is created in your workspace bucket. You then have the option to spin up a Dataproc (managed Spark) cluster in your Google Cloud Platform (GCP) project. This gives you secure access to a Jupyter notebook server on which you can run your notebook. From within a notebook or Spark job, you can access any FireCloud-managed GCP resource you have access to (such as buckets) without needing to authenticate separately. You can also use the Python FireCloud client (FISS) to access FireCloud objects such as workspaces or the entity data model. Changes made to your notebook are automatically persisted back to your FireCloud workspace.
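As a hedged sketch of the FISS usage mentioned above (the workspace namespace, workspace name, and the `notebooks/` bucket prefix below are illustrative assumptions, not details stated in this post), looking up a workspace's bucket and building a notebook's GCS path might look like:

```python
def fetch_workspace_bucket(namespace: str, workspace: str) -> str:
    """Look up a workspace's GCS bucket via the FISS client.

    Requires the `firecloud` package and Google credentials; the import is
    deferred so the pure path helper below works without FISS installed.
    """
    import firecloud.api as fapi

    resp = fapi.get_workspace(namespace, workspace)
    resp.raise_for_status()
    return resp.json()["workspace"]["bucketName"]


def notebook_gcs_path(bucket: str, notebook_name: str) -> str:
    """Build the GCS path for a notebook file in the workspace bucket.

    The `notebooks/` prefix is an assumption about where notebook files
    are stored, not something confirmed in this post.
    """
    return f"gs://{bucket}/notebooks/{notebook_name}"


# Example usage (workspace names are hypothetical):
# bucket = fetch_workspace_bucket("my-namespace", "my-workspace")
# print(notebook_gcs_path(bucket, "analysis.ipynb"))
```

Deferring the `firecloud.api` import keeps the path helper usable anywhere, while the network call only runs in an authenticated environment such as a FireCloud notebook.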

