Glossary

jneff (Boston), Member, Broadie admin
edited June 2016 in Archive

Basic Concepts

Analysis Submission
A user creates an Analysis Submission in the workspace service by launching a method configuration against an entity. An analysis submission combines a method config with a targeted entity. This combination determines which method will run, how many times it will run, and the inputs and outputs of each run: if the targeted entity is of the method configuration’s root entity type, the method runs once; if it is a set of entities of that root type, the method runs once on each entity in the set. For each input parameter, the method configuration either maps the input to an attribute on a data model entity or supplies a literal value. The method configuration also maps each method output to an attribute on a data model entity. In response to the submission, the workspace service launches one workflow (see below) for each run of the method.
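The fan-out rule above can be sketched in a few lines of Python. This is a minimal illustration, not FireCloud’s implementation: the `Entity` class, the `workflow_targets` function, and the assumption that a set type is named by appending `_set` to its member type (e.g., `sample_set`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in for a data model entity."""
    entity_type: str  # e.g. "sample" or "sample_set"
    name: str
    members: List["Entity"] = field(default_factory=list)  # only used by set entities


def workflow_targets(root_entity_type: str, target: Entity) -> List[str]:
    """Return the names of the entities that each get their own workflow."""
    if target.entity_type == root_entity_type:
        # The target matches the config's root type: a single run.
        return [target.name]
    if target.entity_type == f"{root_entity_type}_set":
        # The target is a set of the root type: one run per member.
        return [member.name for member in target.members]
    raise ValueError(
        f"cannot launch a {root_entity_type} config on a {target.entity_type}"
    )


batch = Entity("sample_set", "batch1", [Entity("sample", s) for s in ("S1", "S2", "S3")])
print(workflow_targets("sample", batch))  # ['S1', 'S2', 'S3']
print(workflow_targets("sample", Entity("sample", "S4")))  # ['S4']
```

Raising on a mismatched type mirrors the idea that a config can only target its root type or a set of it.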

BAM file
A binary, compressed file that contains sequence alignment data. It is the binary version of a SAM file, which stores the same alignment records as tab-delimited text.
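One practical consequence of BAM being binary is that it cannot be inspected as text. BAM files are compressed with BGZF, a gzip-compatible block format, and the decompressed stream begins with the four magic bytes `BAM\x01`. The sketch below uses that to tell a BAM stream apart from plain-text SAM. `looks_like_bam` is a hypothetical helper; a real pipeline would normally rely on a dedicated tool such as samtools or pysam.

```python
import gzip
import io
from typing import BinaryIO

BAM_MAGIC = b"BAM\x01"  # first four bytes of a decompressed BAM stream


def looks_like_bam(stream: BinaryIO) -> bool:
    """Return True if `stream` decompresses to data starting with the BAM magic bytes."""
    try:
        # BGZF blocks are valid gzip members, so the standard gzip reader can open them.
        with gzip.GzipFile(fileobj=stream, mode="rb") as handle:
            return handle.read(len(BAM_MAGIC)) == BAM_MAGIC
    except (OSError, EOFError):
        # Plain-text SAM is not gzip data; gzip raises an OSError subclass for it.
        return False


fake_bam = io.BytesIO(gzip.compress(BAM_MAGIC + b"\x00" * 8))  # synthetic, not a real BAM
sam_text = io.BytesIO(b"@HD\tVN:1.6\tSO:coordinate\n")
print(looks_like_bam(fake_bam), looks_like_bam(sam_text))  # True False
```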

Controlled Access Data
De-identified data that may be unique to individuals. FireCloud users who have dbGaP authorization for TCGA data and a linked eRA Commons account can access TCGA controlled-access data.

Data Model
Organizes data and metadata for workspaces and analysis runs. The data model includes predefined entity types (e.g., participants, samples, participant sets, sample sets), the relationships between these entity types, and entity attributes. For your convenience, results from analysis runs are written directly back to the data model. Currently, the data model is tailored to TCGA data, but it will be extensible to non-TCGA projects with a germline or cell-line focus.

Entity
Refers to physical items (e.g., participants) or collections of physical items (e.g., participant sets). Entities provide organization and hierarchical structure for data. For example, a participant entity refers to a participant. A sample entity refers to a sample that may belong to that participant.
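The participant, sample, and set relationships described above can be modeled as plain Python dataclasses. This is an illustrative sketch only; the class and field names are hypothetical and do not mirror FireCloud’s internal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    participant_id: str


@dataclass
class Sample:
    sample_id: str
    participant: Participant  # the participant this sample came from


@dataclass
class SampleSet:
    sample_set_id: str
    samples: List[Sample] = field(default_factory=list)  # a collection of sample entities


def participants_in(sample_set: SampleSet) -> List[str]:
    """Walk set -> samples -> participants and return the distinct participant IDs."""
    return sorted({s.participant.participant_id for s in sample_set.samples})


p1, p2 = Participant("P1"), Participant("P2")
cohort = SampleSet("cohort", [Sample("P1-T", p1), Sample("P1-N", p1), Sample("P2-T", p2)])
print(participants_in(cohort))  # ['P1', 'P2']
```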

Entity Attributes
FireCloud uses entity attributes to describe data entities (e.g., a participant identifier) and to reference the locations of entity files (e.g., a URL pointing into a Google Cloud Storage bucket). Entity attributes can be fed into a workflow analysis as inputs and populated with its outputs. The set of attributes on FireCloud entities is dynamic: users can attach any attributes they want to entities in workspaces where they have WRITER-level access, either by uploading TSV files or through method configurations that map method outputs to entity attributes.
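As a sketch of how workflow outputs might populate entity attributes, the function below applies an output mapping to an entity’s attribute dictionary. It assumes mappings use the `this.<attribute>` form to refer to the entity the method ran on; the output names, attribute names, and `gs://` paths in the example are hypothetical.

```python
from typing import Dict


def apply_outputs(
    attributes: Dict[str, str],
    output_mapping: Dict[str, str],
    workflow_outputs: Dict[str, str],
) -> Dict[str, str]:
    """Return a copy of `attributes` updated with workflow outputs.

    `output_mapping` maps a workflow output name to an expression such as
    "this.vcf", meaning "store this output in the entity's `vcf` attribute".
    """
    updated = dict(attributes)  # leave the caller's dictionary untouched
    for output_name, expression in output_mapping.items():
        if not expression.startswith("this."):
            raise ValueError(f"unsupported output expression: {expression!r}")
        updated[expression[len("this."):]] = workflow_outputs[output_name]
    return updated


sample = {"bam": "gs://example-bucket/S1.bam"}
result = apply_outputs(
    sample,
    {"CallVariants.vcf": "this.vcf"},
    {"CallVariants.vcf": "gs://example-bucket/out/S1.vcf"},
)
print(result)  # {'bam': 'gs://example-bucket/S1.bam', 'vcf': 'gs://example-bucket/out/S1.vcf'}
```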

FireCloud RESTful API
All functionality presented through the user interface is also available through a public-facing, secure RESTful API. Comprehensive online documentation for this API, which uses the Swagger representation of RESTful APIs, is available at https://api.firecloud.org. The FireCloud RESTful API’s endpoints are organized into the following categories:

  • Entities

  • Method Configurations

  • Method Repository

  • NIH

  • OAuth

  • Profile

  • Storage

  • Submissions

  • Workspaces
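For a feel of what calling the API looks like, the sketch below builds (but does not send) an authenticated GET request using only the Python standard library. The endpoint path `/api/workspaces/{namespace}/{name}/entities/{entity_type}` is an assumption based on the Entities category listed above; confirm the exact path and response format in the Swagger documentation at https://api.firecloud.org before relying on it. The token is assumed to be an OAuth access token for your Google account, and the helper name is hypothetical.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.firecloud.org/api"  # assumed base path; check the Swagger docs


def list_entities_request(namespace: str, workspace: str, entity_type: str,
                          access_token: str) -> urllib.request.Request:
    """Build a GET request listing entities of one type in a workspace (not sent here)."""
    # Quote each path segment so spaces or slashes in names cannot break the URL.
    segments = [urllib.parse.quote(part, safe="") for part in (namespace, workspace, entity_type)]
    url = "{}/workspaces/{}/{}/entities/{}".format(API_ROOT, *segments)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
    )


req = list_entities_request("my-billing-project", "my workspace", "sample", "TOKEN")
print(req.get_method(), req.full_url)
# GET https://api.firecloud.org/api/workspaces/my-billing-project/my%20workspace/entities/sample
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the body with `json.load`. One way to obtain a token is `gcloud auth print-access-token`.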

Load Files (TSV Files)
FireCloud uses tab-separated-value (TSV) files to import entities and entity attributes into the Data tab. Each line in a TSV file describes one entity, and all entities in a single file must be of the same type. The FireCloud Data Model supports the following entity types:

  • Participant

  • Sample

  • Pair

  • Participant Set

  • Sample Set

  • Pair Set
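The sketch below writes a load file for one entity type with Python’s `csv` module. It assumes FireCloud’s header convention in which the first column is named `entity:<type>_id` (for example, `entity:participant_id`) and the remaining columns are attribute names; the `load_file` helper and the `age` attribute are hypothetical. Because the function takes a single `entity_type`, every row it writes has the same type, which satisfies the one-type-per-file rule.

```python
import csv
import io
from typing import Dict, List


def load_file(entity_type: str, rows: List[Dict[str, str]]) -> str:
    """Render rows of one entity type as a tab-separated load file."""
    id_column = f"{entity_type}_id"
    # Collect every attribute used by any row so all lines have the same columns.
    attribute_names = sorted({key for row in rows for key in row} - {id_column})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{id_column}", *attribute_names])
    for row in rows:
        # Missing attributes are written as empty cells.
        writer.writerow([row[id_column], *(row.get(name, "") for name in attribute_names)])
    return buffer.getvalue()


print(load_file("participant", [
    {"participant_id": "P1", "age": "54"},
    {"participant_id": "P2"},
]), end="")
```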

Methods
A WDL description of a task or workflow in FireCloud.

Method Configurations (Method Configs)
Bind data to Methods and specify which attributes to use as the inputs and outputs of an analysis run. Attributes named in a Method Config’s output fields are updated with the results of the analysis run.
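To make the binding concrete, here is a hypothetical resolver for method config input expressions. It assumes three forms: `this.<attribute>` reads an attribute of the entity the method runs on, `workspace.<attribute>` reads a workspace attribute (see Workspace Attributes below), and anything else is parsed as a JSON literal such as `"hg19"` or `30`. These rules are a simplification for illustration, not FireCloud’s full expression syntax.

```python
import json
from typing import Any, Dict


def resolve_input(expression: str, entity: Dict[str, Any], workspace: Dict[str, Any]) -> Any:
    """Turn one method-config input expression into a concrete value."""
    if expression.startswith("this."):
        return entity[expression[len("this."):]]  # attribute of the root entity
    if expression.startswith("workspace."):
        return workspace[expression[len("workspace."):]]  # workspace-wide attribute
    return json.loads(expression)  # literal value, e.g. '"hg19"' or '30'


config_inputs = {  # hypothetical task and attribute names
    "Align.bam": "this.bam",
    "Align.reference": "workspace.ref_fasta",
    "Align.min_quality": "30",
}
sample = {"bam": "gs://example-bucket/S1.bam"}
workspace_attrs = {"ref_fasta": "gs://example-bucket/ref/hg19.fasta"}
resolved = {
    name: resolve_input(expr, sample, workspace_attrs)
    for name, expr in config_inputs.items()
}
print(resolved["Align.min_quality"] + 1)  # 31
```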

Method Repository
Contains methods for analyzing data (workflows and their constituent tasks) and method configs. Tool developers can upload their own methods using the FireCloud Command Line Interface (CLI).

Open Access Data
Public de-identified data that is not unique to individuals. All FireCloud users can access open access TCGA data.

Task
In FireCloud methods and WDL, tasks refer to executable programs that are bundled into a Docker image.

Workflow
Workflows are composed of one or more tasks and contain the method and its input parameters. When you run an analysis, FireCloud submits the workflow’s tasks to the Google Job Execution System (JES).

Workspace
Computational sandbox in which a FireCloud user organizes genomic data and metadata into a data model.

Workspace Access Controls (ACLs)
Define permissions and enable the secure sharing of workspaces among FireCloud users. ACLs define three access levels, READER, WRITER, and OWNER, and each level includes all the permissions of the level before it plus additional ones.
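Because each access level includes all the permissions of the one before it, the levels can be modeled as an ordered enumeration, which turns a permission check into a single comparison. The `AccessLevel` enum and `allows` helper are hypothetical; the only requirement taken from this glossary is that editing entity attributes needs WRITER-level access.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Workspace access levels, ordered from least to most privileged."""
    READER = 1
    WRITER = 2
    OWNER = 3


def allows(granted: AccessLevel, required: AccessLevel) -> bool:
    """A user with `granted` access may do anything that needs `required` access or less."""
    return granted >= required


# Editing entity attributes requires WRITER-level access.
for level in AccessLevel:
    print(level.name, allows(level, AccessLevel.WRITER))
# READER False
# WRITER True
# OWNER True
```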

Workspace Attributes
Globally accessible input values within a workspace. If you enter workspace attributes in the workspace Summary tab, they can serve as inputs for any Method Config within your workspace.

Google Cloud Platform Concepts

Google Billing Account
In order for a FireCloud Administrator to create a new FireCloud Billing Project, you must first create a Google Billing Account. Google Billing Accounts are billed for cloud storage and compute costs that are tracked through FireCloud Billing Projects. You will need to provide a bank account or credit card to set up a Google Billing Account, or use a Google Reseller for alternative payment options.

Google Cloud Storage Bucket
Google Cloud Storage stores objects that are organized into buckets. Google buckets are flat containers where each object stored in the bucket is identified by a user-assigned key; thus, data objects in Google Cloud Storage are uniquely identified by their bucket name and object key. All requests for reading/writing bucket data objects are authorized using an access control list (ACL).
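Since a bucket is flat, a `gs://` URL is just a bucket name plus an object key, and anything that looks like a folder is simply part of the key. The helpers below illustrate that split and a prefix-based "folder" listing; they are hypothetical utilities, and the bucket and object names are placeholders.

```python
from typing import Iterable, List, Tuple


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/key' into ('bucket', 'key')."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URL needs both a bucket and an object key: {url!r}")
    return bucket, key


def keys_under(prefix: str, keys: Iterable[str]) -> List[str]:
    """Emulate listing a 'folder': keep the keys that start with the prefix."""
    return [key for key in keys if key.startswith(prefix)]


print(split_gcs_url("gs://example-bucket/crams/S1.cram"))  # ('example-bucket', 'crams/S1.cram')
print(keys_under("crams/", ["crams/S1.cram", "crams/S2.cram", "vcfs/S1.vcf"]))
# ['crams/S1.cram', 'crams/S2.cram']
```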

Google Developers Console
The Google Developers Console is the user interface for Google Cloud Platform. You can view buckets and bucket data and Google Project information through the Google Developers Console.

FireCloud Billing Project
Every workspace is linked to a single FireCloud Billing Project that tracks all cloud storage and cloud compute costs incurred within that workspace. Only FireCloud administrators can create and grant you access to FireCloud Billing Projects for use in FireCloud.

gsutil
Google Cloud Storage’s command line utility. Use this to upload data and files to Google buckets.
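A common pattern is to drive gsutil from a script. The sketch below builds a `gsutil -m cp` command, where `-m` asks gsutil to perform the copies in parallel, and runs it with `subprocess`. It assumes gsutil is installed and authenticated; the function names, file names, and bucket are hypothetical.

```python
import subprocess
from typing import List, Sequence


def gsutil_cp_command(local_paths: Sequence[str], destination: str) -> List[str]:
    """Build a parallel gsutil copy of local files into a bucket path."""
    if not destination.startswith("gs://"):
        raise ValueError(f"destination must be a gs:// URL: {destination!r}")
    # -m is a top-level gsutil option, so it goes before the cp command.
    return ["gsutil", "-m", "cp", *local_paths, destination]


def upload(local_paths: Sequence[str], destination: str) -> None:
    """Run the copy; check=True raises CalledProcessError if gsutil fails."""
    subprocess.run(gsutil_cp_command(local_paths, destination), check=True)


print(" ".join(gsutil_cp_command(["S1.bam", "S1.bam.bai"], "gs://example-bucket/bams/")))
# gsutil -m cp S1.bam S1.bam.bai gs://example-bucket/bams/
```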

Tool Developer Concepts

Cromwell
Cromwell is the workflow execution service used to run and test WDL workflows. When developing a WDL workflow, you can test it on a local installation of the Cromwell execution engine before uploading and testing it on FireCloud. Cromwell reads WDL, which describes workflows of executable tasks packaged into Docker containers, and then calls Google’s Job Execution System to run those tasks.
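Testing locally usually means running Cromwell in single-workflow `run` mode with a JSON inputs file. The sketch below writes the inputs and launches Cromwell as `java -jar <jar> run <wdl> --inputs <json>`, the form accepted by recent Cromwell releases; older releases may expect a different argument layout, so check your version’s help output. The jar, WDL, and input names are hypothetical, and inputs are assumed to be keyed by fully qualified names such as `hello.name`.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Dict, List


def cromwell_run_command(cromwell_jar: str, wdl_path: str, inputs_path: str) -> List[str]:
    """Build the command for Cromwell's single-workflow 'run' mode."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_path]


def run_locally(cromwell_jar: str, wdl_path: str, inputs: Dict[str, Any]) -> None:
    """Write the inputs next to the WDL file, then run the workflow with Cromwell."""
    inputs_path = Path(wdl_path).with_name("inputs.json")
    # Keys are fully qualified input names, e.g. "<workflow>.<input>".
    inputs_path.write_text(json.dumps(inputs, indent=2))
    subprocess.run(cromwell_run_command(cromwell_jar, wdl_path, str(inputs_path)), check=True)


print(cromwell_run_command("cromwell.jar", "hello.wdl", "hello.inputs.json"))
# ['java', '-jar', 'cromwell.jar', 'run', 'hello.wdl', '--inputs', 'hello.inputs.json']
```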

Docker
FireCloud uses Docker to distribute tools and applications for use in its methods. Docker allows applications and their dependencies to be packaged into discrete runtime environments, called Docker containers.

Docker Container
Docker containers wrap software in a file system that can contain the dependencies to run your tools on FireCloud. These dependencies can include code, system tools, system libraries and anything you can install on a server, thus enabling portability of tools across operating systems.
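When trying out a tool’s container on your own machine before using it in a method, a typical invocation mounts a local data directory into the container and removes the container when the command exits. The sketch below builds that `docker run --rm -v host:container` command. It assumes Docker is installed locally, and the image, paths, and helper names are hypothetical; on FireCloud itself, containers are launched for you.

```python
import subprocess
from typing import List, Sequence


def docker_run_command(image: str, command: Sequence[str], host_dir: str,
                       container_dir: str = "/data") -> List[str]:
    """Build a 'docker run' that mounts host_dir at container_dir and cleans up afterwards."""
    return [
        "docker", "run",
        "--rm",                               # delete the container when the command exits
        "-v", f"{host_dir}:{container_dir}",  # bind-mount local data into the container
        image,
        *command,
    ]


def run_in_container(image: str, command: Sequence[str], host_dir: str) -> None:
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(docker_run_command(image, command, host_dir), check=True)


print(" ".join(docker_run_command("ubuntu:16.04", ["ls", "/data"], "/home/me/reads")))
# docker run --rm -v /home/me/reads:/data ubuntu:16.04 ls /data
```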

Docker Host
Virtual machine on which containers are launched, managed with ‘docker-machine.’

DockerHub
DockerHub is a cloud-based registry service for Docker images. You can store and share your Docker images for use on FireCloud in public or private repositories (repos).
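Images on DockerHub are referred to as `<namespace>/<repository>:<tag>`. Official images omit the namespace (it is implicitly `library`), and a missing tag means `latest`. The hypothetical parser below shows how those defaults apply; it handles only simple DockerHub-style names, not digests (`@sha256:<digest>`) or other registries.

```python
from typing import Tuple


def parse_image_ref(ref: str) -> Tuple[str, str, str]:
    """Split a DockerHub image reference into (namespace, repository, tag)."""
    name, tag = ref, "latest"  # an untagged reference means the 'latest' tag
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):  # a ':' after the last '/' separates the tag
        name, tag = ref[:colon], ref[colon + 1:]
    if "/" in name:
        namespace, repository = name.split("/", 1)
    else:
        namespace, repository = "library", name  # official images live under 'library'
    return namespace, repository, tag


print(parse_image_ref("broadinstitute/gatk:4.0.0.0"))  # ('broadinstitute', 'gatk', '4.0.0.0')
print(parse_image_ref("ubuntu"))  # ('library', 'ubuntu', 'latest')
```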

Docker Image
A Docker image is the packaged software (an application plus its dependencies) from which a Docker container is launched.

FireCloud Command Line Interface (CLI)
The FireCloud CLI enables tool developers to push methods to FireCloud.

FISSfc
Contains a command line interface and Python client bindings for the FireCloud RESTful API, allowing users to script FireCloud tasks from the command line without going through the FireCloud user interface.

WDL (Workflow Description Language)
Workflow Description Language (WDL) is a language specifically designed for expressing genomics workflows. WDL workflows are represented in a way that can be read by humans and understood by Cromwell, the Workflow Execution Service that will run the specified tools to analyze data.
