(howto) Configure a method in a workspace
This tutorial covers the basic formatting for configuring a method's inputs and outputs in a workspace. If you prefer, there is also a video tutorial that explains this and other concepts in more detail.
As a reminder, the inputs and outputs are defined in the WDL. FireCloud interprets the WDL and provides you with an input "form" to fill out. The outputs part of the form is optional.
This example shows just the workflow portion of a WDL and what the inputs and outputs look like filled out in FireCloud.
Look at the WDL code. How many workflows and tasks do you see?
Answer: The WDL includes one workflow with two tasks. There are inputs listed for the workflow and separately for the tasks, followed by outputs for the workflow.
In the Inputs diagram, you can tell the difference between workflow inputs and task inputs by looking at the name. The first four inputs, named CramtoBamFlow, are the workflow inputs according to the WDL.
After the name of the task or workflow, the other columns are:
- Variable = the name of the input given in the WDL
- Type = Integer, String, Boolean, File, or Array of these
- Attribute = the actual value (this is the part you configure)
Running a method without the workspace data model
Sometimes users come to FireCloud with a preexisting WDL that ran successfully on their local infrastructure and want to test it quickly in a cloud environment. They can run their analysis without setting up the data model by following these steps:
1. Upload the method to the Method Repository and export it to a workspace.
2. Uncheck the checkbox “Configure inputs/outputs using the Workspace Data Model” within the method configuration.
3. Upload the input json file by clicking “Populate with a .json file.” FireCloud then populates the configuration with the attributes listed in the json file. Review and edit if necessary.
4. Click "Launch Analysis". If you want to use call caching, leave the call caching checkbox checked.
The output files will be placed in the Google bucket after completion and will not be registered in the data model.
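As a rough illustration of step 3, the sketch below builds the kind of inputs file that could be uploaded with “Populate with a .json file.” It assumes the common WDL convention of prefixing each input name with the workflow name. Only the workflow name CramtoBamFlow comes from this tutorial; the helper `build_inputs_json`, the input names, and the bucket paths are hypothetical.

```python
import json


def build_inputs_json(workflow_name, values):
    """Prefix each input name with the workflow name and serialize to JSON."""
    qualified = {
        "{}.{}".format(workflow_name, name): value
        for name, value in values.items()
    }
    return json.dumps(qualified, indent=2, sort_keys=True)


# Only the workflow name CramtoBamFlow comes from this tutorial; the input
# names and bucket paths below are hypothetical placeholders.
inputs_text = build_inputs_json("CramtoBamFlow", {
    "sample_name": "NA12878",
    "input_cram": "gs://my-bucket/NA12878.cram",
    "ref_fasta": "gs://my-bucket/reference.fasta",
})
print(inputs_text)
```

Save the printed text to a .json file, upload it, and then review the populated attributes in the method configuration as described in step 3.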
Attribute format per type
When you fill out the Attribute section per input, you have to follow the formatting requirements based on the type listed. See the Inputs diagram above for examples. Workspace attributes have slightly different formatting.
1. Integer - No formatting required.
2. String - Quotes required.
3. Boolean - Quotes required. Case insensitive, so "true" and "TrUE" are the same.
4. File - can be referenced from the Google bucket, data model, or workspace attribute section. See the Referencing files section below for details.
5. Array[X] - Lists of these attributes can be entered with a comma between each item.
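The formatting rules for the scalar types can be summarized in a small helper. This is only a sketch of the rules as stated in this tutorial, not FireCloud's own validation logic; `format_attribute` and `is_true` are hypothetical names. Arrays are left out because the exact syntax beyond "a comma between each item" is not spelled out here.

```python
def format_attribute(wdl_type, value):
    """Return the text to type into the Attribute column for one input."""
    if wdl_type == "Integer":
        # Integers need no formatting.
        return str(int(value))
    if wdl_type == "String":
        # Strings must be quoted.
        return '"{}"'.format(value)
    if wdl_type == "Boolean":
        # Booleans must be quoted as well.
        return '"{}"'.format("true" if value else "false")
    if wdl_type == "File":
        # References into the data model or workspace attributes stay
        # unquoted; direct bucket URLs are quoted.
        if value.startswith(("this.", "workspace.")):
            return value
        return '"{}"'.format(value)
    raise ValueError("unsupported type: {}".format(wdl_type))


def is_true(attribute):
    """Interpret a quoted Boolean attribute case-insensitively."""
    return attribute.strip('"').lower() == "true"


print(format_attribute("String", "NA12878"))
print(format_attribute("File", "gs://my-bucket/NA12878.bam"))
print(is_true('"TrUE"'))
```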
Referencing files
Google bucket - Use "gs://url-to-file-in-bucket" to reference the Google bucket file. Please note that the quotes are necessary if you are directly referencing a file URL, but not if you reference a file using the data model or workspace attributes below.
Data model - Suitable for referencing several files listed in your data model without hard coding values or having to adjust your method configuration when you add more data to the table. You can call the files listed under the name of a column by typing this. plus the column title. Make sure to leave the checkbox “Configure inputs/outputs using the Workspace Data Model” checked so that FireCloud registers that you are using the data model.
The this. prefix tells FireCloud to look at your data model in the table you set as your root entity. So if you set your root entity to "sample" when you imported the method to your workspace, then FireCloud will look in the "sample" table for an attribute (a column) with the name you specify. For example, this.sample_id would look in the "sample" table for the "sample_id" attribute.
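To make the lookup concrete, here is a minimal sketch of how a this. expression could be resolved against one row of the root entity table. It models a row as a plain dictionary and is not how FireCloud is implemented; `resolve_this`, the row contents, and the bucket path are hypothetical.

```python
def resolve_this(expression, entity):
    """Look up the column named after 'this.' in a single entity row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError("expected an expression starting with 'this.'")
    column = expression[len(prefix):]
    if column not in entity:
        raise KeyError("no attribute named {!r} on this entity".format(column))
    return entity[column]


# One hypothetical row of the "sample" table.
sample_row = {
    "sample_id": "NA12878",
    "bam": "gs://my-bucket/NA12878.bam",
}

print(resolve_this("this.sample_id", sample_row))  # prints NA12878
```

Because the expression names a column rather than a value, the same method configuration works unchanged for every row added to the table.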
This expression also gives you the flexibility to dive into attributes that exist on any entity that the method config is running on. For example, say your method is to be run on a pair. The pair table contains a control_sample_id, a case_sample_id, and their corresponding bam files. Say your WDL task requires the case_sample_bam input. You’d type
Workspace attribute - Storing an input as a workspace attribute is convenient if you are using a file over and over again in multiple methods. If the file path changes, you only have one place to update, similar to global variables in scripting. You can call this by typing workspace. plus the attribute key, for example workspace.ref_dict. If you type workspace. into the method configuration, all the available workspace attributes will auto-populate below. See how to format workspace attribute values here.
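The global-variable analogy can be sketched as a dictionary shared by every method configuration in the workspace. The code below is an illustration under that assumption, not FireCloud's implementation; `resolve_workspace`, `suggest_keys`, the ref_fasta key, and the bucket paths are hypothetical, while ref_dict is the key used in the example above.

```python
# Hypothetical workspace attributes, shared by all methods in the workspace.
workspace_attributes = {
    "ref_dict": "gs://my-bucket/reference.dict",
    "ref_fasta": "gs://my-bucket/reference.fasta",
}


def resolve_workspace(expression, attributes):
    """Look up the key named after 'workspace.' in the workspace attributes."""
    prefix = "workspace."
    if not expression.startswith(prefix):
        raise ValueError("expected an expression starting with 'workspace.'")
    return attributes[expression[len(prefix):]]


def suggest_keys(typed, attributes):
    """List the workspace expressions that match what has been typed so far."""
    candidates = ("workspace." + key for key in attributes)
    return sorted(c for c in candidates if c.startswith(typed))


print(resolve_workspace("workspace.ref_dict", workspace_attributes))
print(suggest_keys("workspace.", workspace_attributes))
```

Updating the value stored under ref_dict changes what every method referencing workspace.ref_dict receives, which is the single point of update described above.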
If you are using the data model for your analysis, you can optionally fill out the outputs with the same nomenclature (workspace., this., etc.). It is optional because your outputs will go directly into your bucket without defining anything here. If you want links to the output file destination in your data model, you need to define it here. Determine the name of the column or use a pre-existing column and type this. in front of it. For example, this.analysis_ready_bam will output the BAM to the column called analysis_ready_bam in the sample tab (if you chose to run this method on a sample). If the column header doesn’t exist yet, the method will create it after execution.
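The effect of an output expression on the data model can be pictured as writing a value into a column of the entity's row, adding the column when it is missing. This is a simplified sketch with a dictionary standing in for the row; `bind_output` and the bucket path are hypothetical.

```python
def bind_output(expression, entity, output_path):
    """Store an output path under the column named after 'this.'."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError("expected an expression starting with 'this.'")
    column = expression[len(prefix):]
    # Assigning to a missing key creates it, mirroring how a new column
    # appears in the table after execution.
    entity[column] = output_path
    return entity


# Hypothetical sample row before the method runs.
sample_row = {"sample_id": "NA12878"}
bind_output(
    "this.analysis_ready_bam",
    sample_row,
    "gs://my-bucket/outputs/NA12878.bam",
)
print(sample_row["analysis_ready_bam"])
```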
A note on versioning
When you create a method or a method configuration, FireCloud will give it a number (starting at 1) to identify the version. This is called a snapshot ID and can be found in the method and method config header along with the name, owner, documentation, etc. Every time a method or method config is edited, FireCloud automatically adds 1 to the ID. You can keep track of what method snapshots you have launched in the Monitor tab.