(howto) Run workflows on sets

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
edited September 2017 in Tutorials

To run a workflow on a set of data entities, you just need to do two things: define your set, and use an expression in the Launch dialog to tell FireCloud how to handle the contents of the set. Note that the example below uses a sample set because that's a very common use case, but it can be trivially adapted to any other kind of set.


Define your sample set

You define your sample set in the Data tab by importing what is essentially a table of samples and the sample set they belong to. For an example of what that looks like, see this public workspace's Data tab: https://portal.firecloud.org/#workspaces/help-firecloud/FireCloud101-Basics/data


If you click on "Download 'sample_set' metadata", you'll get a zip archive containing two files: sample_set_entity.tsv and sample_set_membership.tsv. You can disregard the former. The latter describes the sample set by listing, on each line, a sample set ID and the ID of a sample that belongs to it. It looks like this:

membership:sample_set_id sample
CEUTrio_wgs_20 NA12878_wgs_20
CEUTrio_wgs_20 NA12877_wgs_20
CEUTrio_wgs_20 NA12882_wgs_20
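If you'd rather inspect a membership file programmatically than by eye, the sketch below groups sample IDs by sample set ID. It assumes the columns are tab-separated (as the .tsv extension suggests) and that the header line is exactly membership:sample_set_id followed by sample. read_membership is a hypothetical helper name, not part of any FireCloud tool.

```python
import csv
import io
from collections import defaultdict


def read_membership(text: str) -> dict[str, list[str]]:
    """Group sample IDs by sample set ID (assumes tab-separated columns)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header != ["membership:sample_set_id", "sample"]:
        raise ValueError(f"unexpected header: {header}")
    sets = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        set_id, sample_id = row
        sets[set_id].append(sample_id)
    return dict(sets)


example = (
    "membership:sample_set_id\tsample\n"
    "CEUTrio_wgs_20\tNA12878_wgs_20\n"
    "CEUTrio_wgs_20\tNA12877_wgs_20\n"
    "CEUTrio_wgs_20\tNA12882_wgs_20\n"
)
print(read_membership(example))
# {'CEUTrio_wgs_20': ['NA12878_wgs_20', 'NA12877_wgs_20', 'NA12882_wgs_20']}
```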

All you need to do to define your sample set is modify this file (or generate one like it) with your sample set and sample IDs. The sample set ID can be any name you choose; the sample IDs must be IDs of samples you have already imported into the workspace. You can define multiple sample sets within the same file, and a sample can belong to multiple sample sets, so you could do this, for example:

membership:sample_set_id sample
CEUTrio_wgs_20 NA12878_wgs_20
CEUTrio_wgs_20 NA12877_wgs_20
CEUTrio_wgs_20 NA12882_wgs_20
CEUTrio_test NA12877_wgs_20
CEUTrio_test NA12882_wgs_20
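For more than a handful of samples, generating the file is less error-prone than editing it by hand. Here is a minimal Python sketch, assuming tab-separated columns and the header shown above; write_membership and the sets dictionary are illustrative names only.

```python
import csv
import io


def write_membership(sets: dict[str, list[str]]) -> str:
    """Render sample set membership as TSV text, one (set ID, sample ID) pair per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["membership:sample_set_id", "sample"])
    for set_id, sample_ids in sets.items():
        for sample_id in sample_ids:
            writer.writerow([set_id, sample_id])
    return buf.getvalue()


# The same two overlapping sets as in the example above.
sets = {
    "CEUTrio_wgs_20": ["NA12878_wgs_20", "NA12877_wgs_20", "NA12882_wgs_20"],
    "CEUTrio_test": ["NA12877_wgs_20", "NA12882_wgs_20"],
}
print(write_membership(sets), end="")
```

Save the output to a .tsv file (the file name is up to you) before importing it.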

Once you've made your TSV file describing your sample set(s), import it by clicking the "Import Metadata..." button (still in your workspace's Data tab). This opens a dialog; follow the instructions to select the TSV file you created or modified. Assuming you don't hit any errors, once you close the dialog you'll see a new "sample_set" tab next to "participant" and "sample". Click on it to verify that your sample set has been created correctly.
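Since every sample ID in the file must match a sample already imported into the workspace, it can be worth checking that locally before importing. The sketch below assumes you have collected the workspace's sample IDs yourself (for example, from the "sample" tab) into known_samples; check_membership is a hypothetical helper, and FireCloud's own import checks remain the authority.

```python
import csv
import io


def check_membership(text: str, known_samples: set[str]) -> list[str]:
    """Return problems found in a membership TSV; an empty list means none were found."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != ["membership:sample_set_id", "sample"]:
        problems.append("first line must be: membership:sample_set_id<TAB>sample")
        return problems
    # Line numbers are 1-based and count the header as line 1.
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
        elif row[1] not in known_samples:
            problems.append(f"line {line_no}: sample {row[1]!r} is not in the workspace")
    return problems


known = {"NA12878_wgs_20", "NA12877_wgs_20", "NA12882_wgs_20"}
# The second member below stands in for a sample that was never imported.
tsv = (
    "membership:sample_set_id\tsample\n"
    "CEUTrio_test\tNA12877_wgs_20\n"
    "CEUTrio_test\tNA12891_wgs_20\n"
)
print(check_membership(tsv, known))
# ["line 3: sample 'NA12891_wgs_20' is not in the workspace"]
```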


Run a method on your shiny new sample set

To run a method on your newly created sample set, you don't need to change your method configuration. After clicking "Launch Analysis...", the dialog opens on the sample list; just switch to the sample_set list that should now appear, and select your sample set.

At this point, the trick is that you can't just hit "Launch" right away. First you need to enter an expression that tells FireCloud how to deal with the fact that, instead of the single sample it expects based on the method config, you're giving it a list of samples. In this case, the expression is this.samples. Then you can hit "Launch".

Note the plural in the expression; if you leave out the s, it won't work. Yes, it's annoying, and no, it's not well documented yet; this is something we're working on improving.
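To see why the plural matters, here is a deliberately simplified toy model, not FireCloud's actual implementation. It assumes a sample_set entity stores its member samples in an attribute named samples, and that an expression of the form this.<attribute> looks that attribute up on the entity you selected. Under that model, this.samples yields the set's member samples, while this.sample names an attribute the set doesn't have, so it fails. The names sample_sets and resolve are hypothetical.

```python
# Toy model (an assumption for illustration, not FireCloud's code): entities are
# dicts, and a sample_set keeps its members under the attribute "samples".
sample_sets = {
    "CEUTrio_test": {"samples": ["NA12877_wgs_20", "NA12882_wgs_20"]},
}


def resolve(expression: str, entity: dict) -> list[str]:
    """Resolve a 'this.<attribute>' expression against a single entity."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    attribute = expression[len(prefix):]
    if attribute not in entity:
        raise KeyError(f"entity has no attribute {attribute!r}")
    return entity[attribute]


selected = sample_sets["CEUTrio_test"]
print(resolve("this.samples", selected))  # ['NA12877_wgs_20', 'NA12882_wgs_20']
try:
    resolve("this.sample", selected)  # singular: the set has no such attribute
except KeyError as err:
    print("lookup failed:", err)
```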
