Task that takes one entity as input and outputs multiple entities (e.g. demultiplexing)

aryee

We (like many other groups) batch multiple samples together for sequencing. The sequencer's output file therefore contains reads from several samples, which then need to be split into separate per-sample files. We would like to implement this demultiplexing step as a WDL task. It would take one file as input (e.g. a FASTQ containing all the reads from the sequencing run) and produce multiple output files (e.g. one FASTQ per sample). I know how to glob the output to create an array of output files, but that array is then attached to a single entity in the data model (i.e. the sequencing run). We would instead like each per-sample output to become its own sample entity in the data model. Does anyone have advice on how to tackle this?


  AdelaideR Member admin
    edited February 2019

    Hi @aryee lab. I think this can be accomplished by adding a step to the WDL that demultiplexes the input into separate FASTQ files based on an additional column in the sequencing file. To separate the files, I would need some insight into how the original FASTQ is tagged. Do you have a barcode in the header, for example? Do the FASTQ files need to remain combined until the end, or can each sample be run separately through the workflow after splitting into a per-sample FASTQ?

    Is this in the WDL in the workspace that you already shared with me? I can clone it, run a trial, and let you know.

    If the WDL is not in the workspace, you can send it as a direct message so I can make a suggestion.
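
For the barcode-in-header case asked about above, the splitting step could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the WDL from the workspace: it assumes uncompressed four-line FASTQ records with Illumina-style headers whose last colon-separated field is the index sequence (e.g. `@read1 1:N:0:ACGT`), and the function names `demultiplex_fastq` and `write_per_barcode` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def demultiplex_fastq(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group four-line FASTQ records by the barcode at the end of the header.

    Assumption: the barcode is the last colon-separated field of the
    header line, as in Illumina-style headers ('@read1 1:N:0:ACGT').
    """
    per_barcode: Dict[str, List[str]] = defaultdict(list)
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, '+', qualities
            barcode = record[0].rsplit(":", 1)[-1]
            per_barcode[barcode].extend(record)
            record = []
    return dict(per_barcode)


def write_per_barcode(per_barcode: Dict[str, List[str]], out_dir: str) -> List[Path]:
    """Write one '<barcode>.fastq' file per barcode and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for barcode, records in sorted(per_barcode.items()):
        path = out / f"{barcode}.fastq"
        path.write_text("\n".join(records) + "\n")
        paths.append(path)
    return paths
```

Writing every per-barcode file into a single directory keeps the result compatible with the glob approach mentioned in the question. It does not by itself address how each file becomes a separate sample entity in the data model.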

