Secondary index files and directories in WDL
This is a followup from a recent discussion about getting compatible bcbio generated WDL (http://gatkforums.broadinstitute.org/wdl/discussion/9257/object-attribute-access-and-secondary-index-files). Thanks to all the great help you've provided we now have compatible WDL output that passes validation:
This is brilliant, and I'd like to move into testing runs with Cromwell. Before starting this, there is one major area I know we're missing in the conversion, handling of secondary files and directories of files. CWL has the notion of secondaryFiles (http://www.commonwl.org/v1.0/Workflow.html#File) which you can use to block these and ensure they get staged/run next to each other. I use this in bcbio and wanted to figure out the best way to map it into WDL.
There are two cases we use these for:
- Index files associated with compressed inputs, like BAM bai indices and bgzip VCF tbi indices. These are a single index file attached to the original file that should get staged in the same directory when running.
- Directories of index files like bwa or snpeff. These are a bit trickier since they can have many files and a variable number depending on the input.
What is the recommended way to deal with these cases in WDL? I'll have to re-engineer bcbio to be able to represent and pass these and wanted to do so in a way that was forward compatible with WDL's thoughts and plans. I've seen recommendations on current hacks like explicitly declaring the indexes as separate files, or tarring up a directory of files and passing that as input. I'm not clear enough on staging files from WDL/Cromwell to understand if these are guaranteed to always go in the right place (bai next to bam, all indexes in the same directory).
Thanks for any thoughts/suggestions/tips.