How do I loop over a set of input files?
Let's say I have two input files that I want to loop over serially, i.e., not in parallel with a scatter/gather loop. How do I do that? I understand that while loops are possible, but I haven't seen a code example, so I don't know the syntax, and perhaps they aren't fully supported yet. Alternatively, I suppose I could hack together a loop, e.g., by indexing the files and increasing the index by one on each iteration until every file has been processed. Or perhaps the scatter/gather construct itself can be hacked to run serially?
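One pattern that should give serial processing without any while loop is to pass the whole array of files to a single task and loop over it in plain bash inside the task's command section. Each iteration finishes before the next starts, so only one copy of the tool is in memory at a time, and the task can request all cores. This is a minimal sketch in WDL 1.0; `heavy_tool`, its flags, the `.out` naming, and the `threads`/`memory_size` defaults are all hypothetical placeholders, not taken from the original pipeline.

```wdl
version 1.0

# Sketch: process every file in one task, one after another.
# "heavy_tool" and its flags are hypothetical placeholders.
task process_serially {
  input {
    Array[File] input_files
    Int threads = 16             # assumption: cores available on the node
    String memory_size = "32 GB" # assumption: RAM needed by ONE tool run
  }

  command <<<
    set -euo pipefail
    # Plain bash loop: the next file is not started until the current one is done.
    for f in ~{sep=" " input_files}; do
      heavy_tool --threads ~{threads} --in "$f" --out "$(basename "$f").out"
    done
  >>>

  output {
    # Collects all outputs; glob order is not guaranteed to match input order.
    Array[File] results = glob("*.out")
  }

  runtime {
    cpu: threads
    memory: memory_size
  }
}

workflow serial_loop {
  input {
    Array[File] input_files
  }

  call process_serially { input: input_files = input_files }

  output {
    Array[File] results = process_serially.results
  }
}
```

Downstream tasks could still scatter over `process_serially.results` with their own `cpu`/`memory` settings. The trade-off is that the loop is a single call: if one file fails, the whole task fails, call caching applies to the task as a whole rather than per file, and inputs that share a basename would overwrite each other's outputs in this sketch.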
I want to loop rather than scatter/gather to optimize my pipeline: at the moment I'm wasting resources because hardware limitations force me to use a suboptimal number of scatters. The task I want to serialize can use all available threads, but I can't scatter it because running several copies in parallel exceeds my RAM. Other downstream tasks, by contrast, use more CPU and less RAM. Since I have to restrict the number of shards globally, every tool suffers.
If I could choose which tasks in a scatter/gather block are serialized, and how many shards each scattered task is allowed to create, I could make better use of my resources. In my opinion, the current one-size-(doesn't)-fit-all approach doesn't quite do the job.
Whether or not you implement such functionality is secondary for now, though. My main question is: is it possible to loop over a set of input files?
Edit: One idea would be to start a subworkflow per file and restrict the maximum number of workflows to two. Since the parent workflow already occupies one slot, only one subworkflow could run at a time, which would effectively loop over the files. But I'd like to keep things as clean as possible and not use more scripts than necessary, so if possible I'd prefer to hack scatter/gather or use a while loop instead.