How am I supposed to specify directories with WDL?
I have a big directory (1 TB) of input files that I'd like to process using Cromwell. If I specify the directory as a File, then Cromwell copies the entire huge directory because it can't hard-link to a directory. If I specify the directory as a String, then a WDL task can't find the directory, because the task runs in a subdirectory and the relative path that I specified no longer resolves from there.
If I glob all the files in the directory and pass the resulting Array[File] around instead, then things break because the shell command lines become too long.
Okay, I could specify a full path to the directory as a String, but I don't want to do that: I run the same WDL job on different computers, the path to my working directory is different on each of them, and I just rsync the working directory between them. It's much better if I can specify everything with relative paths.
I suppose I could make a shell alias that sets an environment variable to $PWD, and then have a WDL task construct a full path to the directory from that environment variable and the relative path to the directory. This seems like a huge kludge, however.
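To make that workaround concrete, here is a rough sketch of what I have in mind. The variable name WDL_WORK_DIR and the helper resolve_input_dir are names I made up for illustration, and I'm assuming the variable would actually be visible to whatever ends up needing the absolute path, which I haven't verified for Cromwell tasks.

```shell
#!/bin/sh
# Hypothetical variable name: record the absolute working directory once,
# in the shell that launches Cromwell. An existing value is kept if already set.
WDL_WORK_DIR="${WDL_WORK_DIR:-$PWD}"
export WDL_WORK_DIR

# Hypothetical helper: turn a path relative to the working directory
# into an absolute path by prefixing it with WDL_WORK_DIR.
resolve_input_dir() {
    printf '%s/%s\n' "$WDL_WORK_DIR" "$1"
}
```

With this, resolve_input_dir data/inputs prints the absolute location of data/inputs on whichever machine I happen to be on, and that string is what I would hand to the workflow as a String input.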
Is there a way that this sort of thing is supposed to be done?