Google bucket path String no longer properly coerced to File in output

dheimandheiman ✭✭Member, Broadie ✭✭

Because optional outputs have yet to be added to the WDL spec (an outstanding issue for over two years), we've had to implement a work-around using a generic "null" file:

task variable_outputs {
  File? opt_input
  String null_file = "gs://broad-institute-gdac/GDAC_FC_NULL"
  command { ... }
  output { File might_exist = "${if defined(opt_input) then 'opt_output.txt' else null_file}" }
}
workflow var_output_wf {
  call variable_outputs
}

At some point between our implementation and now, this stopped working and now returns an error:
Failed to delocalize files: failed to copy the following files: "/mnt/local-disk/broad-institute-gdac/GDAC_FC_NULL -> ...
Instead of simply coercing the bucket path String into a File, it seems to now make the assumption the file was localized, and looks for it on the local disk rather than doing a pass-through.

This is an extreme change in behavior, as in the past a File output set to a File input would return the original input, rather than the localized copy of the file, which we discovered when we would zip the results of the first task in a workflow, add to that zip file in the second workflow, and continue passing task-to-task, only to see at the end that only the files in the first task were in the archive. This change seems to imply that if I tried that now, the final output would have all the results in it. I haven't seen any release notes stating this change.

While the above code would probably work if I changed the input type for null_file from String to File, doing so would break our local testing process that involves running the WDLs via local cromwell, which doesn't recognize google bucket paths, and thus would fail when it tries to localize the file. Yes this limits our testing to only be with the optional inputs to prevent a delocalization failure, but we decided that was acceptable.

Any insight into what is happening, or if this can be fixed would be very welcome. Thanks!

Answers

  • KateNKateN admin Cambridge, MAMember, Broadie, Moderator admin

    I do recall something recently happening with String vs. File coercion, so I'm going to do some digging to see if I can't figure out what was changed. Thank you for bringing this to our attention.

  • ChrisLChrisL admin Cambridge, MAMember, Broadie, Moderator, Dev admin
    edited May 2018

    @dheiman could you let us know which was the version of Cromwell which this worked in, and which you're using today? To be honest I'm not sure this should ever have worked, so this might have passed by as a "bug fix".

    This isn't really about File inputs being delocalized again (that still doesn't happen - you just get the original back). It's about String inputs and how to interpret those when they're used as File outputs.

    For a more concrete example of why this works the way it does, consider this WDL which is the same shape as yours:

    task foo {
      String output_name
      command {
        echo "some content" > ${output_name}
      }
      output {
        File output = output_name
      }
    }
    

    You're right that optional outputs would make this whole problem so much easier. It's definitely on the radar. To make this workaround work in the meantime, I think you'll want to do this:

    • Make it a File input, but remove the default assignment (hard coding a GCS path into your WDL will preclude it from working on the local backend):
      File null_file
    
    • Pass that null file input through via your workflow inputs, and add it into your inputs jsn
    Post edited by ChrisL on
  • dheimandheiman ✭✭ Member, Broadie ✭✭
    edited May 2018

    Hi @ChrisL, unfortunately FireCloud makes it difficult to determine what version of a method was run when, and what version of cromwell was used to run it. I've very likely deleted that workspace at this point anyways. I'm fairly certain FireCloud was on either 28 or 29 when this worked, but not absolutely positive.

    The thing about the String coercion to File is that in the error message, it's clear that the gs:// has been stripped out during coersion, and changed to /mnt/local-disk/, so it's being detected, but the wrong thing is being done when it's found. It makes sense to prepend the working directory to aString being coerced to File that doesn't include an (https?|ftp|gs):// prefix, but if such a prefix is already being detected, isn't the right course of action to not replace it, rather than strip it out and replace it with the working directory?

    Unfortunately, as you've mentioned, I think my only option is to edit all of our WDLs to expose the parameter, and fix them again if/when optional outputs are realized.

  • ChrisLChrisL admin Cambridge, MAMember, Broadie, Moderator, Dev admin

    @dheiman I can add a little more technical context here. We currently have to tell Google ahead of time which files to delocalize once it completes its execution of the task's command line. When you have a pattern in your outputs like File x = "a.txt", Cromwell treats that as "convert 'a.txt' to a local path and tell Google to delocalize as part of the job request".

    In the case of "a.txt" that means prepending the execution directory to the start of the String. In the case of a more complex string, I think we lean on java's directory resolution libraries to resolve the string relative to the execution directory. In your case, that'll be the /mnt/local-disk path you're seeing. So ultimately this is an issue with resolving something like "gs://xyz" as a local path relative to the execution directory.

    With a view to optional outputs, I'll reiterate that we really do hope to address it soon! Once my team completes its work on making sure WDL 1.0 is working in Cromwell, we'll be able to start looking at things like the Directory type and optional File outputs and hopefully the cadence of WDL language updates will pick up from there.

Sign In or Register to comment.