
Is there a "null" value that can be used in WDL? Could "else" statements be added to conditionals?

dheiman Member, Broadie
edited July 2017 in Ask the FireCloud Team

Say I have a task whose behavior depends on which optional parameters are set. Is there a null or undefined value in WDL, one that makes defined() evaluate to false, that I can use to hide some of these parameters in the workflow section, rather than defining nearly the same task twice in my WDL?

In the example below, I have a preprocess step that only runs if unprocessed_file is defined and preprocessed_file is undefined. Many correlations use categorical data, and may need a preprocessing step that converts continuous data into categorical data via binning.

Additionally, there is result_archive, a zip file that is passed from task to task, with all outputs added to it, until it is finally attached as an attribute to the entity the method configuration was run on.

Because opt_preprocess is optional, result_archive can be initialized either there or in process, which requires a more flexible implementation of process. I do this with two optional inputs (a File and a String), only one of which should ever be defined.

It would be really useful if there were a way to force the input result_archive to be undefined in the call process as process_alt block below, and likewise result_archive_name in the call process block. Then there would be less potential for accidentally breaking things in the method config, as my_workflow.process.result_archive_name and my_workflow.process_alt.result_archive would not be accessible to it.

I'm using conditionals in my workflow definition, and as you can see, the code would be a lot cleaner if else statements were available rather than having to test for the opposite state.

task opt_preprocess {
    File unprocessed_file
    String result_archive_name
    String result_archive="${result_archive_name}.zip"

    command {
        preprocess ${unprocessed_file} > preprocessed.dat
        zip -r ${result_archive_name} . -x \
            "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
            lost+found/\* \
            "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
            exec.sh
    }

    output {
        File preprocessed_file="preprocessed.dat"
        File result_archive_pkg=result_archive
    }

    ...
}

task process {
    File input_file
    File? result_archive
    String? result_archive_name

    command {
        process  ${input_file} > processed.dat
        result_archive_name=${if defined(result_archive) then basename(result_archive) else result_archive_name}
        ${defined(result_archive)} && mv ${result_archive} .
        zip -r $result_archive_name . -x \
            "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
            lost+found/\* \
            "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
            exec.sh
    }

    output {
        File processed_file="processed.dat"
        File result_archive_pkg="${if defined(result_archive) then basename(result_archive) else result_archive_name}"
    }

    ...
}

task postprocess {
    File in_file
    File result_archive
    String result_archive_name=basename(result_archive)

    command {
        postprocess ${in_file}
        mv ${result_archive} .
        zip -r $result_archive_name . -x \
            "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
            lost+found/\* \
            "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
            exec.sh
    }

    output {
        File result_archive_pkg="${result_archive_name}"
    }

    ...
}

workflow my_workflow {
    File? preprocessed_file
    File? unprocessed_file
    String result_archive_name="my_workflow"

    if (defined(preprocessed_file)) {
        call process as process_alt {
            input: input_file=preprocessed_file,
                   result_archive_name=result_archive_name
        }
    }

    if (!defined(preprocessed_file) && defined(unprocessed_file)) {
        call opt_preprocess {
            input: unprocessed_file=unprocessed_file
        }
        call process {
            input: input_file=opt_preprocess.preprocessed_file,
                   result_archive=opt_preprocess.result_archive_pkg
        }
    }

    File post_file="${if defined(process.processed_file) then process.processed_file else process_alt.processed_file}"
    File result_archive="${if defined(process.result_archive_pkg) then process.result_archive_pkg else process_alt.result_archive_pkg}"

    call postprocess {
        input: in_file=post_file,
               result_archive=result_archive
    }

    output {
        postprocess.result_archive_pkg
    }
}

Answers

  • KateN Cambridge, MA Member, Broadie, Moderator

    When you simply don't define your optional variables (i.e. leave that field blank in your config), defined() should evaluate to false.

    As for the else statement in Cromwell, I will put in a request on your behalf to the development team. I agree that it would be very helpful to have.
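    In the meantime, here is a minimal sketch of emulating else in current (draft-2) WDL. It addresses only the else part of the question, not hiding inputs from the config. The condition is evaluated once into a Boolean, the two branches test that Boolean and its negation, and select_first() merges the branch outputs afterwards. It assumes the opt_preprocess and process tasks from the question and is meant to replace the my_workflow block. The names else_sketch, have_preprocessed and processed are illustrative only, and the sketch has not been run on FireCloud.

    ```wdl
    # Hypothetical replacement for the question's workflow block (one workflow per file)
    workflow else_sketch {
        File? preprocessed_file
        File? unprocessed_file
        String result_archive_name = "my_workflow"

        # Evaluate the condition once; both branches test this single value
        Boolean have_preprocessed = defined(preprocessed_file)

        # "if" branch
        if (have_preprocessed) {
            call process as process_alt {
                # select_first() turns the File? into a File; it is defined here
                input: input_file = select_first([preprocessed_file]),
                       result_archive_name = result_archive_name
            }
        }

        # "else" branch: the negation of the same Boolean
        if (!have_preprocessed) {
            call opt_preprocess {
                # fails at runtime if neither file was supplied
                input: unprocessed_file = select_first([unprocessed_file]),
                       result_archive_name = result_archive_name
            }
            call process {
                input: input_file = opt_preprocess.preprocessed_file,
                       result_archive = opt_preprocess.result_archive_pkg
            }
        }

        # Outputs of calls inside an if block are optional (File?) outside it.
        # Exactly one branch ran, so select_first() picks that branch's output.
        File processed = select_first([process.processed_file, process_alt.processed_file])
    }
    ```

    Binding the condition to one Boolean means the "else" test cannot drift out of sync with the "if" test, which is the main risk of writing the opposite condition out by hand.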

  • dheiman Member, Broadie

    @KateN said:
    When you simply don't define your optional variables (i.e. leave that field blank in your config), defined() should evaluate to false.

    I am aware of that; what I'm asking is a little more complex. I don't want these optional variables to show up in the config, since only one should ever be defined. I'm looking for a way to initialize a variable to undefined via logic, so that the variable doesn't show up in the method config at all.

    In the example above, these variables are exposed because the base task process has both optional variables, but each conditional block in the workflow definition only defines one or the other:
    process_alt.result_archive and process.result_archive_name.

    Basically, these values should be an exclusive or, and I was hoping that there would be a way to initialize the opposing variable in each case to something that defined() evaluates to false so as to exclude them from the method config UI.

  • KateN Cambridge, MA Member, Broadie, Moderator

    I see what you're trying to do now. No, there isn't anything currently available like that to exclude a variable from the UI. I will speak with both the FireCloud and Cromwell teams to see if that's possible to implement with our current structure.

    In the meantime, I may have a workaround that would leave you with only one File variable to define. I'm not familiar with your specific use case, but would it be possible to use one File variable and one Boolean variable, like so:

    workflow myWF{
        File myFile
        Boolean processed
        if(processed) {
            call process as process_alt
        }
        if(!processed) {
            call opt_preprocess
            call process
        }
    }
    
  • dheiman Member, Broadie

    Unfortunately no (though this is very similar to a pattern I use elsewhere to handle WDL's lack of optional outputs).

    The issue is that when the task initializes an archive it needs a string of the name of the file it will be creating (result_archive_name), but when it adds to an archive, it needs the actual archive path (result_archive).

    Unfortunately I'm outside the editing window, but I realized I made a mistake in the second conditional in the workflow; it should be:

        if (!defined(preprocessed_file) && defined(unprocessed_file)) {
            call opt_preprocess {
                input: unprocessed_file=unprocessed_file,
                       result_archive_name=result_archive_name
            }
            call process {
                input: input_file=opt_preprocess.preprocessed_file,
                       result_archive=opt_preprocess.result_archive_pkg
            }
        }
    

    Technically, the ugly way around this is to define result_archive_name within the process task and only use it if the File? input is undefined. I believe defining at the workflow level any variable that may be used by more than one task is much cleaner and easier to edit, so I would prefer to be able to do that.

  • KateN Cambridge, MA Member, Broadie, Moderator

    I think the workflow could still work with the archive path vs name issue. Does this look like it would work for you?

    workflow myWF {
        File myFile
        Boolean preprocessed
        String result_archive_name
    
        if(preprocessed) {
            call process as process_alt {
                input: input_file = myFile,
                        result_archive_name = result_archive_name
            }
        }
        if(!preprocessed) {
            call opt_preprocess {
                input: unprocessed_file = myFile,
                        result_archive_name = result_archive_name
            }
            call process {
                input: input_file = opt_preprocess.preprocessed_file,
                        result_archive = opt_preprocess.result_archive_pkg
            }
        }
    }
    
  • dheiman Member, Broadie
    edited July 2017

    Unfortunately, it wouldn't work, as process_alt.result_archive would show up as a modifiable variable in the config, as would process.result_archive_name.

  • KateN Cambridge, MA Member, Broadie, Moderator

    So what you'd prefer is the variables to simply not show up in the config if they aren't used for a particular entity?

  • dheiman Member, Broadie

    I want the capability to do that, because setting them in the config would break the logic of the task.

  • Ruchi Member, Broadie, Moderator, Dev

    Hey @dheiman,

    It seems to me that you'd always want to run the process and postprocess tasks, and the only optional task is opt_preprocess, which runs when preprocessed_file is not defined and unprocessed_file is defined.

    Given that, this is how I'd structure the workflow:

    workflow my_workflow {
        File? preprocessed_file
        File? unprocessed_file
        String result_archive_name="my_workflow"
    
    
        if (defined(unprocessed_file) && !defined(preprocessed_file)) {
            call opt_preprocess {
                input: unprocessed_file=unprocessed_file,
                       result_archive_name=result_archive_name
            }
        }
    
        File preprocessed_file_to_use = select_first([preprocessed_file, opt_preprocess.preprocessed_file])
    
        call process {
            input: input_file=preprocessed_file_to_use,
                   result_archive_name=result_archive_name
        }
    
        call postprocess {
                input: in_file=process.processed_file,
                   result_archive=process.result_archive_pkg
        }
    
        output {
            postprocess.result_archive_pkg
        }
    }
    

    The advantage of structuring the workflow this way is that the process task uses preprocessed_file if it was supplied, or waits for opt_preprocess to run. This way you can call process and postprocess only once in the workflow, and not have to alias and keep track of 2 parallel sets of outputs.

    With the 2 optional inputs, we are left with 4 possible combinations:
    1. if ( !defined(preprocessed_file) && !defined(unprocessed_file) ) -> no tasks run (select_first() has no defined file to pick)
    2. if ( defined(preprocessed_file) && defined(unprocessed_file) ) -> only the process and postprocess tasks run
    3. if ( defined(preprocessed_file) && !defined(unprocessed_file) ) -> only the process and postprocess tasks run
    4. if ( !defined(preprocessed_file) && defined(unprocessed_file) ) -> the opt_preprocess, process and postprocess tasks run

    Hope this helps!

  • dheiman Member, Broadie

    Hi @Ruchi,

    I like the use of select_first() to pick the first defined value in an array - I'll likely use that for simpler WDLs, thanks for the idea!

    Unfortunately, that still leaves the issue of the archive that gets generated of all the output files - to be initialized, it needs a string of what its name will be, otherwise it needs the archive generated by the preceding step, so which variables process uses varies by more than just the file being processed. The above workflow would cause all data generated by opt_preprocess to be missing from postprocess.result_archive_pkg, and process.result_archive would still be exposed in the method config, which I'm trying to avoid.

  • Ruchi Member, Broadie, Moderator, Dev
    edited August 2017
    Hey @dheiman, thanks for clarifying the usage of the result_archive_pkg. So just to be sure I have this right, your inputs can go two ways:
    1. A user can supply a preprocessed file & result_archive.zip
    2. A user can supply an unprocessed file and the `opt_preprocess` task creates the preprocessed file & result_archive.zip from that unprocessed file.

    Ideally, the process step would use either the outputs of opt_preprocess or the preprocessed file and result archive passed in by a user. In that case, is it possible to add another select_first for the result_archive files? That way none of the task-level inputs to `opt_preprocess` or `process` are optional; only the workflow-level inputs are, and depending on which ones are defined, the correct tasks are called and the appropriate inputs are passed into them.

    My second attempt at some small rewrites looks like this:

    task opt_preprocess {
        File unprocessed_file
        String result_archive_name

        command {
            preprocess ${unprocessed_file} > preprocessed.dat
            zip -r ${result_archive_name} . -x \
                "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
                lost+found/\* \
                "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
                exec.sh
        }

        output {
            File preprocessed_file="preprocessed.dat"
            File result_archive_pkg="${result_archive_name}.zip"
        }
    }

    task process {
        File input_file
        File result_archive
        String result_archive_name = basename(result_archive)

        command {
            process  ${input_file} > processed.dat
            mv ${result_archive} .
            zip -r ${result_archive_name} . -x \
                "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
                lost+found/\* \
                "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
                exec.sh
        }

        output {
            File processed_file="processed.dat"
            File result_archive_pkg=result_archive_name
        }
    }

    task postprocess {
        File in_file
        File result_archive
        String result_archive_name=basename(result_archive)

        command {
            postprocess ${in_file}
            mv ${result_archive} .
            zip -r ${result_archive_name} . -x \
                "fc-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9]-[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]/*" \
                lost+found/\* \
                "tmp.[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/*" \
                exec.sh
        }

        output {
            File result_archive_pkg="${result_archive_name}"
        }
    }

    workflow my_workflow {
        #Optional files required for the 'process' task
        File? preprocessed_file
        File? result_archive_zip

        #Optional files required for the 'opt_preprocess' task
        File? unprocessed_file
        #Name of the archive that opt_preprocess creates
        String result_archive_name="my_workflow"

        if (defined(unprocessed_file) && !defined(preprocessed_file)) {
            call opt_preprocess {
                input: unprocessed_file=unprocessed_file,
                       result_archive_name=result_archive_name
            }
        }

        File preprocessed_file_to_use = select_first([preprocessed_file, opt_preprocess.preprocessed_file])
        File result_archive_to_use = select_first([opt_preprocess.result_archive_pkg, result_archive_zip])

        call process {
            input: input_file=preprocessed_file_to_use,
                   result_archive=result_archive_to_use
        }

        call postprocess {
            input: in_file=process.processed_file,
                   result_archive=process.result_archive_pkg
        }

        output {
            postprocess.result_archive_pkg
        }
    }

    If you'd like to see what my method config/method looks like in FireCloud, I have it set up here: 
    https://portal.firecloud.org/#workspaces/broad-firecloud-dsde/RM_Playground/method-configs/Optional-inputs/optional_inputs
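    If the archive is instead always named in the workflow and never supplied by the user, a variant worth trying (an untested sketch) is to call process once, outside any conditional, and bind both of its optional archive inputs in that single call. Outside the if block, opt_preprocess.result_archive_pkg is a File? that is defined only when opt_preprocess ran, so process either adds to that archive or creates a new one from the name. Following the reasoning in this thread, inputs bound in the call should not be left as free inputs in the method config, but that has not been checked in the FireCloud UI. The sketch assumes the question's original process task (File? result_archive, String? result_archive_name) and postprocess task, not the rewritten ones above. The name input_to_use is illustrative only.

    ```wdl
    # Sketch: assumes the question's original opt_preprocess, process and postprocess tasks
    workflow my_workflow {
        File? preprocessed_file
        File? unprocessed_file
        String result_archive_name = "my_workflow"

        if (!defined(preprocessed_file) && defined(unprocessed_file)) {
            call opt_preprocess {
                input: unprocessed_file = select_first([unprocessed_file]),
                       result_archive_name = result_archive_name
            }
        }

        # User-supplied preprocessed file if present, else the one opt_preprocess made
        File input_to_use = select_first([preprocessed_file, opt_preprocess.preprocessed_file])

        # Both archive inputs are bound here. result_archive is undefined when
        # opt_preprocess did not run, so process initializes the archive itself.
        # The question's process task names its output exactly result_archive_name,
        # while zip appends .zip to a name without an extension, so the suffix is
        # added here to keep the output path and the created file in agreement.
        call process {
            input: input_file = input_to_use,
                   result_archive = opt_preprocess.result_archive_pkg,
                   result_archive_name = "${result_archive_name}.zip"
        }

        call postprocess {
            input: in_file = process.processed_file,
                   result_archive = process.result_archive_pkg
        }

        output {
            postprocess.result_archive_pkg
        }
    }
    ```

    The design choice is to let the optional upstream output carry the "undefined" state, instead of trying to assign undefined to an input by hand.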
  • dheiman Member, Broadie

    The user never supplies a result_archive file or name; those should only be defined in the workflow. The issue is which task the archive is created in. The preprocessing step, if required, always initializes an archive, so its inputs are fixed (unprocessed data + the archive name predefined in the workflow). The processing step either adds to the archive created by the preprocessor or, if the preprocessor was not run, creates the archive itself. Its inputs are therefore either the preprocessed file and archive generated by the preprocessor, or a user-supplied preprocessed file + the archive name predefined in the workflow.
