Optional task inputs for imported workflows

I recently moved some shared code into a workflow that I can import, and I noticed that there is no way to override the default arguments of tasks inside the imported workflow.
When I import a task, wdltool inputs emits (optional) config keys for all the possible inputs of the imported task. For imported workflows, it does not.

Is this a bug, or is it by design? I could put the optional arguments into the workflow itself as a workaround, but that would lead to a lot of duplicated argument definitions, and it would be easy to miss some, especially when imported workflows import other workflows themselves.

I have tried manually adding the arguments I wanted to the arguments.json file, but they seem to be silently ignored: no errors are reported.

GitHub issue #2083 (state: open), filed by Geraldine_VdAuwera

Answers

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie
    Hi Redmar, I think this is probably a bug. Can you give a small code example so we can make sure we understand your use case and observations?
  • For the file main.wdl

    import "share.wdl" as share
    
    workflow wf {
        String user
    
        call share.hello as hello {
            input: user = user
        }
        call share.howdie as howdie {
            input: user = user
        }
    }
    

    And share.wdl, which holds the imported task and workflow:

    task hello {
        String user
        String? greeting
    
        command {
            echo "${default="Hello" greeting} ${user}"
        }
        output {
            String text = read_string(stdout())
        }
    }
    
    workflow howdie {
        String user
    
        call hello {
            input: user = user
        }
        output {
            String hi = hello.text
        }
    }
    

    As you can see below, the optional input for the task hello is specified, but not for the workflow howdie that includes the task. I expected to also see an optional input for wf.howdie.hello.greeting. I don't think that would lead to conflicts, since all names are fully qualified.

    $ wdltool inputs main.wdl 
    {
      "wf.user": "String",
      "wf.hello.greeting": "(optional) String?"
    }
    
  • ChrisL (Cambridge, MA) · Member, Broadie, Moderator, Dev
    edited May 2017

    I also suspect this is a bug, but I'd also like to throw in my personal style preference.

    I would express this with a pass-through variable, e.g. by adding this to the main WDL:

    workflow wf {
      String user
      # Pass-through: declared on wf and forwarded to the imported task
      String? greeting
      call share.hello as hello { input:
        user = user,
        greeting = greeting
      }
    }
    

    I like this because it means all inputs to wf are declared right there, at the top of wf. Besides being easy to find, the interface to wf can stay the same regardless of its underlying imports changing or being replaced.

  • @ChrisL

    I use that as a workaround now for the settings that I really need to modify, but it's a lot of duplicated effort, in my opinion. Every time I change something in the imported workflow (like adding a new parameter), I also have to modify all workflows that use it to add the pass-through variable.

  • @Geraldine_VdAuwera @ChrisL

    I just ran into this issue again. Is there any news on implementing (optional) inputs for sub-workflows in WDL?

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie

    I think I heard some chatter around that recently. @ChrisL , any thoughts?

  • ChrisL (Cambridge, MA) · Member, Broadie, Moderator, Dev

    So we looked into this and the reason for this is that workflow calls encapsulate any inner calls (rather than letting inputs from the JSON trickle all the way down). That has a bunch of benefits including:

    • Workflow calls looking identical to task calls from inputs json.
    • Making it easy to find out what inputs are required for each call, whether workflow or task.
    • Letting far-nested sub-workflows alter their implementation without external callers having to change their inputs JSON, so long as the sub-workflow interface doesn't change.

    The downside (which I realize is annoying) is that you need one pass-through variable per sub-workflow input. I'm hopeful that the structs in WDL draft 3 are going to make that much easier, since they'll let you bundle all the variables together and ingest everything as a single input declaration.

    FWIW the existing bug is that wdltool (or now womtool) should be rejecting the validation rather than creating nested inputs in the file which Cromwell ignores or fails with. I believe @Ruchi is looking into this actually?
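
    For the share.wdl example earlier in this thread, the pass-through described above could look roughly like the sketch below. It assumes WDL draft-2 syntax, as used in the rest of the thread; the only change from the original file is that howdie declares greeting itself and forwards it to hello.

    ```wdl
    task hello {
        String user
        String? greeting

        command {
            echo "${default="Hello" greeting} ${user}"
        }
        output {
            String text = read_string(stdout())
        }
    }

    workflow howdie {
        String user
        # Pass-through: re-declare the task's optional input on the workflow
        # so that it becomes part of howdie's own interface.
        String? greeting

        call hello {
            input:
                user = user,
                greeting = greeting
        }
        output {
            String hi = hello.text
        }
    }
    ```

    With this change, greeting belongs to howdie's interface, so a caller should be able to supply it the same way it supplies user, without needing to know that hello exists underneath.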

  • edited April 13

    Hi @ChrisL and @Geraldine_VdAuwera, is there any chance you will reconsider how inputs to sub-workflows are implemented in WDL?
    I finally got around to implementing nested workflows, but it really is a nightmare to manually copy all required and optional inputs of a workflow into EVERY other workflow that uses it. For each workflow, you must specify its own inputs plus all inputs of the nested workflow. Because of this, the number of inputs you have to declare keeps accumulating with every level of nesting.

    That is really bad if you want to nest several workflows. For example, I would like to use the trim workflow in my assemble workflow, and use the assemble workflow in some find genes workflow.

    For example, I have a workflow that assembles reads, and it has the following code to specify all optional inputs, and then set the default values for the optional input.

    workflow abyss {
        File inputFastqFile
        Int nrCores
        Boolean create_report

        # Optional workflow inputs
        Boolean? kmerSearch
        Int? kmer_min
        Int? kmer_max
        Int? kmer_increment
        Int? kmer

        # Optional docker images
        String? trim
        String? abyss
        String? graph

        # Set default values for optional inputs
        Int Kmer = select_first([kmer, 31])
        Boolean KmerSearch = select_first([kmerSearch, false])
        # By default, we will do 10 different kmers
        Int Kmer_min = select_first([kmer_min, 27])
        Int Kmer_max = select_first([kmer_max, 96])
        Int Kmer_increment = select_first([kmer_increment, 7])
    
    

    Now I have to repeat the required inputs, the optional inputs, and the default assignments for the optional inputs in every workflow that uses abyss. I cannot simply pass a Boolean? to the abyss workflow and have it interpreted correctly, because that throws a coercion error: No coercion defined from 'null' of type 'Boolean?' to 'Boolean'.

    Even worse, when I change something in the abyss workflow, I have to update every other workflow that uses it as well. I don't want to do that. I want to just run womtool inputs workflow_with_nested.wdl and have the software figure out which inputs are required for every task. Isn't that the whole point of having fully qualified names?

    Am I going about this in a dumb way? Is there an example .wdl somewhere of workflows that share a sub-workflow with optional inputs, or is that something that isn't really used at the Broad? As it stands, this forces you to turn your WDL scripts into tightly coupled spaghetti code, where it is impossible to change one workflow without breaking every other workflow that uses it, forcing you to rewrite the interface.

    It's just frustrating that this exact problem is solved by using fully qualified names, which we already have in Cromwell! And even if developers want to manually specify sub-workflow inputs to present their users with an unchanging interface, that will still be possible, since task inputs that are specified in the workflow don't show up in the inputs .json anyway.

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie

    Hi @Redmar_van_den_Berg, I haven't done this myself in a while, but from the last time I did, I do remember some frustration around dealing with inputs to sub-workflows. I believe there was some chatter about improving the situation/UX around this, but I'm not up to date on how that has developed. I'll let @ChrisL address this since he owns all of it now... ALL OF IT. No pressure, Chris.

  • ChrisL (Cambridge, MA) · Member, Broadie, Moderator, Dev

    Hi @Redmar_van_den_Berg - my team is built around the idea of improving UX and people's experience in Cromwell, so this is certainly something we want to look into (and you're not the first to raise it).

    So I have a few thoughts in defense of the status quo, mainly on behalf of users who want to be able to import and use workflows and not worry about things changing underneath them.
    For non-power users, I think:
    * That encapsulating workflow interfaces so that they look just like tasks is valuable.
    * That as a consumer of somebody else's WDL file, being able to use tasks and workflows interchangeably and rely on a stable interface is valuable.
    * That allowing the internal workings of an imported sub-workflow to change, and not require users to regenerate their inputs (and have to re-work-out what files now have to go where) is valuable.
    * That if I imported a sub-workflow from somewhere, and now how it works has changed enough that its interface has changed... to me that implies that my workflow should have an interest in reconsidering how it's calling the subworkflow (because the logic has probably changed too - is it still valid for me to import and use it?).

    And now my thoughts on how to make life better for power-users like you:
    * We could maybe enable a "relaxed" mode in Cromwell which lets you specify inputs with fully qualified names. This would be with the proviso that Cromwell is not really enforcing the portable language specs, and so inputs would still need to be wired through if you wanted to publish the workflow for others to import.
    * Speaking of that wiring, I also like the idea of leveraging the IDE plugin to do the actual work there for you. Say you make a change in a sub-workflow: you could then ask the IDE to wire the inputs through automatically for the workflow, saving you a lot of busy work. (PS: do you use the plugin in IntelliJ or PyCharm? I think that's the obvious place to add a feature like this.)
    * Finally, I like the idea of letting people specify "customizations" to somebody else's published workflow at run time, without having to change the underlying WDL itself. Sort of like the "relaxed" mode except aimed at experimenting with somebody else's published workflow instead of iterating while writing your own. This would let people override things like inputs, declarations, runtime attributes in a workflow, even if the author hard-coded something and made a conscious choice to not expose them as inputs.

  • Hi @ChrisL, thank you for your answer.

    • That encapsulating workflow interfaces so that they look just like tasks is valuable.
    • That as a consumer of somebody else's WDL file, being able to use tasks and workflows interchangeably and rely on a stable interface is valuable.

    I agree, but I think that this should be taken even further, so that there is no difference between using a task and using a workflow. Right now, if a task input is not specified through a workflow, you have to specify it by its fully qualified name. However, the same rule does not apply to workflows, where the only way to specify an input at all is through the encapsulating workflow.

    • That allowing the internal workings of an imported sub-workflow to change, and not require users to regenerate their inputs (and have to re-work-out what files now have to go where) is valuable.

    I'm not sure I understand this point. I run womtool inputs workflow.wdl every time I run a workflow, and in the resulting file I update the settings that I need. Every task has its own defaults that make sense, so I usually only have to specify three or four inputs. What you describe sounds like there are settings that are always the same, but they live in a user's input file instead of as defaults in the tasks. That sounds like a bad thing, but I understand it might be necessary when users with very different requirements use the same workflows.

    • That if I imported a sub-workflow from somewhere, and now how it works has changed enough that its interface has changed... to me that implies that my workflow should have an interest in reconsidering how it's calling the subworkflow (because the logic has probably changed too - is it still valid for me to import and use it?).

    For me, the main changes would be exposing some option that used to be hard-coded. For example, I might get some reads that have different Illumina adapters. If this happens, I want to make the Illumina adapter an optional input for every workflow I have, and keep the usual Illumina adapters as the default. The easiest way to do this would be to change the trim task all the way at the bottom, and let womtool propagate the new optional input all the way up to the inputs JSON file. For a user who does not care about the adapters, the interface would not change.
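
    As an illustration of that kind of change, a trim task with a newly exposed optional adapter might look like the sketch below. The task body is hypothetical: the cutadapt command line, the input names, and the default adapter sequence are assumptions made for this example and are not taken from the actual workflow.

    ```wdl
    task trim {
        File reads
        # Hypothetical optional input; when it is left unset, the default
        # adapter sequence in the command below is used instead.
        String? adapter

        command {
            cutadapt -a ${default="AGATCGGAAGAGC" adapter} -o trimmed.fastq ${reads}
        }
        output {
            File trimmed = "trimmed.fastq"
        }
    }
    ```

    Callers that never set adapter keep getting the previous behaviour, which is why this kind of change would not have to alter the interface for users who do not care about it.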

    And now my thoughts on how to make life better for power-users like you:
    * We could maybe enable a "relaxed" mode in Cromwell which lets you specify inputs with fully qualified names. This would be with the proviso that Cromwell is not really enforcing the portable language specs, and so inputs would still need to be wired through if you wanted to publish the workflow for others to import.

    Would that not create a Cromwell-specific dialect of the language? I think that having workflows that are written only for Cromwell and no other engine would be bad for the community overall.

    • Finally, I like the idea of letting people specify "customizations" to somebody else's published workflow at run time, without having to change the underlying WDL itself. Sort of like the "relaxed" mode except aimed at experimenting with somebody else's published workflow instead of iterating while writing your own. This would let people override things like inputs, declarations, runtime attributes in a workflow, even if the author hard-coded something and made a conscious choice to not expose them as inputs.

    Your idea about customizations gives me an idea for a solution:
    1. Add a new subcommand to womtool, womtool inputs-full, that outputs the fully qualified names of every possible input, all the way down the workflow chain.
    2. The default womtool inputs keeps working as it does now.
    3. Cromwell and the other execution engines should read the fully qualified names from the inputs JSON file and use those settings, no matter how deeply nested they are.
    4. Because the behaviour of womtool inputs has not changed, this change is hidden from 'regular' users and only visible to power users.
    5. This also has the advantage of pushing the choice between fully qualified and workflow-specified inputs out of the execution engine and into a separate tool under the control of the user.
    6. There is no need to add a Cromwell-specific "relaxed" mode, and every workflow setting is available to any user who wants to access it through the new command.
