method/workflow attributes and default configurations

bhaas, Broad Institute (Member, Broadie)

Some of my workflows use 'workspace.attribute' values that point to resources required by the method (e.g. indexed genomes or other resource data). Instead of setting up these attributes in every workspace where the method is invoked, it would be great to be able to attach them to the method itself.

Alternatively, perhaps the resource locations ('gs://bucket') could be embedded directly in the WDL itself. Would that be the best way to do it for now? It would avoid the need to create the workspace attribute.

I was also wondering whether it would be possible to create default method configurations at the method level instead of in the workspace, since certain bindings like 'this.sample_id' could be fairly standard for certain methods or workflows. Currently, it seems, when a method is imported into a workspace, one must go through the parameter binding configuration each time.

many thanks!

Best Answer

  • Geraldine_VdAuwera, Cambridge, MA (admin)
    Accepted Answer

    Hi @bhaas, just about all the things you mention are possible. I would not recommend hardcoding paths in your WDL, as that makes it more painful to update versions, but you can do it if you like. You can also put your reference and other resources directly in your method configuration if you want, so that wherever you run it, they will already be bound. Just put the gs://bucket locations in the appropriate fields, in double quotes (important: this differs from workspace attributes, where we do not use quotes).

    However, the drawback of that strategy is that if you want to run a different method with the same resources, you have to redo all that configuration for the new method. And if you want to update to using a new reference or resource, you have to update all your method configurations individually. In contrast, if you use a convention like this.ref in all your method configs, they'll automatically pick up the reference that is set up in any workspace that uses the same conventions, and if you update the version of a resource used by a workspace, the update propagates to all method configs in that workspace.
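    The trade-off between quoted literal paths and attribute references can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's actual resolution logic: the function names, the input names (`ref_fasta`, `genome`, `sample`), and the `gs://bucket/reference_v1` style paths are all hypothetical, and the sketch borrows the `workspace.<name>` and `this.<name>` forms from the question above.

```python
# Hypothetical sketch: none of these names come from a real API.

def resolve_input(expression, workspace_attrs, entity_attrs):
    """Resolve one method-configuration input value.

    Assumed rules for this sketch:
      - a double-quoted value is a literal and is used as-is;
      - "workspace.<name>" looks up a workspace attribute;
      - "this.<name>" looks up an attribute of the entity being run.
    """
    if len(expression) >= 2 and expression[0] == '"' and expression[-1] == '"':
        return expression[1:-1]
    prefix, _, name = expression.partition(".")
    if prefix == "workspace":
        return workspace_attrs[name]
    if prefix == "this":
        return entity_attrs[name]
    raise ValueError(f"unrecognized expression: {expression!r}")


def resolve_config(config, workspace_attrs, entity_attrs):
    """Resolve every input of one method configuration."""
    return {
        input_name: resolve_input(expr, workspace_attrs, entity_attrs)
        for input_name, expr in config.items()
    }


# Hypothetical workspace and entity attributes.
workspace_attrs = {"ref": "gs://bucket/reference_v1"}
entity_attrs = {"sample_id": "sample_001"}

# Two configs that follow the shared convention, one with a hardcoded path.
config_a = {"ref_fasta": "workspace.ref", "sample": "this.sample_id"}
config_b = {"genome": "workspace.ref"}
config_hardcoded = {"ref_fasta": '"gs://bucket/reference_v1"'}

# Updating the workspace attribute once...
workspace_attrs["ref"] = "gs://bucket/reference_v2"

# ...is picked up by every config that uses the convention,
# while the hardcoded config still points at the old path.
resolved_a = resolve_config(config_a, workspace_attrs, entity_attrs)
resolved_b = resolve_config(config_b, workspace_attrs, entity_attrs)
resolved_hardcoded = resolve_config(config_hardcoded, workspace_attrs, entity_attrs)
```

    In this sketch a single attribute update reaches both convention-based configs, whereas the hardcoded one would have to be edited by hand.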

    What we typically do is set up "seed" workspaces that contain all the resources for a particular domain (e.g. somatic variant analysis) bound to attributes named according to such a convention, so whenever we want to run a new project of that type we just need to clone the appropriate workspace and everything is already set up. You can see this strategy in action in this workspace, which inherits its attributes and data model from a seed workspace (although I can't point you to that one as it is not yet public). We still have some work to do toward proposing a standardized naming convention for the main use cases we cover, and we welcome suggestions for an ontology of terms that might make sense for a large portion of the user community. It may be difficult to standardize across everyone's projects, but we aim to promote consistency within projects and groups at least.
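    The seed-workspace idea can be pictured with a small Python sketch. It models a workspace as a plain dictionary purely for illustration; the workspace names, the `known_sites` attribute, and the `clone_workspace` helper are hypothetical and do not correspond to any real API call.

```python
import copy

# Hypothetical seed workspace holding the shared resources for one domain.
seed_workspace = {
    "name": "somatic-seed",
    "attributes": {
        "ref": "gs://bucket/reference_v1",
        "known_sites": "gs://bucket/known_sites_v1",
    },
}


def clone_workspace(seed, new_name):
    """Return an independent copy of a seed workspace under a new name."""
    clone = copy.deepcopy(seed)
    clone["name"] = new_name
    return clone


# A new project starts with every seed attribute already bound.
project = clone_workspace(seed_workspace, "new-somatic-project")

# Changing a resource in the project does not touch the seed.
project["attributes"]["ref"] = "gs://bucket/reference_v2"
```

    Because the clone is a deep copy in this sketch, a project can move to a newer resource without affecting the seed or other projects cloned from it.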

