'Complex' JSON structure in the config file: how to coerce the data?

Hi all, learning (again) Cromwell. I'm trying to specify a 'complex' JSON structure in my config.

{
"wl.samples":[
    {"name":"S1","fastqs":[["S1.R1.a.fq.gz","S1.R2.a.fq.gz"],["S1.R1.b.fq.gz","S1.R2.b.fq.gz"]]},
    {"name":"S2","fastqs":[["S2.R1.a.fq.gz","S2.R2.a.fq.gz"]]}
    ]
}

and, for a start, I want to print the name of each sample.

task echo_sample_name {
    Map[String,Object] sample

    command {
        echo ${sample['name']}
    }
}

workflow wl {
    Array[Map[String,Object]] samples

    scatter(sample in samples) {
        call echo_sample_name { input: sample=sample }
    }
}

However, I cannot find a good way to coerce the JSON data into something I can use in my task. Currently I get:

Could not coerce JsArray value for 'wl.samples' ([{"name":"S1","fastqs":[["S1.R1.a.fq.gz","S1.R2.a.fq.gz"],["S1.R1.b.fq.gz","S1.R2.b.fq.gz"]]},{"name":"S2","fastqs":[["S2.R1.a.fq.gz","S2.R2.a.fq.gz"]]}]) into: WdlMaybeEmptyArrayType(WdlMapType(WdlStringType,WdlObjectType))

I've tried to use `Array[Map[String,Object]]` and other things without success.

What is the proper way to do this?
Thanks!

Answers

  • ChrisL, Cambridge, MA (Member, Broadie, Dev)

    Hi @lindenb - I'd avoid using Object, to be honest. It's a part of WDL that wasn't really implemented properly in Cromwell, so it probably has a few rough edges.

    In your particular case, I would use something like Array[Pair[String, Array[Array[String]]]] to represent the data type.

    For example, with the inputs file:

    {
      "wl.samples":[
        {
          "left":"S1",
          "right": [["S1.R1.a.fq.gz","S1.R2.a.fq.gz"],["S1.R1.b.fq.gz","S1.R2.b.fq.gz"]]
        },
        {
          "left":"S2",
          "right": [["S2.R1.a.fq.gz","S2.R2.a.fq.gz"]]
        }
      ]
    }
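
    A workflow that consumes this inputs file might look like the following. This is a minimal sketch, assuming draft-2 WDL as used elsewhere in this thread. The `.left` and `.right` accessors are the standard Pair members; the `output` section and the variable names are illustrative additions, and the sketch has not been checked against a particular Cromwell version.

    ```wdl
    # Sketch only: prints the name of each sample from the Pair-based input.
    task echo_sample_name {
      String name

      command {
        echo ${name}
      }
      output {
        # Capture what the command printed (illustrative, not required).
        String out = read_string(stdout())
      }
    }

    workflow wl {
      # left = sample name, right = list of [R1, R2] fastq pairs
      Array[Pair[String, Array[Array[String]]]] samples

      scatter (sample in samples) {
        String name = sample.left
        Array[Array[String]] fastqs = sample.right

        call echo_sample_name { input: name = name }
      }
    }
    ```

    The trade-off is that the JSON keys must be "left" and "right" rather than descriptive names such as "name" and "fastqs".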
    
  • ChrisL, Cambridge, MA (Member, Broadie, Dev)

    As a side note, we're hoping to revamp the dusty Object spec with a more usable struct format in the near future, to make this a bit easier. For more details you can check out: https://github.com/broadinstitute/cromwell/issues/2283

  • lindenb, France (Member)

    @ChrisL thank you for your answer. So, as far as I understand, there is currently no way to put whatever I want in a mixed JSON config, is there? For example:

    {
    "name":"Sample1",
    "age":10,
    "male":false,
    "father":null,
    "mother":null,
    "fastqs":[["S1.R1.a.fq.gz","S1.R2.a.fq.gz"]]
    }
    
  • ChrisL, Cambridge, MA (Member, Broadie, Dev)
    Accepted Answer

    Hi @lindenb, I think you might be able to. The syntax error you're seeing is because "S1" is a simple String, not an Object. You should be able to make it work if your input is Array[Object] instead, e.g.:

    workflow wl {
      Array[Object] samples
      scatter(sample in samples) {
        String name = sample["name"]
        Array[Array[String]] fastqs = sample["fastqs"]
      }
    }
    

    But because Object has historically been neglected by the development team (and because it's likely to be refactored soon), I would probably advise you against it unless absolutely necessary!

  • lindenb, France (Member)

    @ChrisL sorry, in the end your solution doesn't work.

    Tested with: https://github.com/lindenb/wdl-sandbox/tree/master/wdl/test007

    [2017-10-13 17:32:42,99] [error] WorkflowManagerActor Workflow b870989d-3d8f-4fca-9cef-493df859ba2a failed (during MaterializingWorkflowDescriptorState): Workflow input processing failed:
    Could not coerce JsArray value for 'wl.samples' ([{"name":"S1","fastqs":[["S1.R1.a.fq.gz","S1.R2.a.fq.gz"],["S1.R1.b.fq.gz","S1.R2.b.fq.gz"]]},{"name":"S2","fastqs":[["S2.R1.a.fq.gz","S2.R2.a.fq.gz"]]}]) into: WdlMaybeEmptyArrayType(WdlObjectType)
    cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
    Could not coerce JsArray value for 'wl.samples' ([{"name":"S1","fastqs":[["S1.R1.a.fq.gz","S1.R2.a.fq.gz"],["S1.R1.b.fq.gz","S1.R2.b.fq.gz"]]},{"name":"S2","fastqs":[["S2.R1.a.fq.gz","S2.R2.a.fq.gz"]]}]) into: WdlMaybeEmptyArrayType(WdlObjectType)
        at cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:186)
        at cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:156)
    

    P.

  • @lindenb there is an issue floating around somewhere (I'm on mobile or I would find it) which hopes to address this. The general gist of the issue is whether WDL should support a complex type such as Struct, where users can define their own typed objects.

    This seems like it would be a good example of how additional data structures like a struct would be used. When I find the issue I will link it back. If this is something that you think would help you, I suggest you +1 that idea!
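
    For readers on a later engine: WDL 1.0 did eventually add a `struct` declaration along these lines. The following is a minimal sketch, assuming a WDL 1.0 engine. The struct name `Sample` and the `output` section are hypothetical choices for illustration, and the field names are taken from the original question's JSON. It was not available in the draft-2 setup discussed in this thread.

    ```wdl
    version 1.0

    # Hypothetical struct mirroring the {"name": ..., "fastqs": ...} objects
    # from the original inputs file.
    struct Sample {
      String name
      Array[Array[String]] fastqs
    }

    task echo_sample_name {
      input {
        Sample sample
      }
      command {
        echo ~{sample.name}
      }
      output {
        String out = read_string(stdout())
      }
    }

    workflow wl {
      input {
        Array[Sample] samples
      }
      scatter (sample in samples) {
        call echo_sample_name { input: sample = sample }
      }
    }
    ```

    Under this assumption, the "wl.samples" array from the original question should be usable as written, since each JSON object's keys match the struct's fields.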

  • lindenb, France (Member)

    @patmagee thank you. So, as far as I understand, this is still a feature request, and there is currently no way to embed an unstructured JSON object in the config file.