Uploading an Array of files or strings - WDL won't overwrite data model

tmajarian Member, Broadie

Hi-

Many of my methods rely on arrays of files as inputs from the data model. Normally, I use this WDL to populate the data model:

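task read {
    # Text file with one file path per line
    File list
    command {
        # Placeholder command; the real work happens in the output section
        ls
    }
    runtime {
        docker: "tmajarian/[email protected]:f3402d7cb7c5ea864044b91cfbdea20ebe98fc1536292be657e05056dbe5e3a4"
    }
    output {
        # Each line of the list becomes one element of the array
        Array[String] filepaths = read_lines(list)
    }
}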

workflow w {
    File list_file
    call read {input: list=list_file}
}

However, it seems that if the entity attribute already has a value, this WDL does not overwrite it. Is there a way to force an overwrite? Or is there a better way to populate the data model with arrays? Running this WDL for every entity and attribute that requires an array is not ideal.

Thanks!
-Tim

Answers

  • dheiman Member, Broadie ✭✭

    Hi Tim,

    If I need to write arrays to attribute values, I use example 2 from @Tiffany_at_Broad's post https://gatkforums.broadinstitute.org/firecloud/discussion/10738/howto-import-metadata#latest. This avoids spinning up a VM and any associated costs.
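
    A minimal sketch of one way to avoid a VM, assuming that workflow-level engine functions are evaluated without launching a task. Whether this matches the linked example is an assumption, and the workflow name read_no_task is hypothetical:

    ```wdl
    # Hypothetical workflow with no task calls, so no VM should be needed.
    workflow read_no_task {
        # Text file with one file path per line
        File list_file

        output {
            # Evaluated by the workflow engine rather than inside a task
            Array[String] filepaths = read_lines(list_file)
        }
    }
    ```

    In the method config, the output read_no_task.filepaths would then be bound to the attribute that should hold the array.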

    When I'm forced to do this (thankfully rare, since I can usually use the data model to my advantage, as described in the next paragraph), my strategy is to populate one attribute with a file that lists the Google bucket locations, then create a second attribute using a config that implements the above method. I simply keep a separate config for each attribute, though if you know in advance how many attributes you will be modifying this way, you could conceivably create a single WDL to handle all of them.
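
    A rough sketch of the single-WDL idea for two attributes, reusing the read task from the question under aliases. The workflow name, the input names, and the ubuntu:16.04 image are placeholders, not anything from the original setup:

    ```wdl
    task read {
        # Text file with one file path per line
        File list
        command {
            ls
        }
        runtime {
            # Placeholder image; substitute the image you normally use
            docker: "ubuntu:16.04"
        }
        output {
            Array[String] filepaths = read_lines(list)
        }
    }

    # Hypothetical workflow: one aliased call per attribute to populate
    workflow populate_two {
        File list_file_a
        File list_file_b
        call read as read_a { input: list = list_file_a }
        call read as read_b { input: list = list_file_b }
    }
    ```

    Each input would be bound to the attribute holding a list file, and each call's filepaths output to a separate array attribute in the method config.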

    My main strategy is to use the data model as designed (which may or may not apply to your case): all the files for each sample are attached to their respective attributes on that sample, and I then run my configuration on the set entity, using the this.samples.<attribute> expression to automatically generate my Array[File] inputs.
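
    As an illustration of the set-level approach, a sketch with a hypothetical task that concatenates one file per sample. The task and workflow names and the ubuntu:16.04 image are assumptions:

    ```wdl
    # Hypothetical task that receives one file per sample in the set
    task merge_files {
        Array[File] sample_files
        command {
            # Concatenate all per-sample files into a single file
            cat ${sep=' ' sample_files} > merged.txt
        }
        runtime {
            # Placeholder image
            docker: "ubuntu:16.04"
        }
        output {
            File merged = "merged.txt"
        }
    }

    workflow set_level {
        # In the method config (root entity type: sample_set), bind this
        # input to this.samples.<attribute>
        Array[File] sample_files
        call merge_files { input: sample_files = sample_files }
    }
    ```

    With the config run on a sample set, the expression expands to the attribute value of every sample in the set, so no array ever has to be stored on the set itself.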

    I don't think there is a way to overwrite an attribute from within a config that uses said attribute, nor a way to write to the same attribute from multiple configs - someone can correct me if I'm wrong.

    I believe it's possible to get around the second limitation by manually deleting the attribute using a tool like FISS.

  • tmajarian Member, Broadie
    edited June 2018

    Thanks @dheiman. The real problem I'm running into is that I initially populated the data model using the above method, and everything was fine until a mistake was made: the data model TSV was downloaded, edited, and re-uploaded. Because the TSV contained the arrays as literal strings, the upload converted those entity attributes from arrays to plain strings. Now that these attributes hold strings instead of arrays, I'd like to overwrite them using the above WDL.

    This seems like it should be trivially easy, but it does not appear to be possible.
