
How can a method configuration locate a file generated by WDL method ${write_lines(Array[File])}?

I have a task that creates a file of file paths via the WDL function ${write_lines(Array[File])}. However, when run, the task fails with an error saying it cannot find the generated file. See submission ID 740bf8f4-12b2-479d-95d8-799c2c207c7d.

Issue · Github (filed by Geraldine_VdAuwera): Issue Number 1680; State: closed; Closed By: katevoss

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Can you post the relevant parts of your WDL script? It sounds like the output file may not be correctly identified as an output.
  • dheiman (Member, Broadie)

    Hi @Geraldine_VdAuwera,

    It never gets as far as generating output files; the issue is with generating the input. According to the documentation, ${write_lines(mafpaths)} below should generate a file containing a list of file paths. Looking at the associated output before the job fails, that file is generated, but for some reason the running task can't find it and thus fails to even start.

    task aggregate_mafs {
        Array[File]+ mafpaths
        String out_prefix
        String? out_suffix
    
        command {
            python /src/Merge_MAFs.py ${"--suffix " + out_suffix} ${out_prefix} ${write_lines(mafpaths)}
        }
    
        output {
            File aggregated_maf = "${out_prefix}${'.' + out_suffix}.maf"
        }
    
        runtime {
            docker : "broadgdac/aggregate_mafs:2"
        }
    
        parameter_meta {
            mafpaths : "File containing paths to MAF files."
            out_prefix : "Identifying prefix. Usually the set entity this is run on."
            out_suffix : "Descriptive text to precede the .maf suffix."
        }
    
    }
    
    workflow aggregate_mafs_workflow {
        call aggregate_mafs
    }
    

    The method configuration: (image not preserved)

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Ah, I see. You want the Python command to recognize the file that is dynamically generated by the write_lines function, is that right? I'm not sure that's possible -- I would recommend separating this into two tasks, one to create the file with the list of file paths, then a second to read that in and do the merge. Others may jump in with a more clever and compact approach, but I'd rather go with an explicit decomposition.
  • dheiman (Member, Broadie)

    If that's not possible, then what is the point of the write_lines function? In fact, http://gatkforums.broadinstitute.org/wdl/discussion/8511 gives an explicit example of using it in the same sort of context. You've even given an example of using it with a string array: https://gatkforums.broadinstitute.org/gatk/discussion/7026. In the WDL spec, using write_lines in this way is the example for its use in array serialization: https://software.broadinstitute.org/wdl/devzone.

  • dheiman (Member, Broadie)

    I think part of what I'm getting at here is that if there is no way to do this in FireCloud, then it is a bug and should be raised as such. Automatic serialization of an array of file paths into a file was a major feature in Firehose, as an extremely common use case is to combine files from multiple samples in order to do contextual analysis (e.g. clustering, mutation significance, etc.), and going by the documentation, the use of ${write_lines(<Array[File]>)} in a command should be the WDL/FireCloud/Cromwell equivalent.
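
For readers unfamiliar with the serialization pattern described above, here is a minimal Python sketch of the idea: write each array element on its own line of a temporary file, hand the file's path to the tool, and have the tool read the paths back. This only imitates the concept; it is not Cromwell's implementation. The function names and the sample file names are hypothetical, and the `wdlarray` prefix is borrowed from the temporary file names mentioned later in this thread.

```python
import os
import tempfile


def write_lines(items):
    """Imitate the idea of write_lines: one element per line, return the file path."""
    fd, path = tempfile.mkstemp(prefix="wdlarray", suffix=".tmp", text=True)
    with os.fdopen(fd, "w") as handle:
        for item in items:
            handle.write(f"{item}\n")
    return path


def read_path_list(list_file):
    """What a consuming tool (e.g. a merge script) would do: read one path per line."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical inputs, for illustration only.
maf_paths = ["sampleA.maf", "sampleB.maf", "sampleC.maf"]
list_file = write_lines(maf_paths)
print(read_path_list(list_file))
os.remove(list_file)
```

The tool only ever sees the path of the list file, so it works only if that path, and every path listed inside it, resolves in the environment where the tool runs.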

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Alright, I might be wrong since it's a use case I'm not familiar with -- will check with the dev team.
  • KateVoss, Cambridge, MA (Member, Broadie, Moderator)

    @dheiman I have passed this along to the Cromwell development team, thank you for raising this use case to our attention. You can track progress on the bug that you reported in the Cromwell repo here: https://github.com/broadinstitute/cromwell/issues/1906

  • dheiman (Member, Broadie)

    @KateVoss, thank you. However, the bug I reported in the Cromwell repo is only related to this issue: in that bug, the file is found, but the listed files have the wrong paths. In this case, the file isn't even found. When I test with Cromwell v21 locally, the file is found, so I'm fairly sure this is a FireCloud issue, not a Cromwell one.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    So your GitHub issue is about the behavior of write_lines when executing locally, while this question is about its behavior when executing in FireCloud, is that right? If so, I wouldn't be surprised if the two are linked -- clearly there's a problem with finding/localizing the file that is produced. I'm not sure we can address the FireCloud behavior until we've sorted out the local execution behavior, but we'll check with the team.

  • dheiman (Member, Broadie)

    @Geraldine_VdAuwera, locally, there is no issue finding/localizing the produced file. The issue there is that the file is populated with the pre-localized paths: the tool runs and opens the file of file paths, but when it tries to open one of the listed files, it fails because the path is wrong. The FireCloud behavior is that it cannot find the file of file paths in the first place. These are two very different issues that sound similar at first glance. It wouldn't surprise me if the Cromwell issue crops up once the FireCloud issue is resolved, but it would surprise me if they are related to each other. It strikes me as much more likely that the FireCloud issue is related to peculiarities of Cromwell's JES backend.
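
The distinction between the two failures described above can be sketched in Python as a small check that a tool could run before merging. This is an illustrative sketch only: the `diagnose` function, the file names, and the `gs://hypothetical-bucket` path are assumptions, not part of FireCloud, Cromwell, or Merge_MAFs.py.

```python
import os
import tempfile


def diagnose(list_file):
    """Classify which of the two failures occurs for a file of file paths."""
    # Failure 1 (the FireCloud behavior reported here): the list file is absent.
    if not os.path.isfile(list_file):
        return "list file not found"
    with open(list_file) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    # Failure 2 (the separate Cromwell report): the list exists, but its
    # entries are not paths that resolve where the tool is running.
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        return "list file found, but listed paths do not resolve"
    return "ok"


with tempfile.TemporaryDirectory() as workdir:
    real_maf = os.path.join(workdir, "sampleA.maf")
    open(real_maf, "w").close()

    good_list = os.path.join(workdir, "good_paths.txt")
    with open(good_list, "w") as handle:
        handle.write(real_maf + "\n")

    bad_list = os.path.join(workdir, "bucket_paths.txt")
    with open(bad_list, "w") as handle:
        handle.write("gs://hypothetical-bucket/sampleA.maf\n")

    print(diagnose(os.path.join(workdir, "missing_list.txt")))
    print(diagnose(bad_list))
    print(diagnose(good_list))
```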

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Oh I see. Well, we're having the team look at this so we'll find out either way.

  • dheiman (Member, Broadie)

    Is there a ticket I can follow for this issue? The one @KateVoss linked above is the separate Cromwell issue I reported previously. Thanks!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    We'll check with the Cromwell team and get back to you. Sorry for the lag, we have a workshop this week so we're a bit short-handed.
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    edited March 2017

    @dheiman, we've reproduced your bug and provided a reproducible test case to the FireCloud dev team. We'll update this thread when there is progress. It's relatively low priority since it's possible to work around the bug, but we'll make sure this functionality gets fixed.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @dheiman I hear that the problem may be with File types, and that if you use String types instead, it should work. I'll need to test this when I can next week, but if you have the opportunity to try this out, please let me know how it goes.

  • dheiman (Member, Broadie)

    Hi @Geraldine_VdAuwera, the issue is that, when done as documented, the file is written to the working directory rather than the input directory, and thus is not found because the path resolves incorrectly. At least that is what I gather from having talked to @LeeTL1220 (I may have gotten it backwards).

    Looking in the bucket, it is interesting to note that two wdlarray*.tmp files are written, one with the Google bucket paths to the input files and the other with the localized paths. Looking at the logs, though, the command attempted to use the bucket paths. I suspect that a string array would fail in the same way.

    It's frustrating that when done locally with Cromwell, the file IS found with the same WDL configuration, but unfortunately without the localized paths.

    Trying out a suggestion from @LeeTL1220, I modified my WDL as follows:

    task aggregate_mafs {
        Array[File]+ mafpaths
        File mafpathsfile = write_lines(mafpaths)
        String out_prefix
        String? out_suffix
    
        command {
            /src/Merge_MAFs.py ${"--suffix " + out_suffix} ${out_prefix} ${mafpathsfile}
        }
    

    After I moved write_lines to my inputs section, the command did find the generated file, but unfortunately the file contained Google bucket paths rather than localized paths. That brings us full circle to the Cromwell bug I reported separately, which turns out to be a Docker-only bug.

    Issue · Github (filed by Geraldine_VdAuwera): Issue Number 1872; State: closed; Closed By: vdauwera
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Oh I see, thanks for reporting back with these details. We'll make sure to update the usage documentation. For the rest I don't think we can do anything about it until the Cromwell bug is fixed. We'll see what we can do to get that addressed.

  • gordon123, Broad (Member, Broadie)
    edited March 2017

    Regarding differences in input file paths between local and server-based Cromwell: the localize_files.py script in the firecloud_developer_toolkit may address that by putting every input file into a separate directory, as the server does, to catch this sort of issue during early debugging.
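
The staging idea mentioned above (each input file in its own directory) can be sketched roughly as follows. This is not the actual localize_files.py from the firecloud_developer_toolkit; the function name, the `input_<n>` directory naming, and the sample files are all hypothetical.

```python
import os
import shutil
import tempfile


def stage_inputs_separately(input_paths, staging_root):
    """Copy each input into its own numbered subdirectory and return the new paths."""
    staged = []
    for index, src in enumerate(input_paths):
        dest_dir = os.path.join(staging_root, f"input_{index}")
        os.makedirs(dest_dir, exist_ok=True)
        # shutil.copy returns the path of the newly created copy.
        staged.append(shutil.copy(src, dest_dir))
    return staged


with tempfile.TemporaryDirectory() as workdir:
    sources = []
    for name in ("sampleA.maf", "sampleB.maf"):  # hypothetical inputs
        path = os.path.join(workdir, name)
        open(path, "w").close()
        sources.append(path)

    staged = stage_inputs_separately(sources, os.path.join(workdir, "staged"))
    for path in staged:
        print(os.path.relpath(path, workdir))
```

A tool that assumes all of its inputs share one directory, or that reuses pre-staging paths, would break under this layout in local testing rather than only after submission.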
