failure when running on large pair set

gordon123 Broad Member, Broadie

I launched a job on a sample set that aggregates a file from each sample into one big file on the sample set. The sample set has about 10,000 samples, which resulted in the docker run command line being about 3MB long. The JES log shows this error after the docker run command:

2017/08/09 20:30:27 E: command start failed: (fork/exec /usr/bin/docker: argument list too long)

It looks like every input file is named individually on the docker run command line.

The failing submission is 2934ffda-4655-4633-bca6-cf9bfc9b6c19

Answers

  • gordon123 Broad Member, Broadie

    FYI - a smaller job, with a docker run command of just over 1.5MB, does pass.

  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @gordon123, what workspace is this submission in? I'd like to look a little deeper into the JES logs if that's okay.

  • gordon123 Broad Member, Broadie

    It is in:

    nci-gsaksena-bi-org/MC3_mutation_validator

    I've shared the workspace with [email protected]

  • Ruchi Member, Broadie, Moderator, Dev admin

    Seems like this workspace has protected TCGA data and has restricted access.

    Would you mind attaching the JES log along with the exec file for this particular job?

  • gordon123 Broad Member, Broadie

    @Geraldine_VdAuwera The attachment button appears broken in Chrome (Windows 10, Mac, Linux).
    Ruchi - I'll email you a filesystem path.
    Thanks,
    Gordon

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @gordon123 Yes, we have a bug in the forum code -- we have a fix planned for tomorrow. Sorry for the inconvenience.

  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @gordon123, we don't control the docker run command or how it's built for the Google backend. It seems like there is a limit on how long the docker command line is allowed to be, and you have 10,000 inputs. Is it possible for you to make a file of file names instead? Do you need all 10,000 files to be localized, or can your tool for aggregation handle the GCS files directly?
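
    For the latter idea, a minimal sketch, assuming the aggregation tool could be taught to read gs:// URIs itself (the task and input names here are made up):

        task aggregate_from_gcs {
            # Array[String] instead of Array[File]: the gs:// URIs are
            # passed through verbatim, nothing is localized, and no
            # per-file paths land on the docker run command line
            Array[String] maf_uris
            String pSetID

            command <<<
                # assumes tsvConcatListFile.py can open gs:// URIs
                # itself (e.g. via a GCS client library)
                python /usr/local/bin/tsvConcatListFile.py ${write_lines(maf_uris)} ${pSetID}.aggregated.maf
            >>>
        }

    The trade-off is that the tool then has to handle all GCS reads (and auth) itself.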

  • gordon123 Broad Member, Broadie

    My job runs on a pair set that happens to be large. While FireCloud could generate a file of filenames to tell the container which files were localized, it looks like it passes them in individually. I can't think of any way to generate such a file myself short of creating the table outside of FireCloud and/or layering my own data model on top of FireCloud's.

    This particular tool is pretty simple: a Python script that concatenates tables in a column-name-aware way. While the tool could be modified to access the bucket via NIO, for now I can work around the problem by chunking the work into smaller sections. More important is fixing the underlying scaling limitation, which affects the usefulness of entity sets.

  • Ruchi Member, Broadie, Moderator, Dev admin

    Absolutely agree that this is an important scaling limitation. Currently there are only two real options:

    1. Reduce the number of inputs per task (as you've done).
    2. Generate a file of file names, which can be done inside the WDL itself using the write_lines() function (https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#file-write_linesarraystring), as sketched below.

    I'll file an issue for this, thanks!
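
    A minimal sketch of option 2, assuming tsvConcatListFile.py could accept a plain one-path-per-line list instead of the two-column TSV (the task name is made up):

        task aggregate_mafs {
            String pSetID
            Array[File]+ mafs

            command <<<
                # the engine evaluates write_lines() and substitutes the
                # path of a file containing one localized maf path per
                # line, so the command block never expands the whole
                # array inline
                python /usr/local/bin/tsvConcatListFile.py ${write_lines(mafs)} ${pSetID}.aggregated.maf
            >>>
        }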

  • gordon123 Broad Member, Broadie

    I think I am already doing something equivalent to #2 (see below). Before I changed to this approach, the job was crashing with even fewer samples, hitting a 1MB limit on the length of the python tsvConcatListFile.py command line. Now the crash is on the length of the docker run command line, which I don't construct myself.

        String pSetID
        Array[File]+ mafs

        command <<<
            # write the localized maf paths to a two-column TSV,
            # skipping the GDAC_FC_NULL placeholder files
            python <<CODE
    NULL_SENTINEL = "GDAC_FC_NULL"
    mafpaths = '${sep="," mafs}'.split(",")
    with open("datapathsfile.tsv", "w") as fout:
        for mafpath in mafpaths:
            if not mafpath.endswith(NULL_SENTINEL):
                fout.write("NA\t%s\n" % mafpath)
    CODE
            # run catters
            python /usr/local/bin/tsvConcatListFile.py datapathsfile.tsv ${pSetID}.aggregated.maf
        >>>
    
  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @gordon123, I'll be filing a ticket on the Google side to get the docker command length limit addressed. If you're facing a limitation from Python when writing the file paths out to a file, using write_lines(Array[File]) should help, with the added bonus of doing the work without needing to spin up a VM!

  • gordon123 Broad Member, Broadie

    The Python code I pasted works OK. write_lines should help make things more concise, at least in cases where I don't need to filter out the dummy null files. (The motivation for having null files in the first place is a workaround, explained here: https://gatkforums.broadinstitute.org/firecloud/discussion/comment/39981#Comment_39981)

    For the crash I posted about, the Python code never got a chance to run: the command line length limitation is hit upstream of the command block, at the time the docker container is started. I don't think there is anything I can put in my WDL to change this. The only workaround I know of is to keep pair sets smaller than about 7,000 samples, and to log this as a scaling issue.

  • dheiman Member, Broadie ✭✭

    I wonder if the issue is the expansion of ${sep="," mafs} during the first pass, before the python is run.

    If so, a possible workaround may be to use write_lines(Array[File]), and modify tsvConcatListFile.py to filter out the placeholder files.
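
    A minimal sketch of that combination, with grep standing in for the tsvConcatListFile.py modification and assuming, as above, that the tool accepts a plain path list (declarations as in the task sketched earlier):

        command <<<
            # write_lines() yields a file of localized paths; drop the
            # GDAC_FC_NULL placeholder entries before concatenating
            grep -v 'GDAC_FC_NULL$' ${write_lines(mafs)} > datapaths.txt || true
            python /usr/local/bin/tsvConcatListFile.py datapaths.txt ${pSetID}.aggregated.maf
        >>>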
