message: Unable to complete JES Api Request; message: Request payload size exceeds the limit

When running a call that combines multiple files into one using a Python script, I see the following error:
• message: Unable to complete JES Api Request
• causedBy:
• message: Request payload size exceeds the limit: 5242880 bytes.

This happens only for calls that merge a large number of files, even though I adjust the disk and memory requirements accordingly.
Thanks!
Dmitry from the Broad

Answers

  • SChaluvadi Member, Broadie, Moderator admin

    @dmitry_s Would you be able to share your workspace with [email protected] so that we can take a closer look?

  • dmitry_s Member

    Sure, it has already been shared with your team!

  • SChaluvadi Member, Broadie, Moderator admin

    @dmitry_s It seems that the command sent to the Pipelines API is very long due to the list of input files, which results in the intermittent payload error you are seeing. Currently you are using Array[File], passing an array of all your .tsv files to combine. As a workaround, you can write a text file, for example "list_tsv_files.txt", that contains the file names of the .tsv files you wish to combine, and write your WDL to read that text file in and combine its contents.
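
    A minimal sketch of that workaround (task and file names here are illustrative, not the exact solution): the task takes a single manifest File of gs:// paths, so only one path crosses the API, and the files themselves are fetched inside the command with gsutil.

    ```wdl
    task combine_tsvs {
        # One File input: a manifest with one gs:// path per line.
        File list_tsv_files

        command <<<
            # Localize the listed files ourselves instead of via Array[File],
            # keeping the Pipelines API request payload small.
            mkdir tsvs
            cat ${list_tsv_files} | gsutil -m cp -I tsvs/
            paste tsvs/* > combined_files.txt
        >>>

        output {
            File combined = "combined_files.txt"
        }
    }
    ```

    The task's runtime section would also need a Docker image with gsutil available (e.g. a Google Cloud SDK image).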

  • SaloniShah Member

    @dmitry_s out of interest, how long have you been running this workflow task where you combine large files? If this is not the first time running it, have you come across this error before?

  • dplichta Member

    Hi @SChaluvadi,

    Could you be more specific about how to re-write it in a single WDL?

    In another workflow I tried the following trick, but it failed:

    task make_matrix {
        Array[String] manyFilesFromPreviousTask
        Array[File] files = manyFilesFromPreviousTask

        command <<<
            paste ${sep=' ' files} > combined_files.txt

            XXX
        >>>
    }

  • dplichta Member

    I made a hack: I pass the Array[String], save the addresses to a file, and then use the gsutil cp command. It works, but I would like to know:

    1) Is the "Request payload size exceeds the limit: 5242880 bytes." considered a bug?
    2) If not, why? How do others do this kind of analysis? It's difficult for me to know how many files I can pass to Array[File] before the pipeline breaks with the "payload size exceeds..." error.
    3) What is the recommended solution for this issue?

    Best,

    Damian

    Code example:

    task xxx {
        Array[String] manyFilesFromPreviousTask

        command <<<
            # Write the list of addresses to a file, then localize with gsutil.
            cat ${write_lines(manyFilesFromPreviousTask)} > manyFilesFromPreviousTask_2_download.txt
            mkdir dir_manyFilesFromPreviousTask
            cat manyFilesFromPreviousTask_2_download.txt | gsutil -m cp -I dir_manyFilesFromPreviousTask/

            paste dir_manyFilesFromPreviousTask/* > combined_files.txt
        >>>
    }
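
    On question 2, a rough back-of-envelope calculation can bound the file count (the per-path byte size is an assumption for illustration; the real payload also contains the whole localization command and other request content, so the practical ceiling is well below this):

    ```python
    # Estimate how many gs:// input paths could fit under the
    # 5 MiB (5242880-byte) Pipelines API payload limit.
    PAYLOAD_LIMIT = 5 * 1024 * 1024  # 5242880 bytes

    def max_localizable_paths(avg_path_bytes, per_file_overhead_bytes=0):
        """Upper bound on the number of paths that fit in the payload."""
        return PAYLOAD_LIMIT // (avg_path_bytes + per_file_overhead_bytes)

    # A typical bucket path such as
    # gs://my-bucket/submissions/<uuid>/workflow/shard-12/sample_0001.tsv
    # is on the order of 120 bytes.
    print(max_localizable_paths(120))  # prints 43690
    ```

    So tens of thousands of bare paths would fit in principle, which is why the error only appears for tasks with very large input arrays, and why the exact breaking point is hard to predict from the file count alone.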
    
  • SChaluvadi Member, Broadie, Moderator admin

    @dplichta Sorry for the delay - I am looking into the details and will follow up with more information.

  • dplichta Member

    Hi @SChaluvadi - any updates on this?

  • SChaluvadi Member, Broadie, Moderator admin

    @dplichta Sorry for the delay! We believe this is an issue with PAPI v1, and we expect it to be handled much better in PAPI v2. We would like to test this internally, if you are willing to let us experiment: we would use a billing project on our end that has PAPI v2 enabled and test ahead of helping you move your work to v2. Additionally, PAPI v2 is going out in our next release this week, if you would like to wait a few days and test it yourself.
