Failure in firecloud despite success locally

Hi--

I have a method that succeeds locally yet fails when run in FireCloud (I discussed this with @Ruchi at FC office hours).

I wrote the Dockerfile used in this method (image available here: tmajarian/epacts-mkl-dstat:latest; see below for the Dockerfile), but I have only been able to get it running locally. The WDL below is a simple test case that succeeds locally.

task test_epacts {

    command {       
        /tmp2/EPACTS-3.2.6/bin/test_run_epacts.sh
    }

    runtime {
        docker: "tmajarian/epacts-mkl-dstat@sha256:0533a755da58883de35beb801db382015167c122fa3eecf126d1073e986d1530"
        memory: "10 GB"
        disks: "local-disk 20 SSD"
        bootDiskSizeGb: 10
    }

    output {
        Array[File] out = glob("**/*.pdf")
    }
}

workflow w {
    call test_epacts
}

I am currently getting this failure from FC:

message: Task w.test_epacts:NA:1 failed. JES error code 5. Message: 10: Failed to delocalize files: failed to copy the following files: "/mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd.list -> gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/glob-444572a90afc2dab3b2d14c9b0564dcd.list (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd.list gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/glob-444572a90afc2dab3b2d14c9b0564dcd.list, command failed: CommandException: No URLs matched: /mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd.list\nCommandException: 1 file/object could not be transferred.\n); /mnt/local-disk/test_epacts-rc.txt -> gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/test_epacts-rc.txt (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/test_epacts-rc.txt gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/test_epacts-rc.txt, command failed: CommandException: No URLs matched: /mnt/local-disk/test_epacts-rc.txt\nCommandException: 1 file/object could not be transferred.\n); /mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd/* -> gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/glob-444572a90afc2dab3b2d14c9b0564dcd/ (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd/* gs://fc-98b9dbc7-6dcf-493d-b4fc-cd3dcb844b1f/acd13331-7fb1-4e8a-a725-cc130569e24d/w/99f3ee14-9036-47f3-b0ce-cec301b841d2/call-test_epacts/glob-444572a90afc2dab3b2d14c9b0564dcd/, 
command failed: CommandException: No URLs matched: /mnt/local-disk/glob-444572a90afc2dab3b2d14c9b0564dcd/*\nCommandException: 1 file/object could not be transferred.\n)"

Both stderr and stdout from the FireCloud job are empty, although both have content locally. See the attachments for the local stdout, the full JES log, and the Dockerfile.

I have already worked through disk issues in a similar WDL, and this Docker image does include dstat. Any ideas would be awesome! Thanks.

Answers

  • esalinas BroadMember, Broadie ✭✭✭
    edited September 2017

    @tmajarian

    You reference output files as

    glob("**/*.pdf")
    

    The first thing I notice is two stars, not one. I'm not sure that will work...

    Do the output PDFs even exist to be de-localized? Can you add a "find" statement to show all files and/or outputs present and to confirm their existence (or non-existence)? If the outputs are non-existent in the first place, then delocalization will fail.

    Are you sure you have enough disk space to hold outputs?

    You mention dstat. Do you get its output, and does it confirm that you have sufficient disk space?

    Have you tried setting the disk size (in GB) to a higher value (e.g. 500 instead of 10 or 20)?
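
The two checks suggested above (what the two-star pattern really matches, and whether any PDFs exist at all) can be sketched in shell. This is an illustrative diagnostic only: `report_pdfs` is a hypothetical helper name, and it assumes a POSIX shell or bash with its default options, where `**` behaves like a single `*`. How the backend itself expands `glob()` is not confirmed here.

```shell
# Hypothetical diagnostic helper, not part of the original task.
# Usage: report_pdfs <directory>
report_pdfs() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" || exit 1

        echo "glob matches:"
        # With default shell options, "**" is not recursive: this pattern
        # only matches PDFs exactly one directory below the current one.
        for f in **/*.pdf; do
            # An unmatched pattern is left as literal text, so skip it.
            if [ -e "$f" ]; then
                echo "  $f"
            fi
        done

        echo "find matches:"
        # find descends to any depth, so it shows every PDF that exists.
        find . -type f -iname '*.pdf' | sort

        echo "free space:"
        df -h .
    )
}
```

Calling `report_pdfs .` at the end of the command block would print all three sections to stdout, provided the task gets far enough to run the command at all.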

  • esalinas BroadMember, Broadie ✭✭✭
    edited September 2017

    @tmajarian

    Consider making a "dummy" PDF (for example with "touch dummy.pdf") and then running

    tar -cvzf all.pdf.tgz `find .  -iname "*.pdf"`
    

    instead of the current output enumeration strategy. The purpose of the "dummy" is to ensure that at least one file with a .pdf extension exists, so the tar command doesn't fail with "no files found".

    Then in the output block have something like

    File myPDFs="all.pdf.tgz"
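
A variant of the tar approach above, sketched under the assumption that GNU or BSD versions of tar and find are available in the image. `bundle_pdfs` is a hypothetical helper name. It keeps the dummy file and passes file names NUL-delimited, so paths containing spaces survive; the backtick form would split them into separate words.

```shell
# Hypothetical helper, not part of the original task.
# Usage: bundle_pdfs <directory>   (writes <directory>/all.pdf.tgz)
bundle_pdfs() {
    (
        cd "$1" || exit 1

        # Placeholder so the archive always has at least one member.
        touch dummy.pdf

        # -print0 emits NUL-terminated names; --null tells tar to read
        # the list given by "-T -" (stdin) with the same delimiter.
        find . -type f -iname '*.pdf' -print0 |
            tar --null -czf all.pdf.tgz -T -
    )
}
```

The output block can then reference the single file "all.pdf.tgz", as in the post above.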
    
  • tmajarian Member, Broadie

    @esalinas

    Running what you suggest works locally. However, it fails in FireCloud with the same kind of error: all.pdf.tgz could not be delocalized. I've attached the JES log and workflow logs. Here's the WDL:

    task test_epacts {
        command {       
            /tmp2/EPACTS-3.2.6/bin/test_run_epacts.sh
            touch dummy.pdf
            tar -cvzf all.pdf.tgz `find . -iname "*.pdf"`
        }
    
        runtime {
            docker: "tmajarian/epacts-mkl-dstat@sha256:0533a755da58883de35beb801db382015167c122fa3eecf126d1073e986d1530"
            memory: "12 GB"
            disks: "local-disk 50 SSD"
            bootDiskSizeGb: 50
        }
    
        output {
            File out = "all.pdf.tgz"
        }
    }
    
    workflow w {
        call test_epacts
    }
    
  • esalinas BroadMember, Broadie ✭✭✭
    edited September 2017

    Hi @tmajarian

    I was able to replicate your issue in FC.

    One thing I did was vastly simplify the command block, reducing it to just

    echo "hello world"
    

    and that failed similarly. The stderr and stdout logs were both empty, and the "couldn't find rc file" error appeared as before.

    So even "hello world" won't run with this Docker image.

    This makes me suspect that the issue is somehow related to the Docker image or to Docker itself.

    Do you have the Dockerfile?

    -eddie

  • esalinas BroadMember, Broadie ✭✭✭

    I am speculating that some Docker image/build version, kernel version, or something along those lines lets the image run fine locally for you but is incompatible with Google Cloud. That is only speculation, though, and the issue may not be so complex.

    -eddie

  • esalinas BroadMember, Broadie ✭✭✭

    By the way, I ran "hello world" on r-base:3.3.0 and it succeeded.

    If you have ENTRYPOINT and CMD in your Dockerfile, what happens if you remove both of them? Could you try that?

    -eddie
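
One way to check a Dockerfile for these instructions, as a minimal sketch. `list_entrypoints` is a hypothetical helper name, and it only scans the Dockerfile text, so it will not see an ENTRYPOINT or CMD inherited from the base image.

```shell
# Hypothetical helper, not part of the original thread.
# Usage: list_entrypoints <path-to-Dockerfile>
list_entrypoints() {
    # Print the line number and text of each ENTRYPOINT or CMD instruction.
    # Dockerfile instruction names are case-insensitive, hence -i.
    grep -niE '^[[:space:]]*(ENTRYPOINT|CMD)([[:space:]]|\[|$)' "$1"
}
```

A non-zero exit status means no such lines were found. For an image that is already built, `docker inspect` reports the effective values, including inherited ones.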

  • tmajarian Member, Broadie

    @esalinas --

    Thanks! I'm not sure what the actual problem was, but building an image from your Dockerfile (with some changes) seems to have worked.

  • tmajarian Member, Broadie

    @Ruchi
    Thanks for the explanation!
