Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

How fast should call caching be?

I've been playing around with the different call caching alternatives, and it feels like I'm doing something wrong: the fastest I've managed to get a task with an output file of 40 GB or more through the call caching step is a couple of minutes. Other pipeline languages are essentially instantaneous during the call caching step; can Cromwell be instantaneous too?

This is my call caching config:

    localization: [
      "soft-link", "hard-link", "copy"
    ]

    caching {
      # When copying a cached result, what type of file duplication should occur. Attempted in the order listed below:
      duplication-strategy: [
        "soft-link"
      ]

      # Possible values: file, path
      # "file" will compute an md5 hash of the file content.
      # "path" will compute an md5 hash of the file path. This strategy will only be effective if the duplication-strategy (above)
      # is set to "soft-link", in order to allow for the original file path to be hashed.
      hashing-strategy: "path"

      # When true, will check if a sibling file with the same name and the .md5 extension exists, and if it does, use the content of this file as a hash.
      # If false or the md5 does not exist, will proceed with the above-defined hashing strategy.
      check-sibling-md5: true
    }

The GATK tools that support md5 hashing are set to produce the .md5 files, but is that necessary for fast call caching?
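
For reference, this is roughly what the sibling .md5 files look like on disk (an illustrative sketch with made-up file names; which tool writes the .md5 shouldn't matter, only that it sits next to the output and contains the bare digest):

    # output.bam.md5 sits next to output.bam and contains only the hex digest;
    # with check-sibling-md5 = true, Cromwell uses its content as the hash
    # instead of computing one itself.
    md5sum output.bam | cut -d ' ' -f 1 > output.bam.md5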

Answers

  • mcovarr · Cambridge, MA · Member, Broadie, Dev ✭✭

    Linking should be essentially instantaneous, so it sounds like Cromwell is actually copying your cached results. You should be able to verify that these are not soft links with ls -l.

    Can you give more info about your configuration? Soft links do not work with Docker tasks, and hard linking does not work across filesystems (Unix filesystem rules).
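
    For example, a soft-linked cache hit vs. a real copy look like this with ls -l (illustrative paths and sizes, not taken from your workflow):

        # Soft link: the mode starts with "l" and there is an arrow to the original file.
        $ ls -l call-MyTask/execution/output.bam
        lrwxrwxrwx 1 user group 78 Aug 15 10:02 output.bam -> /data/cromwell-executions/wf/1234/call-Original/execution/output.bam

        # Copy: a regular file with the full size and a link count of 1.
        # A hard link would look the same, but with a link count (second column) greater than 1.
        $ ls -l call-MyTask/execution/output.bam
        -rw-r--r-- 1 user group 42949672960 Aug 15 10:05 output.bam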

  • oskarv · Bergen · Member

    @mcovarr said:
    Linking should be essentially instantaneous, so it sounds like Cromwell is actually copying your cached results. You should be able to verify that these are not soft links with ls -l.

    Can you give more info about your configuration? Soft links do not work with Docker tasks, and hard linking does not work across filesystems (Unix filesystem rules).

    I'm running it in Docker, so perhaps that's the reason it's so slow? To be more precise, I mount the host directories in Docker with -v and read from and write to those directories; Cromwell, GATK and BWA are stored in the container for easy portability. It is using soft links, though, as verified with "ls -F filename.bam", which returns filename.bam@, indicating a soft link. Shouldn't that not be happening?
    Is it possible to make it work fast even though I'm using Docker?
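
    Roughly how I launch it, in case it matters (paths, image name and WDL/JSON file names below are simplified stand-ins, not my real ones):

        docker run --rm \
          -v /data/input:/data/input \
          -v /data/cromwell-executions:/data/cromwell-executions \
          my-cromwell-gatk-bwa-image \
          java -Dconfig.file=/opt/cromwell.conf -jar /opt/cromwell.jar run pipeline.wdl inputs.json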

  • mcovarr · Cambridge, MA · Member, Broadie, Dev ✭✭

    Cromwell should never use soft links for localization or cache copying in a Docker environment. What version of Cromwell are you using?

  • oskarv · Bergen · Member
    edited August 2017

    @mcovarr said:
    Cromwell should never use soft links for localization or cache copying in a Docker environment. What version of Cromwell are you using?

    I'm using version 28_2.
    How is Cromwell intended to work with Docker, though? Is it possible to speed it up to near-instantaneous call caching?

  • mcovarr · Cambridge, MA · Member, Broadie, Dev ✭✭

    Can you try using hard linking instead? Also, could you please pass along what those symlinks look like? Thanks.
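
    A sketch of what that could look like in the caching block (adjust to your setup; with hard links the "path" hashing strategy no longer applies, so "file" plus the sibling .md5 files keeps hashing cheap):

        caching {
          # Try hard links first, fall back to copying.
          # Hard links only work when source and destination are on the same filesystem.
          duplication-strategy: [
            "hard-link", "copy"
          ]

          # Hash file content, but prefer an existing sibling .md5 so large outputs
          # are not re-read just to compute a hash.
          hashing-strategy: "file"
          check-sibling-md5: true
        }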

  • oskarv · Bergen · Member

    I kid you not, it works now. I don't know what I did, but I'm not changing anything.

  • mcovarr · Cambridge, MA · Member, Broadie, Dev ✭✭

    Glad to hear that! :smile:
