Support for singularity

At the University of Adelaide, our HPC currently supports only Singularity (including Docker images run via Singularity). I guess the main reason the HPC team does not support Docker is that it requires root access. Can you please suggest:

How can we make Cromwell work on our HPC with Singularity, or with any other container engine that runs without root?
Is there any way to support Docker images using Cromwell + Singularity?

I saw this ticket: broadinstitute/cromwell#2177.

Is there any update on this? Some people mentioned udocker: https://github.com/indigo-dc/udocker.

Answers

  • abdulrauf Member

    Please find attached a PDF document summarizing what we are trying to achieve in this project at the University of Adelaide.

    I need your help to validate whether we are using Cromwell in the right way.

    Currently I'm trying to answer the following questions:

    Phoenix HPC - Phoenix is the university's own HPC, which uses SLURM as its job manager.

    How do we make SBATCH work from Cromwell servers?
    How do we make Docker-based workflows work on Phoenix using Singularity?
    How do we manage Phoenix outages? [48 hours every quarter for scheduled maintenance]

    Decide which workflow language we need to support.
    Should the choice be based on the open source community, or on what is most commonly used at UoA? Or does it not matter, since workflows can be easily converted using conversion tools like https://github.com/adamstruck/cwl2wdl https://github.com/common-workflow-language/wdl2cwl
    Common Workflow Language (CWL)
    Workflow Description Language (WDL)
    or Both

    Third-party tool suggestions for the business to develop workflows (CWL/WDL).

    As you can see, I have lots of questions :smile: Is there any way we can schedule a video call at a time of your convenience?

    Thanks
    Abdul

  • myourshaw University of Colorado Member

    Here is a cromwell singularity image definition that we are using successfully at University of Colorado Anschutz:

    # Copyright (c) 2017-2018, Carolyn Lawrence and Michael Yourshaw. All rights reserved.
    #
    # Uses CentOS as base OS
    
    # specify that local yum should be used to get & install base OS packages
    # (release & coreutils rpms installed)
    
    BootStrap: yum
    OSVersion: 7
    MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
    
    Include: yum
    
    # If you want the updates (available at the bootstrap date) to be installed
    # inside the container during the bootstrap instead of the General Availability
    # point release (7.x) then uncomment the following line
    UpdateURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/updates/$basearch/
    
    
    ##############################  These files must be in INCLUDE_DIR   ###############################
    #                                                                                                   #
    #   - cromwell-${CROMWELL_VERSION}.jar                                                              #
    #   - womtool-${CROMWELL_VERSION}.jar                                                               #
    #   - jre-${JRE_VERSION}-linux-x64.rpm or jdk-${JDK_VERSION}-linux-x64.rpm                          #
    #            (http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html)  #
    #   - cromwell_slurm_cmco-sys-dev-web_${CROMWELL_SLURM_CONF_VERSION}.conf                           #
    #                                                                                                   #
    #####################################################################################################
    
    
    %setup
      # this directory is on the machine where the image will be built
      INCLUDE_DIR=/mnt/hdd/cmoco-sys-dev/cmoco-sys-dev-web_VM/cromwell/singularity/include
      TEMP_DIR=${SINGULARITY_ROOTFS}/tmp/singularity-tmp-cromwell
      rm -rf ${TEMP_DIR}
      mkdir -p ${TEMP_DIR}
      cp ${INCLUDE_DIR}/* ${TEMP_DIR}/
    
    
    %help
    This is a definition file for Singularity 2.4.6, cromwell 31, openjdk yum latest (1.8.0.161-0), cromwell_slurm_cmco-sys-dev-web.conf 2018-04-10.
    The image will be used to run the cromwell service on a VM.
    
    Before building the container, these files must have been downloaded manually to
    `/mnt/hdd/cmoco-sys-dev/cmoco-sys-dev-web_VM/cromwell/singularity/include`:
        - cromwell-${CROMWELL_VERSION}.jar
        - womtool-${CROMWELL_VERSION}.jar
        - jre-${JRE_VERSION}-linux-x64.rpm or jdk-${JDK_VERSION}-linux-x64.rpm
                 (http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html)
        - cromwell_slurm_cmco-sys-dev-web_${CROMWELL_SLURM_CONF_VERSION}.conf
    
    To run apps:
      cromwell: singularity run Cromwell.simg
      womtool: singularity run --app womtool Cromwell.simg <action> <parameters>
    
    
    %labels
      AUTHOR [email protected]
      Singularity 2.4.6
      Java 8u162
      cromwell 31
      cromwell_slurm_cmco-sys-dev-web 2018-04-10
    
    
    %post
    
      TMP_DIR=/tmp/singularity-tmp-cromwell
      CROMWELL_APP_DIR=/opt/cromwell
      # bind point for /gpfs/share/*/nfs/storage
      # this is necessary to replicate your current working directory below nfs 
      GPFS_DIR=/gpfs
      HOME_DIR=/homelink
    
      # create directories in the container
      echo "Making ${TMP_DIR} temporary directory to hold downloaded installation files"
      mkdir -p ${TMP_DIR}
    
      echo "Making ${CROMWELL_APP_DIR} directory to hold cromwell.jar and config file"
      mkdir -p ${CROMWELL_APP_DIR}
    
      echo "Making ${GPFS_DIR} as a bind point for /gpfs"
      mkdir -p ${GPFS_DIR}
    
      echo "Making ${HOME_DIR} as a bind point for /homelink"
      mkdir -p ${HOME_DIR}
    
      echo "Installing additional packages"
      yum -y install less which openssh-clients openssl
    
    
      #############
      # OPENJDK 8 #
      #############
      OPENJDK_VERSION=1.8.0
    
      echo "Installing OpenJDK ${OPENJDK_VERSION}"
      # TEMP_DIR is only defined in %setup (on the build host); inside %post the directory is TMP_DIR
      cd ${TMP_DIR}
      yum install -y java-1.8.0-openjdk
    
    
      ##########
      # JAVA 8 #
      ##########
    
      # JRE_VERSION=8u162
      # JDK_VERSION=8u162
      # JRE_ALT_VERSION=1.8.0_162
    
      # echo " Installing Java"
      # cd ${TMP_DIR}
    
      # echo "Installing Oracle Java 8 JRE"
      # yum -y localinstall jre-${JRE_VERSION}-linux-x64.rpm \
      # && rm -f jre-${JRE_VERSION}-linux-x64.rpm
    
      # # echo "Installing Oracle Java 8 JDK"
      # yum -y localinstall jdk-${JDK_VERSION}-linux-x64.rpm \
      # && rm -f jdk-${JDK_VERSION}-linux-x64.rpm
    
      # echo "Configuring Oracle java as default"
      # # uncomment next line to see the java alternatives
      # # update-alternatives --config java
      # alternatives --set java /usr/java/jre${JRE_ALT_VERSION}/bin/java
    
      # formerly downloaded java on the fly
      # wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jre-8u131-linux-x64.rpm \
      # && rpm -Uvh jre-8u131-linux-x64.rpm \
      # && rm -f jre-8u131-linux-x64.rpm
    
    
      ##########
      # JAVA 9 #
      ##########
    
      # JRE_VERSION=9.0.4
      # JDK_VERSION=9.0.4
      # # JREL_VERSION=9.0.4
    
      # echo " Installing Java"
      # cd ${TMP_DIR}
    
      # echo "Installing Oracle Java 9 JRE"
      # yum -y localinstall jre-${JRE_VERSION}_linux-x64_bin.rpm \
      # && rm -f jre-${JRE_VERSION}_linux-x64_bin.rpm
    
      # echo "Installing Oracle Java 9 JDK"
      # yum -y localinstall jdk-${JDK_VERSION}_linux-x64_bin.rpm \
      # && rm -f jdk-${JDK_VERSION}_linux-x64_bin.rpm
    
      # echo "Configuring Oracle java as default"
      # # uncomment next line to see the java alternatives
      # # update-alternatives --config java
      # alternatives --set java /usr/java/jre-${JRE_VERSION}/bin/java
      # # alternatives --set java /usr/java/jdk-${JDK_VERSION}/bin/java
    
      # # formerly downloaded java on the fly
      # # wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jre-8u131-linux-x64.rpm \
      # # && rpm -Uvh jre-8u131-linux-x64.rpm \
      # # && rm -f jre-8u131-linux-x64.rpm
    
    
      ############
      # CROMWELL #
      ############
    
      CROMWELL_VERSION=31
      CROMWELL_SLURM_CONF_VERSION=2018-04-10
    
      echo "Installing Cromwell ${CROMWELL_VERSION} in ${CROMWELL_APP_DIR}"
      mv ${TMP_DIR}/cromwell-${CROMWELL_VERSION}.jar ${CROMWELL_APP_DIR}/cromwell.jar
    
      echo "Installing cromwell_slurm_cmco-sys-dev-web.conf ${CROMWELL_SLURM_CONF_VERSION} in ${CROMWELL_APP_DIR}"
      mv ${TMP_DIR}/cromwell_slurm_cmco-sys-dev-web_${CROMWELL_SLURM_CONF_VERSION}.conf ${CROMWELL_APP_DIR}/cromwell_slurm_cmco-sys-dev-web.conf
    
      echo "Installing womtool ${CROMWELL_VERSION} in ${CROMWELL_APP_DIR}"
      mv ${TMP_DIR}/womtool-${CROMWELL_VERSION}.jar ${CROMWELL_APP_DIR}/womtool.jar
    
    
      ###########
      # CLEANUP #
      ###########
    
      echo "Cleaning out ${TMP_DIR}"
      rm -rf ${TMP_DIR}
    
      echo "Removing unneeded rpms"
      yum clean all
      rm -rf /var/cache/yum
    
    
    %runscript
      exec /usr/bin/java -DLOG_LEVEL=DEBUG -Dconfig.file=/opt/cromwell/cromwell_slurm_cmco-sys-dev-web.conf -jar /opt/cromwell/cromwell.jar server
    
    
    %apprun womtool
      exec /usr/bin/java -jar /opt/cromwell/womtool.jar "$@"
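
    A minimal sketch of how an image might be built from this definition file and smoke-tested. The file names `Cromwell.def` and `workflow.wdl` and the `run` helper are assumptions for illustration, not part of the original post. By default the script only prints the commands, so it can be read through on a machine without Singularity. A real build also needs the files listed in `%help` to be present in `INCLUDE_DIR`.

    ```shell
    # Sketch only: DEF_FILE and IMAGE default to hypothetical file names.
    DEF_FILE="${DEF_FILE:-Cromwell.def}"
    IMAGE="${IMAGE:-Cromwell.simg}"

    # Print each command instead of running it unless DRY_RUN=0 is set.
    run() {
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
      else
        "$@"
      fi
    }

    build_and_check() {
      # Building from a definition file needs root on Singularity 2.x.
      run sudo singularity build "$IMAGE" "$DEF_FILE"
      # Print the metadata recorded in the %labels section.
      run singularity inspect "$IMAGE"
      # Validate a workflow with the womtool app (workflow.wdl is a placeholder).
      run singularity run --app womtool "$IMAGE" validate workflow.wdl
    }

    build_and_check
    ```

    Running the script with `DRY_RUN=0` would execute the three commands instead of printing them.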
    
  • myourshaw University of Colorado Member

    We then run cromwell as a systemd service on a VM that can see our compute cluster.

    Feel free to contact me if you want to know more about our configuration

    This is our systemd file:

    # /etc/systemd/system/cromwell.service
    [Unit]
    Description=cromwell service
    
    [Install]
    WantedBy=multi-user.target
    
    [Service]
    
    User=cmoco_sys_dev
    Group=ticr_cmoco_sys_dev
    
    WorkingDirectory=/home/cmoco
    
    # Execute pre and post scripts as root, otherwise it does it as User=
    PermissionsStartOnly=True
    
    # RuntimeDirectory will be /run/cromwell
    RuntimeDirectory=cromwell
    
    # TODO: temporary use of 2.4.2.2 bug fix
    ExecStart=/opt/singularity/singularity2.4.2.2/bin/singularity run -B /gpfs:/gpfs /home/cmoco/cromwell/singularity/Cromwell.simg
    
    SuccessExitStatus=143
    TimeoutStopSec=10
    # Restart crashed server only; 'on-failure' would also restart on non-zero exit codes and timeouts
    Restart=on-abort
    RestartSec=5
    
    PrivateTmp=true
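
    Once the unit file is in place, the service would typically be registered as root with `systemctl daemon-reload`, then `systemctl enable cromwell.service` and `systemctl start cromwell.service`. The sketch below shows one way to check that the server has come up, by polling Cromwell's REST version endpoint. The host name `localhost` and the helper names are assumptions for illustration; port 8000 is the `webservice.port` value from the Cromwell config posted in this thread.

    ```shell
    # Hypothetical defaults; override through the environment if needed.
    CROMWELL_HOST="${CROMWELL_HOST:-localhost}"
    CROMWELL_PORT="${CROMWELL_PORT:-8000}"

    # Build the URL of Cromwell's version endpoint.
    version_url() {
      echo "http://${CROMWELL_HOST}:${CROMWELL_PORT}/engine/v1/version"
    }

    # Poll the endpoint every 5 seconds until it answers or the attempts run out.
    # Returns 0 when the server responded, 1 otherwise.
    wait_for_cromwell() {
      attempts="${1:-12}"
      while [ "$attempts" -gt 0 ]; do
        if curl --silent --fail "$(version_url)" >/dev/null; then
          return 0
        fi
        attempts=$((attempts - 1))
        sleep 5
      done
      return 1
    }

    echo "After starting cromwell.service, poll: $(version_url)"
    ```

    Calling `wait_for_cromwell 24` after starting the service would wait up to about two minutes before giving up.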
    
  • myourshaw University of Colorado Member

    This Cromwell config file goes in the Singularity image. The Cromwell MySQL database is hosted on another VM running MariaDB, and our compute cluster uses Slurm.

    #############################################################
    #  Cromwell SLURM cmoco-sys-dev-web Config File 2018-04-06  #
    #############################################################
    
    # http://cromwell.readthedocs.io/en/develop/tutorials/ConfigurationFiles/
    # http://cromwell.readthedocs.io/en/develop/Configuring/
    # https://github.com/broadinstitute/cromwell/blob/develop/cromwell.examples.conf
    # http://cromwell.readthedocs.io/en/develop/tutorials/LocalBackendIntro/ [under construction]
    
    
    # This line is required. It pulls in default overrides from the embedded cromwell `application.conf` needed for proper
    # performance of cromwell.
    include required(classpath("application"))
    
    # When uncommented, the following override default settings in application.conf
    
    # When run within singularity, /gpfs/share/cmoco_sys_dev/nfs/storage/cromwell in the external file
    # system will be bound to /gpfs/share/cmoco_sys_dev/nfs/storage/cromwell in the singularity image
    # Thus, the following settings should be rooted at /gpfs/share/cmoco_sys_dev/nfs/storage/cromwell:
    #    workflow-options.workflow-log-dir (in singularity image)
    #    backend.providers.Local.config.root (on HPC)
    #    backend.providers.SLURM.config.root (on HPC)
    
    # Cromwell HTTP server settings
    webservice {
      port = 8000
      interface = 0.0.0.0
      #binding-timeout = 5s
      instance.name = "reference"
    }  # webservice
    
    akka {
      # Optionally set / override any akka settings
    }   #akka
    
    # Cromwell "system" settings
    system {
      # If 'true', a SIGINT will trigger Cromwell to attempt to abort all currently running jobs before exiting
      abort-jobs-on-terminate = false
      #abort-jobs-on-terminate = true # allows for better recovery?
    
      # If 'true', a SIGTERM or SIGINT will trigger Cromwell to attempt to gracefully shutdown in server mode,
      # in particular clearing up all queued database writes before letting the JVM shut down.
      # The shutdown is a multi-phase process, each phase having its own configurable timeout. See the Dev Wiki for more details.
      graceful-server-shutdown = true
    
      # If 'true' then when Cromwell starts up, it tries to restart incomplete workflows (turned off during testing)
      #workflow-restart = true
      workflow-restart = false
    
      # Cromwell will cap the number of running workflows at N
      # default 5000
      #max-concurrent-workflows = 5000
    
      # Cromwell will launch up to N submitted workflows at a time, regardless of how many open workflow slots exist
      # default 50
      #max-workflow-launch-count = 50
    
      # Number of seconds between workflow launches
      # default 2
      #new-workflow-poll-rate = 20
    
      # Since the WorkflowLogCopyRouter is initialized in code, this is the number of workers
      #number-of-workflow-log-copy-workers = 10
    
      # Default number of cache read workers
      #number-of-cache-read-workers = 25
    
      io {
        # Global Throttling - This is mostly useful for GCS and can be adjusted to match
        # the quota available on the GCS API
        #number-of-requests = 100000
        #per = 100 seconds
    
        # Number of times an I/O operation should be attempted before giving up and failing it.
        #number-of-attempts = 5
      }  # io
    
      # Maximum number of input file bytes allowed in order to read each type.
      # If exceeded a FileSizeTooBig exception will be thrown.
      input-read-limits {
    
        #lines = 128000
    
        #bool = 7
    
        #int = 19
    
        #float = 50
    
        #string = 128000
    
        #json = 128000
    
        #tsv = 128000
    
        #map = 128000
    
        #object = 128000
      }
    }  # system
    
    workflow-options {
      # These workflow options will be encrypted when stored in the database
      #encrypted-fields: []
    
      # AES-256 key to use to encrypt the values in `encrypted-fields`
      #base64-encryption-key: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
    
      # Directory where to write per workflow logs
      #workflow-log-dir: "cromwell-workflow-logs"
      workflow-log-dir: "/gpfs/share/cmoco_sys_dev/nfs/storage/cromwell/cromwell-workflow-logs"
    
      # When true, per workflow logs will be deleted after copying
      #workflow-log-temporary: true
    
      # Workflow-failure-mode determines what happens to other calls when a call fails. Can be either ContinueWhilePossible or NoNewCalls.
      # Can also be overridden in workflow options. Defaults to NoNewCalls. Uncomment to change:
      # workflow-failure-mode: "ContinueWhilePossible"
      workflow-failure-mode: "NoNewCalls" # allows for better recovery?
    
      default {
        # When a workflow type is not provided on workflow submission, this specifies the default type.
        #workflow-type: WDL
    
        # When a workflow type version is not provided on workflow submission, this specifies the default type version.
        #workflow-type-version: "draft-2"
      }
    }  # workflow-options
    
    # Optional call-caching configuration.
    call-caching {
      # Allows re-use of existing results for jobs you've already run
      # (default: false)
      #enabled = false
    
      # Whether to invalidate a cache result forever if we cannot reuse them. Disable this if you expect some cache copies
      # to fail for external reasons which should not invalidate the cache (e.g. auth differences between users):
      # (default: true)
      #invalidate-bad-cache-results = true
    }  # call-caching
    
    # Google configuration
    google {
    
      #application-name = "cromwell"
    
      # Default: just application default
      #auths = [
    
        # Application default
        #{
        #  name = "application-default"
        #  scheme = "application_default"
        #},
    
        # Use a refresh token
        #{
        #  name = "user-via-refresh"
        #  scheme = "refresh_token"
        #  client-id = "secret_id"
        #  client-secret = "secret_secret"
        #},
    
    
        # Use a static service account
        #{
        #  name = "service-account"
        #  scheme = "service_account"
        #  Choose between PEM file and JSON file as a credential format. They're mutually exclusive.
        #  PEM format:
        #  service-account-id = "my-service-account"
        #  pem-file = "/path/to/file.pem"
        #  JSON format:
        #  json-file = "/path/to/file.json"
        #}
    
        # Use service accounts provided through workflow options
        #{
        #   name = "user-service-account"
        #   scheme = "user_service_account"
        #}
      #]
    }  # google
    
    docker {
      hash-lookup {
        # Set this to match your available quota against the Google Container Engine API
        #gcr-api-queries-per-100-seconds = 1000
    
        # Time in minutes before an entry expires from the docker hashes cache and needs to be fetched again
        #cache-entry-ttl = "20 minutes"
    
        # Maximum number of elements to be kept in the cache. If the limit is reached, old elements will be removed from the cache
        #cache-size = 200
    
        # How should docker hashes be looked up. Possible values are "local" and "remote"
        # "local": Lookup hashes on the local docker daemon using the cli
        # "remote": Lookup hashes on docker hub and gcr
        #method = "remote"
      }
    }  # docker
    
    engine {
      # This instructs the engine which filesystems are at its disposal to perform any IO operation that it might need.
      # For instance, WDL variables declared at the Workflow level will be evaluated using the filesystems declared here.
      # If you intend to be able to run workflows with this kind of declarations:
      # workflow {
      #    String str = read_string("gs://bucket/my-file.txt")
      # }
      # You will need to provide the engine with a gcs filesystem
      # Note that the default filesystem (local) is always available.
      filesystems {
        #  gcs {
        #    auth = "application-default"
        #  }
        #  oss {
        #    auth {
        #      endpoint = ""
        #      access-id = ""
        #      access-key = ""
        #      security-token = ""
        #    }
        #  }
        local {
          #enabled: true
        }
      }
    }  # engine
    
    backend {
      # Override the default backend.
      # default = "Local"
      default = "SLURM"
    
      # The list of providers.  
      providers {
        # The local provider is included by default in the reference.conf. This is an example.
    
        # Define a new backend provider.
    
        Local {
          # The actor that runs the backend. In this case, it's the Shared File System (SFS) ConfigBackend.
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
    
          # The backend custom configuration.
          config {
    
            # Optional limits on the number of concurrent jobs
            #concurrent-job-limit = 5
    
            # If true submits scripts to the bash background using "&". Only useful for dispatchers that do NOT submit
            # the job and then immediately return a scheduled job id.
            run-in-background = true
    
            # `temporary-directory` creates the temporary directory for commands.
            #
            # If this value is not set explicitly, the default value creates a unique temporary directory, equivalent to:
            # temporary-directory = "$(mktemp -d \"$PWD\"/tmp.XXXXXX)"
            #
            # The expression is run from the execution directory for the script. The expression must create the directory
            # if it does not exist, and then return the full path to the directory.
            #
            # To create and return a non-random temporary directory, use something like:
            # temporary-directory = "$(mkdir -p /tmp/mydir && echo /tmp/mydir)"
    
            # `script-epilogue` configures a shell command to run after the execution of every command block.
            #
            # If this value is not set explicitly, the default value is `sync`, equivalent to:
            # script-epilogue = "sync"
            #
            # To turn off the default `sync` behavior set this value to an empty string:
            # script-epilogue = ""
    
            # The list of possible runtime custom attributes.
            runtime-attributes = """
            String? docker
            String? docker_user
            """
    
            # Submit string when there is no "docker" runtime attribute.
            submit = "/bin/bash ${script}"
    
            # Submit string when there is a "docker" runtime attribute.
            submit-docker = """
            docker run \
              --rm -i \
              ${"--user " + docker_user} \
              --entrypoint /bin/bash \
              -v ${cwd}:${docker_cwd} \
              ${docker} ${script}
            """
    
            # Root directory where Cromwell writes job results.  This directory must be
            # visible and writeable by the Cromwell process as well as the jobs that Cromwell
            # launches.
            #root = "cromwell-executions"
            root: "/gpfs/share/cmoco_sys_dev/nfs/storage/cromwell/cromwell-executions"
    
            # File system configuration.
            filesystems {
    
              # For SFS backends, the "local" configuration specifies how files are handled.
              local {
    
                # Try to hard link (ln), then soft-link (ln -s), and if both fail, then copy the files.
                localization: [
                  "hard-link", "soft-link", "copy"
                ]
    
                # Call caching strategies
                caching {
                  # When copying a cached result, what type of file duplication should occur. Attempted in the order listed below:
                  duplication-strategy: [
                    "hard-link", "soft-link", "copy"
                  ]
    
                  # Possible values: file, path
                  # "file" will compute an md5 hash of the file content.
                  # "path" will compute an md5 hash of the file path. This strategy will only be effective if the duplication-strategy (above) is set to "soft-link",
                  # in order to allow for the original file path to be hashed.
                  hashing-strategy: "file"
    
                  # When true, will check if a sibling file with the same name and the .md5 extension exists, and if it does, use the content of this file as a hash.
                  # If false or the md5 does not exist, will proceed with the above-defined hashing strategy.
                  check-sibling-md5: false
                }
              }  # local
            }  # filesystems
    
            # The defaults for runtime attributes if not provided.
            default-runtime-attributes {
              failOnStderr: false
              continueOnReturnCode: 0
            }  # default-runtime-attributes
          }  # config
        }  # Local
    
        # Other backend examples. Uncomment the block/stanza for your configuration. May need more tweaking/configuration
        # for your specific environment.
    
        #TES {
        #  actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
        #  config {
        #    root = "cromwell-executions"
        #    dockerRoot = "/cromwell-executions"
        #    endpoint = "http://127.0.0.1:9000/v1/jobs"
        #    default-runtime-attributes {
        #      cpu: 1
        #      failOnStderr: false
        #      continueOnReturnCode: 0
        #      memory: "2 GB"
        #      disk: "2 GB"
        #    }
        #  }
        #}
    
        #BCS {
        #  actor-factory = "cromwell.backend.impl.bcs.BcsBackendLifecycleActorFactory"
        #  config {
        #    root = "oss://your-bucket/cromwell-exe"
        #    dockerRoot = "/cromwell-executions"
        #    region = ""
    
        #    #access-id = ""
        #    #access-key = ""
        #    #security-token = ""
    
        #    filesystems {
        #      oss {
        #        auth {
        #            #endpoint = ""
        #            #access-id = ""
        #            #access-key = ""
        #            #security-token = ""
        #        }
        #      }
        #    }
    
        #    default-runtime-attributes {
        #        #failOnStderr: false
        #        #continueOnReturnCode: 0
        #        #cluster: "cls-mycluster"
        #        #mounts: "oss://bcs-bucket/bcs-dir/ /home/inputs/ false"
        #        #docker: "ubuntu/latest oss://bcs-reg/ubuntu/"
        #        #userData: "key value"
        #        #reserveOnFail: true
        #        #autoReleaseJob: true
        #        #verbose: false
        #        #workerPath: "oss://bcs-bucket/workflow/worker.tar.gz"
        #        #systemDisk: "cloud 50"
        #        #dataDisk: "cloud 250 /home/data/"
        #        #timeout: 3000
        #        #vpc: "192.168.0.0/16 vpc-xxxx"
        #    }
        #  }
        #}
    
        #SGE {
        #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
        #  config {
        #
        #    # Limits the number of concurrent jobs
        #    concurrent-job-limit = 5
        #
        #    runtime-attributes = """
        #    Int cpu = 1
        #    Float? memory_gb
        #    String? sge_queue
        #    String? sge_project
        #    """
        #
        #    submit = """
        #    qsub \
        #    -terse \
        #    -V \
        #    -b n \
        #    -N ${job_name} \
        #    -wd ${cwd} \
        #    -o ${out} \
        #    -e ${err} \
        #    -pe smp ${cpu} \
        #    ${"-l mem_free=" + memory_gb + "g"} \
        #    ${"-q " + sge_queue} \
        #    ${"-P " + sge_project} \
        #    ${script}
        #    """
        #
        #    kill = "qdel ${job_id}"
        #    check-alive = "qstat -j ${job_id}"
        #    job-id-regex = "(\\d+)"
        #  }
        #}
    
        #LSF {
        #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
        #  config {
        #    submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /bin/bash ${script}"
        #    kill = "bkill ${job_id}"
        #    check-alive = "bjobs ${job_id}"
        #    job-id-regex = "Job <(\\d+)>.*"
        #  }
        #}
    
        SLURM {
            actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
            config {
              root: "/gpfs/share/cmoco_sys_dev/nfs/storage/cromwell/cromwell-executions"
              #run-in-background = true
              run-in-background = false # I *think* this is what we need to set to get it to properly get SLURM job ids using job-id-regex
    
              runtime-attributes = """
              Int? cpu
              Int? memory_mb = 10000
              String? time = "06:00:00"
              String slurm_partition = "defq"
              String? comment
              String hpc_login_host = "cubipmlgn01.ucdenver.pvt"
              # String? account = "61001226"
              """
    
              submit = """
              ssh cubipmlgn01.ucdenver.pvt \
              '/bin/bash --login -c "sbatch \
              --export=ALL \
              --job-name=${job_name} \
              --workdir=${cwd} \
              --output=${out} \
              --error=${err} \
              --ntasks=1 \
              --partition=${slurm_partition} \
              ${"--cpus-per-task=" + cpu} \
              ${"--mem-per-cpu=" + memory_mb} \
              ${"--comment=" + comment} \
              ${"--time=" + time} \
              --wrap \"/bin/bash ${script}\""'
              """
    
              kill = """
              ssh cubipmlgn01.ucdenver.pvt \
              '/bin/bash --login -c "scancel ${job_id}"'
              """
    
              check-alive = """
              ssh cubipmlgn01.ucdenver.pvt \
              '/bin/bash --login -c "squeue --jobs ${job_id}"'  # I *think* this is what we need to check job status...
              """
    
              job-id-regex = "Submitted batch job (\\d+).*"  # with run-in-background = false, this *should* look in stdout.submit to get job id
    
              filesystems {
                local {
                  localization: [
                  "hard-link", "soft-link", "copy"
                  ]
    
                  caching {
                    # When copying a cached result, what type of file duplication should occur. Attempted in the order listed below:
                    duplication-strategy: [
                      "hard-link", "soft-link", "copy"
                    ]
    
                    # Possible values: file, path
                    # "file" will compute an md5 hash of the file content.
                    # "path" will compute an md5 hash of the file path. This strategy will only be effective if the duplication-strategy (above) is set to "soft-link",
                    # in order to allow for the original file path to be hashed.
                    hashing-strategy: "file"
    
                    # When true, will check if a sibling file with the same name and the .md5 extension exists, and if it does, use the content of this file as a hash.
                    # If false or the md5 does not exist, will proceed with the above-defined hashing strategy.
                    check-sibling-md5: false
                  }
                }
              }  # filesystems
            }  # config
        }  # SLURM
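    
        # Sketch, not part of the original configuration: a SLURM backend variant that runs tasks
        # declaring a "docker" runtime attribute through Singularity, so that no Docker daemon or
        # root access is needed on the compute nodes. Assumptions: the backend name
        # SLURM_SINGULARITY is hypothetical, `singularity` and `sbatch` are on the PATH where
        # Cromwell submits jobs, and the compute nodes can pull images from the registry.
        # Uncomment and adapt before use.
        #SLURM_SINGULARITY {
        #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
        #  config {
        #    root: "/gpfs/share/cmoco_sys_dev/nfs/storage/cromwell/cromwell-executions"
        #    run-in-background = false
        #
        #    runtime-attributes = """
        #    Int? cpu
        #    String? docker
        #    """
        #
        #    # Used when the task has no "docker" runtime attribute.
        #    submit = """
        #    sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} \
        #      ${"--cpus-per-task=" + cpu} \
        #      --wrap "/bin/bash ${script}"
        #    """
        #
        #    # Used when the task has a "docker" runtime attribute: the image is run with
        #    # `singularity exec docker://...` instead of `docker run`.
        #    submit-docker = """
        #    sbatch -J ${job_name} -D ${cwd} \
        #      -o ${cwd}/execution/stdout -e ${cwd}/execution/stderr \
        #      ${"--cpus-per-task=" + cpu} \
        #      --wrap "singularity exec --bind ${cwd}:${docker_cwd} docker://${docker} /bin/bash ${script}"
        #    """
        #
        #    kill = "scancel ${job_id}"
        #    check-alive = "squeue --jobs ${job_id}"
        #    job-id-regex = "Submitted batch job (\\d+).*"
        #  }
        #}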
    
        # Example backend that _only_ runs workflows that specify docker for every command.
        #Docker {
        #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
        #  config {
        #    run-in-background = true
        #    runtime-attributes = "String docker"
        #    submit-docker = "docker run --rm -v ${cwd}:${docker_cwd} -i ${docker} /bin/bash < ${script}"
        #  }
        #}
    
        #HtCondor {
        #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
        #  config {
        #    runtime-attributes = """
        #      Int cpu = 1
        #      Float memory_mb = 512.0
        #      Float disk_kb = 256000.0
        #      String? nativeSpecs
        #      String? docker
        #    """
        #
        #    submit = """
        #      chmod 755 ${script}
        #      cat > ${cwd}/execution/submitFile <<EOF
        #      Iwd=${cwd}/execution
        #      requirements=${nativeSpecs}
        #      leave_in_queue=true
        #      request_memory=${memory_mb}
        #      request_disk=${disk_kb}
        #      error=${err}
        #      output=${out}
        #      log_xml=true
        #      request_cpus=${cpu}
        #      executable=${script}
        #      log=${cwd}/execution/execution.log
        #      queue
        #      EOF
        #      condor_submit ${cwd}/execution/submitFile
        #    """
        #
        #    submit-docker = """
        #      chmod 755 ${script}
        #      cat > ${cwd}/execution/dockerScript <<EOF
        #      #!/bin/bash
        #      docker run --rm -i -v ${cwd}:${docker_cwd} ${docker} /bin/bash ${script}
        #      EOF
        #      chmod 755 ${cwd}/execution/dockerScript
        #      cat > ${cwd}/execution/submitFile <<EOF
        #      Iwd=${cwd}/execution
        #      requirements=${nativeSpecs}
        #      leave_in_queue=true
        #      request_memory=${memory_mb}
        #      request_disk=${disk_kb}
        #      error=${cwd}/execution/stderr
        #      output=${cwd}/execution/stdout
        #      log_xml=true
        #      request_cpus=${cpu}
        #      executable=${cwd}/execution/dockerScript
        #      log=${cwd}/execution/execution.log
        #      queue
        #      EOF
        #      condor_submit ${cwd}/execution/submitFile
        #    """
        #
        #    kill = "condor_rm ${job_id}"
        #    check-alive = "condor_q ${job_id}"
        #    job-id-regex = "(?sm).*cluster (\\d+)..*"
        #  }
        #}
    
        #Spark {
        # actor-factory = "cromwell.backend.impl.spark.SparkBackendFactory"
        # config {
        #   # Root directory where Cromwell writes job results.  This directory must be
        #    # visible and writeable by the Cromwell process as well as the jobs that Cromwell
        #   # launches.
        #   root: "cromwell-executions"
        #
        #   filesystems {
        #     local {
        #       localization: [
        #         "hard-link", "soft-link", "copy"
        #       ]
        #     }
        #    }
        #      # change (master, deployMode) to (yarn, client), (yarn, cluster)
        #      #  or (spark://hostname:port, cluster) for spark standalone cluster mode
        #     master: "local"
        #     deployMode: "client"
        #  }
        # }
    
        #JES {
        #  actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
        #  config {
        #    # Google project
        #    project = "my-cromwell-workflows"
        #
        #    # Base bucket for workflow executions
        #    root = "gs://my-cromwell-workflows-bucket"
        #
        #    # Set this to the lower of the two values "Queries per 100 seconds" and "Queries per 100 seconds per user" for
        #    # your project.
        #    #
        #    # Used to help determine maximum throughput to the Google Genomics API. Setting this value too low will
        #    # cause a drop in performance. Setting this value too high will cause QPS based locks from Google.
        #    # 1000 is the default "Queries per 100 seconds per user", 50000 is the default "Queries per 100 seconds"
        #    # See https://cloud.google.com/genomics/quotas for more information
        #    genomics-api-queries-per-100-seconds = 1000
        #
        #    # Polling for completion backs-off gradually for slower-running jobs.
        #    # This is the maximum polling interval (in seconds):
        #    maximum-polling-interval = 600
        #
        #    # Optional Dockerhub Credentials. Can be used to access private docker images.
        #    dockerhub {
        #      # account = ""
        #      # token = ""
        #    }
        #
        #    genomics {
        #      # A reference to an auth defined in the `google` stanza at the top.  This auth is used to create
        #      # Pipelines and manipulate auth JSONs.
        #      auth = "application-default"
        #
        #
        #      // alternative service account to use on the launched compute instance
        #      // NOTE: If combined with service account authorization, both that service account and this service account
        #      // must be able to read and write to the 'root' GCS path
        #      compute-service-account = "default"
        #
        #      # Endpoint for APIs, no reason to change this unless directed by Google.
        #      endpoint-url = "https://genomics.googleapis.com/"
        #
        #      # Restrict access to VM metadata. Useful in cases when untrusted containers are running under a service
        #      # account not owned by the submitting user
        #      restrict-metadata-access = false
        #    }
        #
        #    filesystems {
        #      gcs {
        #        # A reference to a potentially different auth for manipulating files via engine functions.
        #        auth = "application-default"
        #
        #        caching {
        #          # When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
        #          # Possible values: "copy", "reference". Defaults to "copy"
        #          # "copy": Copy the output files
        #          # "reference": DO NOT copy the output files but point to the original output files instead.
        #          #              Will still make sure that all the original output files exist and are accessible before
        #          #              going forward with the cache hit.
        #          duplication-strategy = "copy"
        #        }
        #      }
        #    }
        #
        #    default-runtime-attributes {
        #      cpu: 1
        #      failOnStderr: false
        #      continueOnReturnCode: 0
        #      memory: "2 GB"
        #      bootDiskSizeGb: 10
        #      # Allowed to be a String, or a list of Strings
        #      disks: "local-disk 10 SSD"
        #      noAddress: false
        #      preemptible: 0
        #      zones: ["us-central1-a", "us-central1-b"]
        #    }
        #  }
        #}
    
        #AWS {
        #  actor-factory = "cromwell.backend.impl.aws.AwsBackendActorFactory"
        #  config {
        #    ## These two settings are required to authenticate with the ECS service:
        #    accessKeyId = "..."
        #    secretKey = "..."
        #  }
        #}
      }  # providers
    }  # backend
    
    services {
      MetadataService {
        config {
          # Set this value to "Inf" to turn off metadata summary refresh.  The default value is currently "2 seconds".
          # metadata-summary-refresh-interval = "Inf"
          # For higher scale environments, e.g. many workflows and/or jobs, DB write performance for metadata events
          # can be improved by writing to the database in batches. Increasing this value can dramatically improve overall
          # performance but will both lead to a higher memory usage as well as increase the risk that metadata events
          # might not have been persisted in the event of a Cromwell crash.
          #
          # For normal usage the default value of 200 should be fine but for larger/production environments we recommend a
          # value of at least 500. There'll be no one size fits all number here so we recommend benchmarking performance and
          # tuning the value to match your environment.
          # db-batch-size = 200
          #
          # Periodically the stored metadata events will be forcibly written to the DB regardless of if the batch size
          # has been reached. This is to prevent situations where events wind up never being written to an incomplete batch
          # with no new events being generated. The default value is currently 5 seconds
          # db-flush-rate = 5 seconds
        }
      }
    
      Instrumentation {
        # StatsD - Send metrics to a StatsD server over UDP
        # class = "cromwell.services.instrumentation.impl.statsd.StatsDInstrumentationServiceActor"
        # config.statsd {
        #   hostname = "localhost"
        #   port = 8125
        #   prefix = "" # can be used to prefix all metrics with an api key for example
        #   flush-rate = 1 second # rate at which aggregated metrics will be sent to statsd
        # }
      }
    
      HealthMonitor {
        config {
          # How long to wait between status check sweeps
          # check-refresh-time = 5 minutes
          # For any given status check, how long to wait before assuming failure
          # check-timeout = 1 minute
          # For any given status datum, the maximum time a value will be kept before reverting back to "Unknown"
          # status-ttl = 15 minutes
    
          ## When using the WorkbenchHealthMonitorServiceActor, the following are possibilities
    
          # This *MUST* be set to the name of the PAPI (aka JES) backend defined in the Backends stanza. Most likely
          # it is "Jes" or "JES"
          # papi-backend-name = JES
    
          # The name of an authentication scheme to use for e.g. pinging PAPI and GCS. This should be either an application
          # default or service account auth, otherwise things won't work as there'll not be a refresh token where you need
          # them.
          # google-auth-name = application-default
    
          # A bucket in GCS to periodically stat to check for connectivity. This must be accessible by the auth mode
          # specified by google-auth-name
          # gcs-bucket-to-check = some-bucket-name
        }
      }
    }  # services
    
    database {
      # mysql
      # see all possible parameters and default values here:
      # http://slick.lightbend.com/doc/3.2.0/api/index.html#slick.jdbc.JdbcBackend$DatabaseFactoryDef@forConfig(String,Config,Driver):Database
      profile = "slick.jdbc.MySQLProfile$"  # new format as of cromwell 27
      db {
        driver = "com.mysql.jdbc.Driver"
        url = "jdbc:mysql://cmoco-sys-dev-db.ucdenver.pvt/cromwell"
        #url = "jdbc:mysql://cmoco-sys-dev-db.ucdenver.pvt/cromwell?verifyServerCertificate=false&useSSL=true&requireSSL=true"
        user = "cromwell"
        password = "<insert password here>"
        connectionTimeout = 5000
      }  # db
    
      # For batch inserts the number of inserts to send to the DB at a time
      # insert-batch-size = 2000
    
      migration {
        # For databases with a very large number of symbols, selecting all the rows at once can generate a variety of
        # problems. In order to avoid any issue, the selection is paginated. This value sets how many rows should be
        # retrieved and processed at a time, before asking for the next chunk.
        #read-batch-size = 100000
    
        # Because a symbol row can contain any arbitrary wdl value, the amount of metadata rows to insert from a single
        # symbol row can vary from 1 to several thousands (or more). To keep the size of the insert batch from growing out
        # of control we monitor its size and execute/commit when it reaches or exceeds writeBatchSize.
        #write-batch-size = 100000
      }  # migration
    
    
      # To customize the metadata database connection, create a block under `database` with the metadata database settings.
      #
      # For example, the default database stores all data in memory. This commented out block would store `metadata` in an
      # hsqldb file, without modifying the internal engine database connection.
      #
      # The value `${uniqueSchema}` is always replaced with a unique UUID on each cromwell startup.
      #
      # This feature should be considered experimental and likely to change in the future.
    
      #metadata {
      #  profile = "slick.jdbc.HsqldbProfile$"
      #  db {
      #    driver = "org.hsqldb.jdbcDriver"
      #    url = "jdbc:hsqldb:file:metadata-${uniqueSchema};shutdown=false;hsqldb.tx=mvcc"
      #    connectionTimeout = 3000
      #  }
      #}
    }  # database
    
  • myourshaw University of Colorado Member

    Finally, we have a different singularity container for actually running gatk jobs on the cluster. That image contains java, gatk, samtools, etc., and defines the singularity run command as

    %runscript
      exec /opt/gatk "$@"
    

    In inputs.json we define gatk_cmd as

    'singularity run -B  /mnt/hdd/germline/applications/singularity/GATK4.simg'
    

    A typical wdl task looks like this:

    task CheckIlluminaDirectory {
      String gatk_cmd
      String basecalls_dir
      Array[Int] lanes
      String read_structure
    
      command {
        ${gatk_cmd} --java-options "-Xmx1g" \
          CheckIlluminaDirectory \
            --BASECALLS_DIR ${basecalls_dir} \
            --LANES ${sep=' --LANES ' lanes} \
            --READ_STRUCTURE ${read_structure}
      }
      runtime {
        memory: "1G"
        cpu: 1
      }
      output {
        # rc=1 if anything fails
        File check_illumina_directory = stderr()
      }
    }
    
  • Ruchi Member, Broadie, Moderator, Dev

    Hey @abdulrauf ,

    Firstly, I want to thank you for providing such a detailed description of your project; it was very helpful to get some context. We've not done new work to support Singularity in Cromwell so far, but it seems like the solution presented here (thanks to @myourshaw) may be interesting to you. If I'm understanding the configs above properly, the solution involves running Cromwell from inside a Singularity image, and this Cromwell dispatches jobs to a SLURM cluster, where the jobs run inside a different Singularity image? If this works for you, great! If not, I'd like to learn more about your requirements, so feel free to reach out directly ([email protected]). Thanks!

  • abdulrauf Member

    Hi @Ruchi

    Looking at the config provided by @myourshaw, I think their Cromwell instance runs inside a Singularity image, but it still expects the target SLURM compute nodes to have Docker available.

    See his last comment, around line 276 of the config:

    # The list of possible runtime custom attributes.
    runtime-attributes = """
    String? docker
    String? docker_user
    """

    # Submit string when there is no "docker" runtime attribute.
    submit = "/bin/bash ${script}"

    # Submit string when there is a "docker" runtime attribute.
    submit-docker = """
    docker run \
      --rm -i \
      ${"--user " + docker_user} \
      --entrypoint /bin/bash \
      -v ${cwd}:${docker_cwd} \
      ${docker} ${script}
    """
    

    But in our case we don't have Docker support on our SLURM-based HPC. The HPC team only supports Singularity, although they are happy to support any other container engine that does not require root access, such as udocker or Shifter.

    I need your help to work out how to run a workflow like this one in our scenario: https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline/blob/master/tasks_pipelines/bam_processing.wdl#L40

  • myourshaw University of Colorado Member

    Hi @abdulrauf

    To clarify, we do not use docker on the HPC, for the same security reasons that concern your group. The references to docker in the config are relics of the Broad example we used as a starting point. But in our case, jobs running on the HPC use Singularity.
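
    For workflows that do declare a docker runtime attribute (like the one linked above), one possible approach is to have the backend's submit-docker string call Singularity instead of Docker. The fragment below is only a minimal sketch, not the configuration used at either site: the provider name SLURM, the cpu attribute, and the sbatch options are illustrative assumptions. It also assumes that singularity is on the PATH of the compute nodes and that those nodes can pull docker:// images from a registry. The variable that holds the in-container script path may differ between Cromwell releases, so check the documentation for your version.

    ```hocon
    backend {
      providers {
        SLURM {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            runtime-attributes = """
            Int cpu = 1
            String? docker
            """

            # Used when the task has no "docker" runtime attribute.
            submit = """
            sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} -c ${cpu} \
              --wrap "/bin/bash ${script}"
            """

            # Used when the task has a "docker" runtime attribute:
            # run the image through Singularity, which needs no root access.
            submit-docker = """
            sbatch -J ${job_name} -D ${cwd} \
              -o ${cwd}/execution/stdout -e ${cwd}/execution/stderr -c ${cpu} \
              --wrap "singularity exec --bind ${cwd}:${docker_cwd} docker://${docker} /bin/bash ${script}"
            """

            kill = "scancel ${job_id}"
            check-alive = "squeue -j ${job_id}"
            job-id-regex = "Submitted batch job (\\d+).*"
          }
        }
      }
    }
    ```

    With something along these lines, the WDL tasks keep their docker runtime attribute unchanged, and only the backend decides which container engine actually runs the image.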
