Cromwell 28: Root Configuration Not Working?

Hello,

I have edited my Cromwell configuration to use a custom location for workflow executions. See this line:

        root: "/btl/store/dev_cromwell_executions"

In this configuration file:

##################################
# Cromwell Reference Config File #
##################################

# This is the reference config file that contains all the default settings.
# Make your edits/overrides in your application.conf.

webservice {
  port = 9000
  interface = 0.0.0.0
  instance.name = "reference"
}

akka {
  actor.default-dispatcher.fork-join-executor {
    # Number of threads = min(parallelism-factor * cpus, parallelism-max)
    # Below are the default values set by Akka, uncomment to tune these

    #parallelism-factor = 3.0
    #parallelism-max = 150
  }

  dispatchers {
    # A dispatcher for actors performing blocking io operations
    # Prevents the whole system from being slowed down when waiting for responses from external resources for instance
    io-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      # Using the forkjoin defaults, this can be tuned if we wish
    }

    # A dispatcher for actors handling API operations
    # Keeps the API responsive regardless of the load of workflows being run
    api-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
    }

    # A dispatcher for engine actors
    # Because backends' behaviour is unpredictable (potentially blocking, slow), the engine runs
    # on its own dispatcher to prevent backends from affecting its performance.
    engine-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
    }

    # A dispatcher used by supported backend actors
    backend-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
    }

    # Note that without further configuration, all other actors run on the default dispatcher
  }
}

system {
  # If 'true', a SIGINT will trigger Cromwell to attempt to abort all currently running jobs before exiting
  abort-jobs-on-terminate = false

  # Max number of retries per job that the engine will attempt in case of a retryable failure received from the backend
  max-retries = 10

  # If 'true' then when Cromwell starts up, it tries to restart incomplete workflows
  workflow-restart = true

  # Cromwell will cap the number of running workflows at N
  max-concurrent-workflows = 5000

  # Cromwell will launch up to N submitted workflows at a time, regardless of how many open workflow slots exist
  max-workflow-launch-count = 50

  # Number of seconds between workflow launches
  new-workflow-poll-rate = 20

  # Since the WorkflowLogCopyRouter is initialized in code, this is the number of workers
  number-of-workflow-log-copy-workers = 10
}

workflow-options {
  # These workflow options will be encrypted when stored in the database
  encrypted-fields: []

  # AES-256 key to use to encrypt the values in `encrypted-fields`
  base64-encryption-key: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="

  # Directory where to write per workflow logs
  workflow-log-dir: "cromwell-workflow-logs"

  # When true, per workflow logs will be deleted after copying
  workflow-log-temporary: true

  # Workflow-failure-mode determines what happens to other calls when a call fails. Can be either ContinueWhilePossible or NoNewCalls.
  # Can also be overridden in workflow options. Defaults to NoNewCalls. Uncomment to change:
  workflow-failure-mode: "ContinueWhilePossible"
}

// Optional call-caching configuration.
call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

google {

  application-name = "cromwell"

  auths = [
    {
      name = "application-default"
      scheme = "application_default"
    },
    //    {
    //      name = "user-via-refresh"
    //      scheme = "refresh_token"
    //      client-id = "secret_id"
    //      client-secret = "secret_secret"
    //    },
    //    {
    //      name = "service-account"
    //      scheme = "service_account"
    //      service-account-id = "my-service-account"
    //      pem-file = "/path/to/file.pem"
    //    }
  ]
}

engine {
  # This instructs the engine which filesystems are at its disposal to perform any IO operation that it might need.
  # For instance, WDL variables declared at the Workflow level will be evaluated using the filesystems declared here.
  # If you intend to run workflows with this kind of declaration:
  # workflow {
  #    String str = read_string("gs://bucket/my-file.txt")
  # }
  # You will need to provide the engine with a gcs filesystem
  # Note that the default filesystem (local) is always available.
  #filesystems {
  #  gcs {
  #    auth = "application-default"
  #  }
  #}
}

backend {
  default = "SGE"
  max-job-retries = 4
  providers {
    Local {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        run-in-background = true
        runtime-attributes = "String? docker"
        submit = "/bin/bash ${script}"
        submit-docker = "docker run --rm -v ${cwd}:${docker_cwd} -i ${docker} /bin/bash < ${script}"
        submit-docker = "docker run --rm -v ${cwd}:${docker_cwd} -i ${docker} /bin/bash < ${script}"

        # Root directory where Cromwell writes job results.  This directory must be
        # visible and writeable by the Cromwell process as well as the jobs that Cromwell
        # launches.
        root: "/btl/store/dev_cromwell_executions"

        filesystems {
          local {
            localization: [
              "soft-link", "hard-link", "copy"
            ]
            # Call caching strategies
            caching {
              # When copying a cached result, what type of file duplication should occur. Attempted in the order listed below:
              duplication-strategy: [
                "soft-link", "hard-link", "copy"
              ]

              # Possible values: file, path
              # "file" will compute an md5 hash of the file content.
              # "path" will compute an md5 hash of the file path. This strategy will only be effective if the duplication-strategy (above) is set to "soft-link",
              # in order to allow for the original file path to be hashed.
              hashing-strategy: "path"
            }
          }
        }
      }
    }

    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 5000
        runtime-attributes = """
        Int cpu = 1
        Int memory = 4
        String sge_queue = 'gaag'
        String? sge_project
        String task_name = 'cromwell_1'
        """

        submit = """
        qsub \
        -terse \
        -V \
        -b n \
        -N ${task_name} \
        -wd ${cwd} \
        -o ${out} \
        -e ${err} \
        -pe smp ${cpu} \
        ${"-l mem_free=" + memory + "G"} \
        ${"-q " + sge_queue} \
        ${"-P " + sge_project} \
        ${script}
        """

        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }

    #LSF {
    #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
    #  config {
    #    submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /bin/bash ${script}"
    #    kill = "bkill ${job_id}"
    #    check-alive = "bjobs ${job_id}"
    #    job-id-regex = "Job <(\\d+)>.*"
    #  }
    #}

    # Example backend that _only_ runs workflows that specify docker for every command.
    #Docker {
    #  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
    #  config {
    #    run-in-background = true
    #    runtime-attributes = "String docker"
    #    submit-docker = "docker run --rm -v ${cwd}:${docker_cwd} -i ${docker} /bin/bash < ${script}"
    #  }
    #}

    #HtCondor {
    #  actor-factory = "cromwell.backend.impl.htcondor.HtCondorBackendFactory"
    #  config {
    #    # Root directory where Cromwell writes job results.  This directory must be
    #    # visible and writeable by the Cromwell process as well as the jobs that Cromwell
    #    # launches.
    #    root: "cromwell-executions"
    #
    #    #Placeholders:
    #    #1. Working directory.
    #    #2. Inputs volumes.
    #    #3. Output volume.
    #    #4. Docker image.
    #    #5. Job command.
    #    docker {
    #      #Allow soft links in dockerized jobs
    #      cmd = "docker run -w %s %s %s --rm %s %s"
    #    }
    #
    #    cache {
    #      provider = "cromwell.backend.impl.htcondor.caching.provider.mongodb.MongoCacheActorFactory"
    #      enabled = false
    #      forceRewrite = false
    #      db {
    #        host = "127.0.0.1"
    #        port = 27017
    #        name = "htcondor"
    #        collection = "cache"
    #      }
    #    }
    #
    #    filesystems {
    #      local {
    #        localization: [
    #          "hard-link", "soft-link", "copy"
    #        ]
    #      }
    #    }
    #    # Time (in seconds) to wait before re-checking the status of the job again
    #    poll-interval = 3
    #  }
    #}

    #Spark {
    # actor-factory = "cromwell.backend.impl.spark.SparkBackendFactory"
    # config {
    #   # Root directory where Cromwell writes job results.  This directory must be
    #    # visible and writeable by the Cromwell process as well as the jobs that Cromwell
    #   # launches.
    #   root: "cromwell-executions"
    #
    #   filesystems {
    #     local {
    #       localization: [
    #         "hard-link", "soft-link", "copy"
    #       ]
    #     }
    #    }
    #      # change (master, deployMode) to (yarn, client), (yarn, cluster)
    #      #  or (spark://hostname:port, cluster) for spark standalone cluster mode
    #     master: "local"
    #     deployMode: "client"
    #  }
    # }

    #JES {
    #  actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
    #  config {
    #    # Google project
    #    project = "my-cromwell-workflows"
    #
    #    # Base bucket for workflow executions
    #    root = "gs://my-cromwell-workflows-bucket"
    #
    #    # Polling for completion backs-off gradually for slower-running jobs.
    #    # This is the maximum polling interval (in seconds):
    #    maximum-polling-interval = 600
    #
    #    # Optional Dockerhub Credentials. Can be used to access private docker images.
    #    dockerhub {
    #      # account = ""
    #      # token = ""
    #    }
    #
    #    genomics {
    #      # A reference to an auth defined in the `google` stanza at the top.  This auth is used to create
    #      # Pipelines and manipulate auth JSONs.
    #      auth = "application-default"
    #      # Endpoint for APIs, no reason to change this unless directed by Google.
    #      endpoint-url = "https://genomics.googleapis.com/"
    #    }
    #
    #    filesystems {
    #      gcs {
    #        # A reference to a potentially different auth for manipulating files via engine functions.
    #        auth = "application-default"
    #      }
    #    }
    #  }
    #}

  }
}

services {
  KeyValue {
    class = "cromwell.services.keyvalue.impl.SqlKeyValueServiceActor"
  }
  MetadataService {
    class = "cromwell.services.metadata.impl.MetadataServiceActor"
    # Set this value to "Inf" to turn off metadata summary refresh.  The default value is currently "2 seconds".
    # metadata-summary-refresh-interval = "Inf"
  }
}

database {
  # hsql default
  # driver = "slick.driver.HsqldbDriver$"
  #db {
  #  driver = "org.hsqldb.jdbcDriver"
  #  url = "jdbc:hsqldb:mem:${uniqueSchema};shutdown=false;hsqldb.tx=mvcc"
  #  connectionTimeout = 3000
  #}

  # mysql example
  #driver = "slick.driver.MySQLDriver$"
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://stout/btl_cromwell_dev"
    user = "root"
    password = ""
    connectionTimeout = 10000
  }

  migration {
    # For databases with a very large number of symbols, selecting all the rows at once can generate a variety of
    # problems. In order to avoid any issue, the selection is paginated. This value sets how many rows should be
    # retrieved and processed at a time, before asking for the next chunk.
    read-batch-size = 100000

    # Because a symbol row can contain any arbitrary wdl value, the amount of metadata rows to insert from a single
    # symbol row can vary from 1 to several thousands (or more). To keep the size of the insert batch from growing out
    # of control we monitor its size and execute/commit when it reaches or exceeds writeBatchSize.
    write-batch-size = 100000
  }
}

I then launched the Cromwell server via the following startup bash script:

#!/bin/bash
if [ "$USER" != 'gaag' ]; then
        echo "User must be gaag not $USER!" 1>&2
        exit 1
else
        echo "Start GAAG Cromwell server..."
fi

source /broad/software/scripts/useuse
use Java-1.8
use GridEngine8

export CROMWELL_ROOT=`pwd`
echo $CROMWELL_ROOT
export CONSOLE_LOG=$CROMWELL_ROOT/log/console.28.log

set -m 
nohup java -Dconfig.file=$CROMWELL_ROOT/ale.reference.28.conf -jar $CROMWELL_ROOT/cromwell-28_2.jar server "[email protected]" >> $CONSOLE_LOG 2>&1 &
set +m

cromwell_pid=$!
echo $cromwell_pid > $CROMWELL_ROOT/cromwell.pid
# Wait a couple of seconds for JVM to launch
sleep 2
echo "Cromwell launched with PID $cromwell_pid"

# Wait a little while for it to maybe fail before checking for process existence
sleep 10

if ps -p $cromwell_pid > /dev/null; then
        echo "Cromwell appears to be running."
else
        echo "ERROR: Cromwell process not found." >&2
        exit 1
fi

I tried a few different workflows, but they are not writing to the location specified in the config file. Instead they write to a cromwell-executions directory inside the same directory that the JAR file is in. I've checked permissions on the location specified in the config and they are 777, so Cromwell should have no problem writing there.

Any help appreciated.

Answers

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @amr@broadinstitute.org

    Hi,

    I have asked the team to look into this. We will get back to you soon.

    -Sheila

  • ChrisL, Cambridge, MA, Member, Broadie, Moderator, Dev, admin
    edited October 2017 Accepted Answer

    Hey @amr@broadinstitute.org -

    The first thing I spotted is that the root config option is set for the Local backend, but your default submission backend is SGE. Could you confirm which backend you're actually running the jobs on? (If you don't think you actively change it in your workflow options, it'll probably be the default SGE.)

    I believe the root config is equally applicable to the SGE backend, so you might want to go ahead and copy it into the config section of the SGE block too.
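
    For illustration, a minimal sketch of what that could look like, reusing the path from the config above (only the root line is new to the SGE block; everything else in it stays as posted):

        SGE {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            concurrent-job-limit = 5000

            # Root directory where Cromwell writes job results for this backend.
            # Must be visible and writeable by the Cromwell process and the jobs it launches.
            root: "/btl/store/dev_cromwell_executions"

            # runtime-attributes, submit, kill, check-alive and job-id-regex unchanged from the SGE block above
          }
        }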

  • amr@broadinstitute.org, Member, Broadie
    edited October 2017

    Ah okay, I'm running the jobs on an SGE backend. I think I misunderstood the config file, so that means my caching strategy config is not really applying to the SGE backend either. I will move both into the SGE block and see if that resolves my problems.
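
    For reference, a rough sketch of where I mean to put the caching piece, copied from the Local block above into the SGE config section (alongside the relocated root line):

        SGE {
          config {
            # ... existing SGE settings plus the relocated root line ...
            filesystems {
              local {
                localization: [ "soft-link", "hard-link", "copy" ]
                caching {
                  duplication-strategy: [ "soft-link", "hard-link", "copy" ]
                  hashing-strategy: "path"
                }
              }
            }
          }
        }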

    Also, is there a README somewhere that explains all the config options?

  • amr@broadinstitute.org, Member, Broadie

    That was the problem @ChrisL thanks for spotting it!

  • ChrisL, Cambridge, MA, Member, Broadie, Moderator, Dev, admin

    The README doesn't cover configuration options very well, but I'll ping @KateVoss to make sure it's part of our upcoming documentation overhaul effort. Thanks!
