Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Running Cromwell on AWS Batch, using both S3 and GC filesystems

Migwell (Member)
edited October 2018 in Ask the Cromwell + WDL Team

I'm running Cromwell on AWS. I followed the instructions on the AWS for Genomics Workflows page, and everything has worked fine. However, I now need my Cromwell server to accept input files from either Amazon S3 (s3://) or Google Cloud Storage (gs://) URLs.

To this end, I added a gcs section to my configuration file, which now looks like this:

// aws.conf
include required(classpath("application"))

aws {
  application-name = "cromwell"
  auths = [{
      name = "default"
      scheme = "default"
  }]
  region = "default"
}

engine {
  filesystems {
    s3 { auth = "default" }
  }
  gcs {
    auth = "application-default"
  }
}

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        root = "my_s3_url"
        auth = "default"

        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3

        concurrent-job-limit = 16

        default-runtime-attributes {
          queueArn: "arn:aws:batch:my_batch_arn"
        }

        filesystems {
          s3 {
            auth = "default"
          }
          gcs {
            auth = "application-default"
          }
        }
      }
    }
  }
}

However, when I submit a workflow that uses gs:// URLs, I get the following error:

2018-10-15 07:25:06,383 cromwell-system-akka.dispatchers.engine-dispatcher-20 ERROR - WorkflowManagerActor Workflow fdbc7c62-705a-4bd7-9bf5-3185bc6b1b02 failed (during ExecutingWorkflowState): java.lang.RuntimeException: Failed to evaluate 'PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams' (reason 1 of 1): Evaluating read_lines(flowcell_unmapped_bams_list) failed: java.lang.IllegalArgumentException: Either gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt exists on a filesystem not supported by this instance of Cromwell, or a failure occurred while building an actionable path from it. Supported filesystems are: s3, LinuxFileSystem. Failures: s3: S3 URIs must have 's3' scheme: gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt (IllegalArgumentException)
LinuxFileSystem: Cannot build a local path from gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt (RuntimeException) Please refer to the documentation for more information on how to configure filesystems: cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
        at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:29)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:511)
        at cats.instances.ListInstances$$anon$1.$anonfun$traverse$2(list.scala:73)
        at cats.instances.ListInstances$$anon$1.loop$2(list.scala:63)
        at cats.instances.ListInstances$$anon$1.$anonfun$foldRight$1(list.scala:63)
        at cats.Eval$.loop$1(Eval.scala:338)
        at cats.Eval$.cats$Eval$$evaluate(Eval.scala:372)
        at cats.Eval$Defer.value(Eval.scala:258)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:72)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:12)
        at cats.Traverse$Ops.traverse(Traverse.scala:19)
        at cats.Traverse$Ops.traverse$(Traverse.scala:19)
        at cats.Traverse$ToTraverseOps$$anon$3.traverse(Traverse.scala:19)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:505)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:187)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:185)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
        at akka.actor.FSM.processEvent(FSM.scala:687)
        at akka.actor.FSM.processEvent$(FSM.scala:681)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:49)
        at akka.actor.LoggingFSM.processEvent(FSM.scala:820)
        at akka.actor.LoggingFSM.processEvent$(FSM.scala:802)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:49)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:678)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:672)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:49)
        at akka.actor.Timers.aroundReceive(Timers.scala:51)
        at akka.actor.Timers.aroundReceive$(Timers.scala:40)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:49)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

How can I enable both filesystems?
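For reference, here is a sketch of how the engine block might look with gcs nested inside filesystems. In the configuration above, gcs sits beside engine.filesystems rather than inside it, which would be consistent with the error listing only s3 and LinuxFileSystem as supported filesystems. The sketch also assumes that the auth name application-default must be declared in a top-level google block, following the layout of Cromwell's example configuration; this may be redundant if Cromwell's bundled defaults already define it. I have not confirmed that this is sufficient, or that the AWS Batch backend can localize gs:// inputs for individual tasks even once the engine can read them.

```hocon
// Sketch only (untested): gcs nested inside engine.filesystems
// instead of sitting beside it.
google {
  application-name = "cromwell"
  auths = [{
    // Assumed to match the auth name referenced by the gcs filesystems below
    name = "application-default"
    scheme = "application_default"
  }]
}

engine {
  filesystems {
    s3 { auth = "default" }
    gcs { auth = "application-default" }
  }
}
```

Because Cromwell is running on AWS rather than on Google Cloud, application default credentials would typically need to be supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable, pointing at a service account key file.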
