Running Cromwell on AWS Batch, using both S3 and GC filesystems

Migwell, Member

I have a use-case where I'm running Cromwell on AWS. I've followed the instructions on the AWS for Genomics Workflows page, and everything has worked fine. However, I need my Cromwell server to accept files from either S3 (AWS) or GC (Google Cloud) URLs.

To this end, I added a gcs section to my configuration file, which now looks like this:

// aws.conf
include required(classpath("application"))

aws {
  application-name = "cromwell"
  auths = [{
      name = "default"
      scheme = "default"
  }]
  region = "default"
}

engine {
  filesystems {
    s3 { auth = "default" }
  }
  gcs {
    auth = "application-default"
  }
}

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        root = "my_s3_url"
        auth = "default"

        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3

        concurrent-job-limit = 16

        default-runtime-attributes {
          queueArn: "arn:aws:batch:my_batch_arn"
        }

        filesystems {
          s3 {
            auth = "default"
          }
          gcs {
            auth = "application-default"
          }
        }
      }
    }
  }
}

However, if I attempt to submit a job that uses GC URLs, I get the following error:

2018-10-15 07:25:06,383 cromwell-system-akka.dispatchers.engine-dispatcher-20 ERROR - WorkflowManagerActor Workflow fdbc7c62-705a-4bd7-9bf5-3185bc6b1b02 failed (during ExecutingWorkflowState): java.lang.RuntimeException: Failed to evaluate 'PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams' (reason 1 of 1): Evaluating read_lines(flowcell_unmapped_bams_list) failed: java.lang.IllegalArgumentException: Either gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt exists on a filesystem not supported by this instance of Cromwell, or a failure occurred while building an actionable path from it. Supported filesystems are: s3, LinuxFileSystem. Failures: s3: S3 URIs must have 's3' scheme: gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt (IllegalArgumentException)
LinuxFileSystem: Cannot build a local path from gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt (RuntimeException) Please refer to the documentation for more information on how to configure filesystems: cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
        at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:29)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:511)
        at cats.instances.ListInstances$$anon$1.$anonfun$traverse$2(list.scala:73)
        at cats.instances.ListInstances$$anon$1.loop$2(list.scala:63)
        at cats.instances.ListInstances$$anon$1.$anonfun$foldRight$1(list.scala:63)
        at cats.Eval$.loop$1(Eval.scala:338)
        at cats.Eval$.cats$Eval$$evaluate(Eval.scala:372)
        at cats.Eval$Defer.value(Eval.scala:258)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:72)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:12)
        at cats.Traverse$Ops.traverse(Traverse.scala:19)
        at cats.Traverse$Ops.traverse$(Traverse.scala:19)
        at cats.Traverse$ToTraverseOps$$anon$3.traverse(Traverse.scala:19)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:505)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:187)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:185)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
        at akka.actor.FSM.processEvent(FSM.scala:687)
        at akka.actor.FSM.processEvent$(FSM.scala:681)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:49)
        at akka.actor.LoggingFSM.processEvent(FSM.scala:820)
        at akka.actor.LoggingFSM.processEvent$(FSM.scala:802)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:49)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:678)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:672)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:49)
        at akka.actor.Timers.aroundReceive(Timers.scala:51)
        at akka.actor.Timers.aroundReceive$(Timers.scala:40)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:49)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

How can I enable both filesystems?
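
One possible cause, offered as an untested sketch rather than a confirmed fix: in the engine section above, the gcs block sits next to filesystems instead of inside it, which would be consistent with the error listing only s3 and LinuxFileSystem as supported. The fragment below moves gcs inside engine.filesystems. The top-level google block is an assumption and is not part of the original file; it follows the pattern of the aws block, where each auth name referenced by a filesystem is declared under auths.

```
// Untested sketch. The google block is an assumption, not from the original aws.conf.
google {
  application-name = "cromwell"
  auths = [{
      name = "application-default"
      scheme = "application_default"
  }]
}

engine {
  filesystems {
    s3 { auth = "default" }
    // gcs moved inside filesystems, alongside s3
    gcs { auth = "application-default" }
  }
}
```

The backend section already nests gcs under filesystems, so under this assumption only the engine section and the auth declaration would change. The failing call, read_lines, is evaluated by the engine rather than by a backend job, which is why the engine-level filesystems list is the one that seems relevant here.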
