Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on Github.

AWS Batch randomly fails when running multiple workflows

caaespin (San Francisco, CA), Member
Hi, I'm running multiple workflows on Cromwell v42 via AWS Batch, and I randomly get errors related to S3, similar to this one: /cromwell/issues/4687 on GitHub.

I deployed using the full-stack deployment described on the opendata.aws website (sorry, I'm still too new to post links).

My stacktrace looks like this:

```
2019-07-25 06:20:07,138 cromwell-system-akka.dispatchers.engine-dispatcher-31 ERROR - WorkflowManagerActor Workflow 55f7f95a-3956-44a9-8080-4b53ce3a424c failed (during ExecutingWorkflowState): cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
Caused by: java.io.IOException: Could not read from s3://my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
at cromwell.core.path.EvenBetterPathMethods$$anonfun$withReader$2.applyOrElse(EvenBetterPathMethods.scala:112)
at cromwell.core.path.EvenBetterPathMethods$$anonfun$withReader$2.applyOrElse(EvenBetterPathMethods.scala:111)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at scala.util.Failure.recoverWith(Try.scala:232)
at cromwell.core.path.EvenBetterPathMethods.withReader(EvenBetterPathMethods.scala:111)
at cromwell.core.path.EvenBetterPathMethods.withReader$(EvenBetterPathMethods.scala:108)
at cromwell.filesystems.s3.S3Path.withReader(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.limitFileContent(EvenBetterPathMethods.scala:120)
at cromwell.core.path.EvenBetterPathMethods.limitFileContent$(EvenBetterPathMethods.scala:120)
at cromwell.filesystems.s3.S3Path.limitFileContent(S3PathBuilder.scala:160)
at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:100)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:351)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:372)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:312)
at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:351)
at java.nio.file.Files.newInputStream(Files.java:152)
at better.files.File.newInputStream(File.scala:337)
at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
at cromwell.filesystems.s3.S3Path.newInputStream(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.mediaInputStream(EvenBetterPathMethods.scala:96)
at cromwell.core.path.EvenBetterPathMethods.mediaInputStream$(EvenBetterPathMethods.scala:93)
at cromwell.filesystems.s3.S3Path.mediaInputStream(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.$anonfun$withReader$1(EvenBetterPathMethods.scala:111)
at cromwell.util.TryWithResource$.$anonfun$tryWithResource$1(TryWithResource.scala:14)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.util.TryWithResource$.tryWithResource(TryWithResource.scala:10)
... 18 more

2019-07-25 06:20:07,138 cromwell-system-akka.dispatchers.engine-dispatcher-31 INFO - WorkflowManagerActor WorkflowActor-55f7f95a-3956-44a9-8080-4b53ce3a424c is in a terminal state: WorkflowFailedState
```

I did not see any timeout errors in my Cromwell logs, so I don't think this is related to issue cromwell/issues/4303 on GitHub.
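
To tell whether the rc file is really missing or just shows up late, a small diagnostic like the one below might help. It is only a sketch: it assumes the failure is transient, which nothing in the logs above confirms. The helper names (`parse_s3_uri`, `wait_until_exists`, `s3_object_exists`) are made up for illustration and are not part of Cromwell. The boto3 check is optional and needs AWS credentials.

```python
import re
import time
from typing import Callable, Optional, Tuple

# Matches both s3://bucket/key and the s3://s3.amazonaws.com/bucket/key
# form that appears in the NoSuchFileException line of the trace.
_S3_URI = re.compile(r"s3://(?:s3\.amazonaws\.com/)?([^/\s]+)/(\S+)")


def parse_s3_uri(text: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for the first S3 URI found in text, or None."""
    match = _S3_URI.search(text)
    if match is None:
        return None
    # Log messages put a ':' right after the path; drop it from the key.
    return match.group(1), match.group(2).rstrip(":")


def wait_until_exists(
    exists: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll exists() with exponential backoff.

    Returns the 1-based attempt on which exists() was true, or 0 if it
    never was within the given number of attempts.
    """
    for attempt in range(1, attempts + 1):
        if exists():
            return attempt
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return 0


def s3_object_exists(bucket: str, key: str) -> bool:
    """Check for an object via boto3 head_object (needs boto3 + credentials)."""
    import boto3  # imported lazily so the helpers above work without it
    from botocore.exceptions import ClientError

    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise
```

Usage would be along the lines of `bucket, key = parse_s3_uri(log_line)` followed by `wait_until_exists(lambda: s3_object_exists(bucket, key))`. If the object appears on a later attempt, the file was written after Cromwell's single read attempt. If it never appears, the job probably never wrote it.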

Has anyone had a similar error? How did you solve it? Thanks!

-Carlos
