Update: July 26, 2019
This section of the forum is no longer actively monitored. We are working on a support migration plan that we will share here shortly. Apologies for this inconvenience.

AWS Batch randomly fails when running multiple workflows

caaespin (San Francisco, CA; Member)
Hi, I'm running multiple workflows on Cromwell v42 via AWS Batch, and I randomly get errors related to S3, similar to this one: /cromwell/issues/4687 on GitHub.

I deployed using the full-stack deployment described on the opendata.aws website (sorry, I'm still too new to post links).

My stacktrace looks like this:

```
2019-07-25 06:20:07,138 cromwell-system-akka.dispatchers.engine-dispatcher-31 ERROR - WorkflowManagerActor Workflow 55f7f95a-3956-44a9-8080-4b53ce3a424c failed (during ExecutingWorkflowState): cromwell.engine.io.IoAttempts$EnhancedCromwellIoException: [Attempted 1 time(s)] - IOException: Could not read from s3://my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
Caused by: java.io.IOException: Could not read from s3://my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
at cromwell.core.path.EvenBetterPathMethods$$anonfun$withReader$2.applyOrElse(EvenBetterPathMethods.scala:112)
at cromwell.core.path.EvenBetterPathMethods$$anonfun$withReader$2.applyOrElse(EvenBetterPathMethods.scala:111)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at scala.util.Failure.recoverWith(Try.scala:232)
at cromwell.core.path.EvenBetterPathMethods.withReader(EvenBetterPathMethods.scala:111)
at cromwell.core.path.EvenBetterPathMethods.withReader$(EvenBetterPathMethods.scala:108)
at cromwell.filesystems.s3.S3Path.withReader(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.limitFileContent(EvenBetterPathMethods.scala:120)
at cromwell.core.path.EvenBetterPathMethods.limitFileContent$(EvenBetterPathMethods.scala:120)
at cromwell.filesystems.s3.S3Path.limitFileContent(S3PathBuilder.scala:160)
at cromwell.engine.io.nio.NioFlow.$anonfun$readAsString$1(NioFlow.scala:100)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:351)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:372)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:312)
at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.nio.file.NoSuchFileException: s3://s3.amazonaws.com/my-bucket/cromwell-execution/my-workflow/55f7f95a-3956-44a9-8080-4b53ce3a424c/some-tool/shard-0/some-tool-0-rc.txt
at org.lerch.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:351)
at java.nio.file.Files.newInputStream(Files.java:152)
at better.files.File.newInputStream(File.scala:337)
at cromwell.core.path.BetterFileMethods.newInputStream(BetterFileMethods.scala:240)
at cromwell.core.path.BetterFileMethods.newInputStream$(BetterFileMethods.scala:239)
at cromwell.filesystems.s3.S3Path.newInputStream(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.mediaInputStream(EvenBetterPathMethods.scala:96)
at cromwell.core.path.EvenBetterPathMethods.mediaInputStream$(EvenBetterPathMethods.scala:93)
at cromwell.filesystems.s3.S3Path.mediaInputStream(S3PathBuilder.scala:160)
at cromwell.core.path.EvenBetterPathMethods.$anonfun$withReader$1(EvenBetterPathMethods.scala:111)
at cromwell.util.TryWithResource$.$anonfun$tryWithResource$1(TryWithResource.scala:14)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.util.TryWithResource$.tryWithResource(TryWithResource.scala:10)
... 18 more

2019-07-25 06:20:07,138 cromwell-system-akka.dispatchers.engine-dispatcher-31 INFO - WorkflowManagerActor WorkflowActor-55f7f95a-3956-44a9-8080-4b53ce3a424c is in a terminal state: WorkflowFailedState
```

I did not see any timeout errors in my Cromwell logs, so I don't think this is related to issue cromwell/issues/4303 on GitHub.
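For anyone who wants to run the same check on their own logs, here is a minimal sketch of how the two failure types could be told apart. It is an illustration only: the function name `summarize_log` and both regular expressions are assumptions based on the stack trace above, not anything Cromwell provides, and the patterns may need adjusting for other log formats.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the stack trace in this post.
# MISSING_RC captures the S3 URI of an rc file that could not be found.
MISSING_RC = re.compile(r"NoSuchFileException: (s3://\S+-rc\.txt)")
# TIMEOUT matches "timeout", "timed out", "time-out", etc., case-insensitively.
TIMEOUT = re.compile(r"timed?[ -]?out", re.IGNORECASE)


def summarize_log(lines):
    """Count missing-rc-file errors and timeout-like lines in log lines.

    Returns a (Counter, sorted list of missing rc file URIs) pair.
    """
    counts = Counter()
    missing = set()
    for line in lines:
        match = MISSING_RC.search(line)
        if match:
            counts["missing_rc"] += 1
            missing.add(match.group(1))
        if TIMEOUT.search(line):
            counts["timeout"] += 1
    return counts, sorted(missing)
```

To use it, pass any iterable of log lines, for example an open file handle on the Cromwell server log. A nonzero `missing_rc` count with a zero `timeout` count would match the situation described here.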

Has anyone had a similar error? If so, how did you solve it? Thanks!

-Carlos
