Error when running HaplotypeCaller using Scatter Gather

kritikool Member
edited October 2016 in Ask the WDL team

Hello,

I was trying out Example #4 (How to use Scatter Gather to joint call genotypes) from here: https://software.broadinstitute.org/wdl/userguide/topic?name=wdl-tutorials

I first tested it on one example file, which worked fine, and am now trying it on a batch of 15 samples.

It ran fine for about 2.5 hours and then stopped. This is the error:

I checked the stderr.log file, and it only contains logs of HaplotypeCaller running on chromosome 1. There is no error in it.

I'm not sure what is going on. Can anyone help?

[2016-10-20 16:13:20,62] [error] WorkflowManagerActor Workflow 093aafca-904f-4598-b1ba-03211392c831 failed (during ExecutingWorkflowState): java.lang.Exception: Call jointCallingGenotypes.HaplotypeCallerERC: return code was 137
[2016-10-20 16:13:20,62] [info] WorkflowManagerActor WorkflowActor-093aafca-904f-4598-b1ba-03211392c831 is in a terminal state: WorkflowFailedState
[2016-10-20 16:13:43,88] [info] Message [cromwell.backend.async.AsyncBackendJobExecutionActor$IssuePollRequest] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-093aafca-904f-4598-b1ba-03211392c831/WorkflowExecutionActor-093aafca-904f-4598-b1ba-03211392c831/093aafca-904f-4598-b1ba-03211392c831-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:11:1/093aafca-904f-4598-b1ba-03211392c831-BackendJobExecutionActor-093aafca:jointCallingGenotypes.HaplotypeCallerERC:11:1/SharedFileSystemAsyncJobExecutionActor#-386233882] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-093aafca-904f-4598-b1ba-03211392c831/WorkflowExecutionActor-093aafca-904f-4598-b1ba-03211392c831/093aafca-904f-4598-b1ba-03211392c831-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:11:1/093aafca-904f-4598-b1ba-03211392c831-BackendJobExecutionActor-093aafca:jointCallingGenotypes.HaplotypeCallerERC:11:1/SharedFileSystemAsyncJobExecutionActor#-386233882] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2016-10-20 16:13:58,34] [info] Message [cromwell.backend.async.AsyncBackendJobExecutionActor$IssuePollRequest] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-093aafca-904f-4598-b1ba-03211392c831/WorkflowExecutionActor-093aafca-904f-4598-b1ba-03211392c831/093aafca-904f-4598-b1ba-03211392c831-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:0:1/093aafca-904f-4598-b1ba-03211392c831-BackendJobExecutionActor-093aafca:jointCallingGenotypes.HaplotypeCallerERC:0:1/SharedFileSystemAsyncJobExecutionActor#1739532357] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-093aafca-904f-4598-b1ba-03211392c831/WorkflowExecutionActor-093aafca-904f-4598-b1ba-03211392c831/093aafca-904f-4598-b1ba-03211392c831-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:0:1/093aafca-904f-4598-b1ba-03211392c831-BackendJobExecutionActor-093aafca:jointCallingGenotypes.HaplotypeCallerERC:0:1/SharedFileSystemAsyncJobExecutionActor#1739532357] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2016-10-20 16:14:05,07] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
Workflow 093aafca-904f-4598-b1ba-03211392c831 transitioned to state Failed
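
For readers hitting the same failure: a return code of 137 usually means the task was killed rather than that it exited on its own. Shells report a process terminated by signal N with exit status 128 + N, so 137 is 128 + 9, which is SIGKILL. SIGKILL cannot be caught, so HaplotypeCaller gets no chance to write an error to stderr.log, which would fit the empty stderr described above. The short shell sketch below only decodes the status; attributing the kill to the Linux out-of-memory killer is an assumption you would need to confirm on your own machine (for example in the kernel log).

```shell
#!/usr/bin/env bash
# Decode a job's exit status: values above 128 conventionally mean the
# process was terminated by signal number (status - 128).
rc=137   # "return code was 137" from the Cromwell log above

if [ "$rc" -gt 128 ]; then
  sig=$(( rc - 128 ))
  # kill -l <number> prints the signal name without the SIG prefix.
  echo "return code ${rc} = 128 + ${sig} (SIG$(kill -l "$sig"))"
else
  echo "return code ${rc}: ordinary exit status from the tool itself"
fi

# SIGKILL cannot be caught or handled, so the killed tool logs nothing.
# ASSUMPTION: on Linux the kernel OOM killer is a common sender; if you have
# permission, the following may show the java process being killed:
#   dmesg | grep -iE 'out of memory|killed process'
```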

Answers

  • Okay, I did some more investigation, and it looks like there is some kind of Java memory allocation issue. Could anyone tell me how to tackle this when running scatter-gather?

    WARN 21:50:51,945 HaplotypeCallerGenotypingEngine - location chr1:142626698: too many alternative alleles found (7) larger than the maximum requested with -maxAltAlleles (6), the following will be dropped: TTTA.
    INFO 21:51:33,637 ProgressMeter - chr1:144525632 16571.0 2.5 h 6.3 d 4.6% 54.1 h 51.6 h
    OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a4800000, 1535115264, 0) failed; error='Cannot allocate memory' (errno=12)

  • KateN Cambridge, MA Member, Broadie, Moderator

    Are you limiting the amount of memory the task can use in any way? For example, if your command section starts with something like java -Xmx8g -jar GenomeAnalysisTK.jar -T HaplotypeCaller, or if your runtime section has a value for memory, you can increase those values to allow a larger memory allocation.

    If neither of these options work, please let me know.

  • Thank you, I will try this out. I have been following the tutorial word for word and did not add any other options.

  • kritikool Member
    edited October 2016

    I tried the scatter-gather run with -Xmx8g added to the java command (I'm running this on an r3.4xlarge Amazon instance), and I am still getting this error.
    My runtime section does not have a value for memory; my code is exactly the same as Tutorial #4. Is there anything else I can do to prevent this error?

    Here is the error message:

    [2016-10-24 14:52:51,01] [error] WorkflowManagerActor Workflow 775008ea-9348-4f56-9644-604f19d53f48 failed (during ExecutingWorkflowState): java.lang.Exception: Call jointCallingGenotypes.HaplotypeCallerERC: return code was 137
    [2016-10-24 14:52:51,02] [info] WorkflowManagerActor WorkflowActor-775008ea-9348-4f56-9644-604f19d53f48 is in a terminal state: WorkflowFailedState
    [2016-10-24 14:53:23,32] [info] Message [cromwell.backend.async.AsyncBackendJobExecutionActor$IssuePollRequest] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-775008ea-9348-4f56-9644-604f19d53f48/WorkflowExecutionActor-775008ea-9348-4f56-9644-604f19d53f48/775008ea-9348-4f56-9644-604f19d53f48-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:10:1/775008ea-9348-4f56-9644-604f19d53f48-BackendJobExecutionActor-775008ea:jointCallingGenotypes.HaplotypeCallerERC:10:1/SharedFileSystemAsyncJobExecutionActor#-564190174] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-775008ea-9348-4f56-9644-604f19d53f48/WorkflowExecutionActor-775008ea-9348-4f56-9644-604f19d53f48/775008ea-9348-4f56-9644-604f19d53f48-EngineJobExecutionActor-jointCallingGenotypes.HaplotypeCallerERC:10:1/775008ea-9348-4f56-9644-604f19d53f48-BackendJobExecutionActor-775008ea:jointCallingGenotypes.HaplotypeCallerERC:10:1/SharedFileSystemAsyncJobExecutionActor#-564190174] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    [2016-10-24 14:53:27,90] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
    Workflow 775008ea-9348-4f56-9644-604f19d53f48 transitioned to state Failed

  • KateN Cambridge, MA Member, Broadie, Moderator

    In a WDL task, you can specify a runtime section, and one of the parameters you can set there is memory. However, since you are following the tutorial and have no memory value specified, there shouldn't be a limit from that side. I've checked in with the developers and should have an answer for you soon regarding this error.

  • KateN Cambridge, MA Member, Broadie, Moderator

    I spoke with the developers, and they agree that you have most likely run out of memory while running this command. If each element in your scatter is a large job, running all of them at once can quickly use up all of your memory.

    What you can do is limit the number of jobs that run concurrently. If you are scattering by sample and have 15 samples, you can tell Cromwell to run only 5 samples at a time. The 5 is only an example; the number can be higher or lower depending on how much memory you have access to. To limit the number of concurrent jobs, see my explanation here, specifically the Job Limits option detailed there.

    I hope this solves your error. If it does not, please let me know.

  • Thank you, I will try that out.

  • I have a basic follow-up question regarding Cromwell: do I have to rebuild the Cromwell jar file?

    I downloaded the Cromwell source code, then added the line "system.max-concurrent-workflows = 5" at the end of my application.conf file.

    Then I tried running:
    java -Dconfig.file=/home/software/cromwell/application.conf /home/software/cromwell/cromwell-0.21.jar run jointCallingGenotypesAll.wdl jointCallingGenotypesAll_inputs.json

    But it looks like I'm making some noob mistake. Could anyone shed some light on this? This is new territory for me. Thanks!

  • I get the same error when using -Dconfig.file to specify the config file path.

  • Thanks, this is great. I will try it out. Keeping my fingers crossed!
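
To pull the suggestions in this thread together, here is a hedged shell sketch that (a) estimates how many HaplotypeCaller jobs fit at once on an r3.4xlarge (122 GiB RAM) when each JVM runs with -Xmx8g, and (b) writes a small Cromwell config override capping concurrent jobs. Note that 15 shards at 8 GiB of heap each already need about 120 GiB before any JVM overhead. The per-JVM overhead and OS reserve below are assumptions, not measurements, and my-cromwell.conf is a hypothetical file name. The key backend.providers.Local.config.concurrent-job-limit is taken from later Cromwell documentation; whether cromwell-0.21 honours it is an assumption to check, and it may require a newer release. Also, as far as I understand, system.max-concurrent-workflows limits how many workflows run at once, not how many jobs one workflow runs, so with a single workflow it would not throttle the 15 scatter shards.

```shell
#!/usr/bin/env bash
# Sketch: size a concurrent-job limit from available RAM, then write a
# Cromwell config override that applies it. Values marked ASSUMPTION are
# illustrative guesses, not measurements.

total_gib=122      # r3.4xlarge RAM, per the AWS instance specification
reserve_gib=4      # ASSUMPTION: headroom for the OS and Cromwell's own JVM
heap_gib=8         # matches -Xmx8g on the HaplotypeCaller java command
overhead_gib=2     # ASSUMPTION: per-JVM memory used beyond the -Xmx heap
samples=15         # one scatter shard per sample, all launched at once

per_job_gib=$(( heap_gib + overhead_gib ))
echo "all ${samples} shards at once: ~$(( samples * per_job_gib )) GiB needed, ${total_gib} GiB installed"

# Integer division rounds down, which errs on the side of fewer jobs.
job_limit=$(( (total_gib - reserve_gib) / per_job_gib ))
echo "suggested concurrent-job-limit: ${job_limit}"

# Hypothetical file name. The include line pulls in Cromwell's bundled
# defaults; the second line caps concurrent jobs on the Local backend.
# ASSUMPTION: your Cromwell version supports concurrent-job-limit.
cat > my-cromwell.conf <<EOF
include required(classpath("application"))
backend.providers.Local.config.concurrent-job-limit = ${job_limit}
EOF
```

Because the config file is read at startup via -Dconfig.file, rebuilding the jar should not be necessary. Separately, the command quoted in the thread has no -jar before the Cromwell jar path; without it, java treats the path as a class name and fails to start. A corrected invocation (using the hypothetical file name above) would look like: java -Dconfig.file=my-cromwell.conf -jar /home/software/cromwell/cromwell-0.21.jar run jointCallingGenotypesAll.wdl jointCallingGenotypesAll_inputs.json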
