Executor heartbeat timed out after X ms | StructuralVariationDiscoveryPipelineSpark

Hello GATK team,
I'm trying to run 'StructuralVariationDiscoveryPipelineSpark' to find CNVs. It starts well, but after a while it fails with this error: 'ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 137575 ms'
(Part of the log output from when the error starts to appear):
"19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece610 !
19/09/13 06:49:18 INFO MemoryStore: MemoryStore cleared
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece611 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece614 !
19/09/13 06:49:18 INFO BlockManager: BlockManager stopped
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece615 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece612 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece613 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece607 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece608 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece605 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece606 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece609 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece600 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece603 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece604 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece601 !
19/09/13 06:49:18 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_17_piece602 !
19/09/13 06:49:18 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerRemoved(1568346558188,BlockManagerId(driver, 10.109.201.103, 40444, None))
19/09/13 06:49:18 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(driver, 10.109.201.103, 40444, None)
19/09/13 06:49:18 INFO BlockManagerMasterEndpoint: Registering block manager 10.109.201.103:40444 with 15.8 GB RAM, BlockManagerId(driver, 10.109.201.103, 40444, None)
19/09/13 06:49:18 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1568346558190,BlockManagerId(driver, 10.109.201.103, 40444, None),16990076928,Some(16990076928),Some(0))
19/09/13 06:49:18 INFO BlockManagerMaster: BlockManagerMaster stopped
19/09/13 06:49:18 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/09/13 06:49:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.109.201.103, 40444, None)
19/09/13 06:49:18 INFO BlockManager: Reporting 0 blocks to the master.
19/09/13 06:49:18 INFO SparkContext: Successfully stopped SparkContext
06:49:18.208 INFO StructuralVariationDiscoveryPipelineSpark - Shutting down engine
[September 13, 2019 6:49:18 AM AST] org.broadinstitute.hellbender.tools.spark.sv.StructuralVariationDiscoveryPipelineSpark done. Elapsed time: 88.37 minutes.
Runtime.totalMemory()=31999918080
org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 8.0 failed 1 times, most recent failure: Lost task 8.0 in stage 8.0 (TID 37720, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 137575 ms
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.removeUbiquitousKmers(FindBreakpointEvidenceSpark.java:660)
at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.addAssemblyQNames(FindBreakpointEvidenceSpark.java:507)
at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.gatherEvidenceAndWriteContigSamFile(FindBreakpointEvidenceSpark.java:176)
at org.broadinstitute.hellbender.tools.spark.sv.StructuralVariationDiscoveryPipelineSpark.runTool(StructuralVariationDiscoveryPipelineSpark.java:164)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/09/13 06:49:18 INFO ShutdownHookManager: Shutdown hook called
19/09/13 06:49:18 INFO ShutdownHookManager: Deleting directory /tmp/spark-57685327-f8c7-4813-88d6-c9ef0f8a721f
Using GATK jar /sw/csi/gatk/4.1.2.0/el7.5_binary/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /sw/csi/gatk/4.1.2.0/el7.5_binary/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar StructuralVariationDiscoveryPipelineSpark -I RMNISTHS_30xdownsample.sorted.bam -R /ibex/scratch/althubsw/ref/human_g1k_v37.2bit --aligner-index-image reference19.fa.img --kmers-to-ignore kmers_to_ignore19.txt --contig-sam-file aligned_contigs.sam -O structural_variants.vcf "
I submitted it as a job to the Slurm job scheduler and requested 400 GB of memory for it. My BAM file is 148 GB.
Any help avoiding this error would be appreciated.
Thanks.
Answers
Hi @Sakhaa and @sarawasl, thanks for testing out our Spark SV pipeline!
I do have some questions before I can make concrete suggestions.
Thanks!
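One thing that can be read straight from the posted log: it reports Runtime.totalMemory()=31999918080, which is well below the 400 GB requested from Slurm, and the java command shown does not include an -Xmx setting. Whether that gap is related to the heartbeat timeout is an assumption to check, not a diagnosis. Below is a minimal Python sketch for pulling the two numbers out of a log; the names `LOG` and `summarize` are made up for illustration, and the two log lines are copied from the post above.

```python
import re

# Two lines copied from the log posted in the question.
LOG = """\
Runtime.totalMemory()=31999918080
ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 137575 ms
"""


def summarize(log_text):
    """Return (jvm_total_gib, heartbeat_timeout_s); a field is None if not found."""
    mem = re.search(r"Runtime\.totalMemory\(\)=(\d+)", log_text)
    hb = re.search(r"Executor heartbeat timed out after (\d+) ms", log_text)
    # Convert bytes to GiB and milliseconds to seconds for easier reading.
    jvm_gib = int(mem.group(1)) / 2**30 if mem else None
    timeout_s = int(hb.group(1)) / 1000 if hb else None
    return jvm_gib, timeout_s


jvm_gib, timeout_s = summarize(LOG)
print(f"JVM total memory reported by GATK: {jvm_gib:.1f} GiB")
print(f"Heartbeat timeout reported by Spark: {timeout_s:.1f} s")
```

If the JVM memory turns out to be much smaller than the Slurm allocation, the gatk wrapper script's --java-options argument is the usual place to pass JVM settings such as -Xmx. The right value depends on the node and the data, so none is suggested here.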
Your question helped me figure out that I'm using the wrong tool. I'm now switching to GermlineCNVCaller.