Typical requirements for StructuralVariationDiscoveryPipelineSpark for local execution?

SkyWarrior
edited August 1 in Ask the GATK team

Hi.

I have a bunch of human whole-genome samples at 30x coverage on which I want to perform SV analysis. I am already using some other tools, but I wanted to give GATK StructuralVariationDiscoveryPipelineSpark a try. I managed to gather all the necessary resources; however, my initial execution failed with a Java heap space error at stage 7 (I used local[4] and did not adjust any heap space parameters on a 64 GB machine). So I am wondering: what are the typical requirements for this workflow, and what do the Broad testers suggest? I currently have two computers available, one with 128 GB of memory and another with 64 GB.

Thanks.

Answers

  • bhanuGandham (admin)

    Hi @SkyWarrior

    Would you please post the exact command you are using, the GATK version, and the entire error log? Also, how much memory does the machine have?

  • SkyWarrior

    Hi

    I lost the first nohup.out file by mistake, but the error occurred at stage 7 and it was a "Java heap space" out-of-memory error. This was the first execution, with default parameters.

    The command line was:

    gatk StructuralVariationDiscoveryPipelineSpark -I Wes738_final.bam -R hs37d5.2bit --aligner-index-image hs37d5.fa.img --kmers-to-ignore BAD_KMERS_B37.txt --contig-sam-file Wes738_aligned_contigs.sam -O Wes738_structural_variants.vcf --spark-master local[4] --tmp-dir ./tmp
    

    This command was executed on a machine with 64 GB of RAM.

    My second execution command line was:

    gatk --java-options "-Xmx=100G" StructuralVariationDiscoveryPipelineSpark -I Wes738_final.bam -R hs37d5.2bit --aligner-index-image hs37d5.fa.img --kmers-to-ignore BAD_KMERS_B37.txt --contig-sam-file Wes738_aligned_contigs.sam -O Wes738_structural_variants.vcf --spark-master local[8] --tmp-dir ./tmp
    

    This command was executed on a machine with 128 GB of RAM.

    And the error message was as below:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 10.0 failed 1 times, most recent failure: Lost task 4.0 in stage 10.0 (TID 20525, localhost, executor driver): java.io.IOException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.lambda$removeUbiquitousKmers$6350c638$1(FindBreakpointEvidenceSpark.java:652)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:125)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:124)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:126)
        at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:520)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:210)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
        ... 22 more
    
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
        at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
        at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.removeUbiquitousKmers(FindBreakpointEvidenceSpark.java:660)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.addAssemblyQNames(FindBreakpointEvidenceSpark.java:507)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.gatherEvidenceAndWriteContigSamFile(FindBreakpointEvidenceSpark.java:176)
        at org.broadinstitute.hellbender.tools.spark.sv.StructuralVariationDiscoveryPipelineSpark.runTool(StructuralVariationDiscoveryPipelineSpark.java:164)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: java.io.IOException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.lambda$removeUbiquitousKmers$6350c638$1(FindBreakpointEvidenceSpark.java:652)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:125)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:124)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:126)
        at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:520)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:210)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
        ... 22 more
    

    Both machines run Debian Linux 10 with OpenJDK 1.8u222. All the necessary GATK Python environments were also set up.

    These are not exome samples, by the way: they are whole-genome samples that are only named "Wes" for lab record-keeping.

    Regards.

  • bhanuGandham (admin)

    Hi @SkyWarrior

    1) The 64 GB machine probably does not have enough memory. Take a look at this script to see the specs we use: https://github.com/broadinstitute/gatk/blob/master/scripts/sv/run_whole_pipeline.sh#L104. When we rent a Dataproc cluster from Google, we request 16-CPU machines with 7 GB of memory per CPU core.
    2) On the 128 GB machine, can you please try removing “-Xmx=100G” and instead set --num-executors to a smaller value (like 4 or 6)?
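    As a side note, -Xmx=100G is not valid HotSpot syntax in any case; the JVM expects -Xmx100G with no equals sign. A minimal sketch of the suggested retry, reusing the inputs from the commands above (and assuming --num-executors is forwarded to Spark the same way --spark-master is; if your GATK build rejects it inline, the equivalent Spark property is spark.executor.instances, settable via --conf):

    gatk StructuralVariationDiscoveryPipelineSpark \
        -I Wes738_final.bam \
        -R hs37d5.2bit \
        --aligner-index-image hs37d5.fa.img \
        --kmers-to-ignore BAD_KMERS_B37.txt \
        --contig-sam-file Wes738_aligned_contigs.sam \
        -O Wes738_structural_variants.vcf \
        --spark-master local[8] \
        --num-executors 4 \
        --tmp-dir ./tmp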

  • SkyWarrior

    Hi

    I tried with a lower executor setting (local[4]) and I am still getting a "GC overhead limit exceeded" error, at a different stage this time.

    19/08/06 23:32:03 INFO BlockManagerInfo: Removed taskresult_19046 on 192.168.1.139:36203 in memory (size: 7.5 MB, free: 13.9 GB)
    19/08/06 23:38:08 ERROR Executor: Exception in task 808.0 in stage 9.0 (TID 19049)
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    19/08/06 23:38:09 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 19049,5,main]
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    19/08/06 23:38:09 INFO SparkContext: Invoking stop() from shutdown hook
    19/08/06 23:38:09 INFO TaskSetManager: Starting task 810.0 in stage 9.0 (TID 19051, localhost, executor driver, partition 810, PROCESS_LOCAL, 4621 bytes)
    19/08/06 23:38:09 INFO Executor: Running task 810.0 in stage 9.0 (TID 19051)
    19/08/06 23:38:42 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
    java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
    19/08/06 23:38:42 INFO SparkUI: Stopped Spark web UI at http://192.168.1.139:4040
    19/08/06 23:38:42 WARN TaskSetManager: Lost task 808.0 in stage 9.0 (TID 19049, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    
    19/08/06 23:38:42 ERROR TaskSetManager: Task 808 in stage 9.0 failed 1 times; aborting job
    19/08/06 23:38:42 INFO TaskSchedulerImpl: Cancelling stage 9
    19/08/06 23:38:42 INFO MemoryStore: Block taskresult_19047 stored as bytes in memory (estimated size 7.5 MB, free 6.8 GB)
    19/08/06 23:38:42 INFO BlockManagerInfo: Added taskresult_19047 in memory on 192.168.1.139:36203 (size: 7.5 MB, free: 13.9 GB)
    19/08/06 23:38:42 INFO Executor: Finished task 806.0 in stage 9.0 (TID 19047). 7884891 bytes result sent via BlockManager)
    19/08/06 23:38:42 INFO TaskSchedulerImpl: Stage 9 was cancelled
    19/08/06 23:38:42 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, 192.168.1.139, 36203, None),taskresult_19047,StorageLevel(memory, 1 replicas),7884891,0))
    19/08/06 23:38:42 INFO DAGScheduler: ResultStage 9 (collect at FindBreakpointEvidenceSpark.java:752) failed in 6738.133 s due to Job aborted due to stage failure: Task 808 in stage 9.0 failed 1 times, most recent failure: Lost task 808.0 in stage 9.0 (TID 19049, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    
    Driver stacktrace:
    19/08/06 23:38:42 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted([email protected])
    19/08/06 23:39:18 INFO Executor: Executor is trying to kill task 809.0 in stage 9.0 (TID 19050), reason: stage cancelled
    19/08/06 23:39:18 WARN ShutdownHookManager: ShutdownHook 'ClientFinalizer' timeout, java.util.concurrent.TimeoutException
    java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
    19/08/06 23:39:18 INFO Executor: Executor is trying to kill task 810.0 in stage 9.0 (TID 19051), reason: stage cancelled
    19/08/06 23:39:18 INFO Executor: Executor is trying to kill task 807.0 in stage 9.0 (TID 19048), reason: stage cancelled
    19/08/06 23:39:18 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(driver,WrappedArray((19048,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(603323),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(2277),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(25349998),None,false,true,None), AccumulableInfo(255,None,Some(1),None,false,true,None), AccumulableInfo(256,None,Some(1117532),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None))), (19050,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(398245),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(2277),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(25379265),None,false,true,None), AccumulableInfo(255,None,Some(1),None,false,true,None), AccumulableInfo(256,None,Some(737100),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None))), (19051,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(33874),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), 
AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(0),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(0),None,false,true,None), AccumulableInfo(255,None,Some(0),None,false,true,None), AccumulableInfo(256,None,Some(0),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None)))))
    19/08/06 23:39:18 WARN Executor: Issue communicating with driver in heartbeater
    org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
        at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:726)
        at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:755)
        at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:755)
        at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:755)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
        at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:755)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        ... 14 more
    19/08/06 23:39:18 INFO DAGScheduler: Job 7 failed: collect at FindBreakpointEvidenceSpark.java:752, took 10011.934477 s
    19/08/06 23:39:18 INFO Executor: Executor killed task 807.0 in stage 9.0 (TID 19048), reason: stage cancelled
    19/08/06 23:39:18 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(7,1565123922917,JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task 808 in stage 9.0 failed 1 times, most recent failure: Lost task 808.0 in stage 9.0 (TID 19049, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    
    Driver stacktrace:))
    19/08/06 23:39:18 WARN NettyRpcEnv: Ignored message: HeartbeatResponse(false)
    19/08/06 23:39:18 INFO SparkContext: SparkContext already stopped.
    19/08/06 23:39:18 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(driver,WrappedArray((19048,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(638550),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(2277),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(25349998),None,false,true,None), AccumulableInfo(255,None,Some(1),None,false,true,None), AccumulableInfo(256,None,Some(1123631),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None))), (19050,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(433472),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(2277),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(25379265),None,false,true,None), AccumulableInfo(255,None,Some(1),None,false,true,None), AccumulableInfo(256,None,Some(737100),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None))), (19051,9,0,Vector(AccumulableInfo(240,None,Some(0),None,false,true,None), AccumulableInfo(241,None,Some(0),None,false,true,None), AccumulableInfo(242,None,Some(0),None,false,true,None), AccumulableInfo(243,None,Some(0),None,false,true,None), AccumulableInfo(244,None,Some(0),None,false,true,None), AccumulableInfo(245,None,Some(69101),None,false,true,None), AccumulableInfo(246,None,Some(0),None,false,true,None), AccumulableInfo(247,None,Some(0),None,false,true,None), AccumulableInfo(248,None,Some(0),None,false,true,None), AccumulableInfo(249,None,Some(0),None,false,true,None), 
AccumulableInfo(250,None,Some([]),None,false,true,None), AccumulableInfo(251,None,Some(0),None,false,true,None), AccumulableInfo(252,None,Some(0),None,false,true,None), AccumulableInfo(253,None,Some(0),None,false,true,None), AccumulableInfo(254,None,Some(0),None,false,true,None), AccumulableInfo(255,None,Some(0),None,false,true,None), AccumulableInfo(256,None,Some(0),None,false,true,None), AccumulableInfo(257,None,Some(0),None,false,true,None), AccumulableInfo(258,None,Some(0),None,false,true,None), AccumulableInfo(259,None,Some(0),None,false,true,None), AccumulableInfo(260,None,Some(0),None,false,true,None), AccumulableInfo(261,None,Some(0),None,false,true,None), AccumulableInfo(262,None,Some(0),None,false,true,None), AccumulableInfo(263,None,Some(0),None,false,true,None)))))
    [August 6, 2019 11:39:18 PM EET] org.broadinstitute.hellbender.tools.spark.sv.StructuralVariationDiscoveryPipelineSpark done. Elapsed time: 209.06 minutes.
    Runtime.totalMemory()=30052188160
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 808 in stage 9.0 failed 1 times, most recent failure: Lost task 808.0 in stage 9.0 (TID 19049, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
        at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
        at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.getKmerIntervals(FindBreakpointEvidenceSpark.java:752)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.getKmerAndIntervalsSet(FindBreakpointEvidenceSpark.java:546)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.addAssemblyQNames(FindBreakpointEvidenceSpark.java:503)
        at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.gatherEvidenceAndWriteContigSamFile(FindBreakpointEvidenceSpark.java:176)
        at org.broadinstitute.hellbender.tools.spark.sv.StructuralVariationDiscoveryPipelineSpark.runTool(StructuralVariationDiscoveryPipelineSpark.java:164)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at com.esotericsoftware.kryo.io.Input.readString(Input.java:484)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:150)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:246)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:154)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    19/08/06 23:39:18 ERROR TaskSchedulerImpl: Exception in statusUpdate
    java.util.concurrent.RejectedExecutionException: Task [email protected] rejected from [email protected][Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 19048]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:131)
        at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:424)
        at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:403)
        at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:67)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    19/08/06 23:39:43 ERROR ShutdownHookManager: ShutdownHookManger shutdown forcefully.
    Using GATK jar /home/exome/scripts/gatk-package-4.1.2.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/exome/scripts/gatk-package-4.1.2.0-local.jar StructuralVariationDiscoveryPipelineSpark -I Wes738_final.bam -R hs37d5.2bit --aligner-index-image hs37d5.fa.img --kmers-to-ignore BAD_KMERS_B37.txt --contig-sam-file Wes738_aligned_contigs.sam -O Wes738_structural_variants.vcf --spark-master local[4] --tmp-dir ./tmp
    

    I am trying to get this up and running on a local machine, as I don't have the means to work on Google Cloud or any other HPC cluster.

    Thanks for the help.

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    Hi @SkyWarrior .
    First of all, we designed the pipeline to run on a Spark cluster, so it is conceivable that it will not work on a local machine.
    That said, it would be great if the pipeline could work on a local machine with appropriate resources available.
    It is nice to see that the pipeline has passed stage 7 and is now trying to get through stage 10.

    Now, I'd suggest trying the machine with 128 GB of memory, because certain stages of the pipeline are memory hungry.
    The following parameters will affect the memory burden on the machine:

    --num-executors <INTEGER>
    --executor-cores <INTEGER>
    --executor-memory <INTEGER>G
    --driver-memory <INTEGER>G
    --conf spark.yarn.executor.memoryOverhead=<INTEGER> # 10% of executor memory recommended
    

    Considering that you have a "low"-memory machine (compared to what we typically use), I'd suggest starting with conservative values: 2 executors, 1 core per executor, 30 GB of memory for the driver, and 40 GB per executor. Of course, the downside is a longer run time.
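    As a rough budget on that machine (a sketch, reading the 40 GB as per-executor plus the recommended ~10% overhead):

    # 2 executors x 40 GB      =  80 GB
    # driver                   =  30 GB
    # executor overhead (~10%) ~=  8 GB
    # total                    ~= 118 GB  -- just fits in 128 GB
    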

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭
    edited August 7

    Thanks @shuangBroad

    I will give it a try. I totally agree that it should run on a Spark cluster, but I was wondering if it can do the job on a standalone machine as well. My previous error was not GC-overhead related: it occurred at stage 10 and was about exceeding Integer.MAX_VALUE. Do you have any comments on that?

    Caused by: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    

    Thanks again.

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    @SkyWarrior I've never seen that while running the pipeline. It does look like it comes from the Spark dependency. It is possible that Spark cannot find enough memory and falls back to disk for storage, but the disk is low on space as well.

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭

    Interesting. This run was done on an 8 TB spinner alone. I will try to allocate more.

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    Then it shouldn't be disk.

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    One thing though: your bam is named Wes738_final.bam. Is it a WES or a WGS bam? The pipeline is designed for WGS bams, so it will not perform well on WES bams.
    One more thing: I'm not sure which version of GATK you are using, but I noticed you are using a 2bit reference. This might cause a problem later, because I was told the engine stopped supporting 2bit references a while back (the SV team had been working on another higher-priority project, so we weren't paying attention to this, but we are now). If the pipeline manages to finish successfully despite the memory limitations, then I'd say use .fasta.gz in the future.
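
    For example, a sketch of the reference switch (file names illustrative; I believe the usual companion index files need to sit next to the fasta):

    # replace the 2bit reference with a block-gzipped fasta
    -R hs37d5.fa.gz   # expects hs37d5.fa.gz.fai, hs37d5.fa.gz.gzi and hs37d5.dict alongside
    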

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭
    edited August 7

    This is a WGS 30x 2x150 NovaSeq run (the lab manager was too lazy to assign a new code name for WGS, so we had to go with Wes). Sure, I will try fasta.gz as well. Right now the system is importing GVCFs, so I will wait until that is done. BTW, this is GATK 4.1.2.0. I like testing bleeding-edge stuff.

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭
    edited August 8

    The following parameters will affect the memory burden on the machine:

    --num-executors <INTEGER>
    --executor-cores <INTEGER>
    --executor-memory <INTEGER>G
    --driver-memory <INTEGER>G
    --conf spark.yarn.executor.memoryOverhead=<INTEGER> # 10% of executor memory recommended
    

    Considering that you have a "low"-memory machine (compared to what we typically use), I'd suggest starting with conservative values: 2 executors, 1 core per executor, 30 GB of memory for the driver, and 40 GB per executor. Of course, the downside is a longer run time.

    Hi
    The parameters mentioned here are not recognized by gatk StructuralVariationDiscoveryPipelineSpark, which makes me feel like some things are not implemented yet.

    ***********************************************************************
    
    A USER ERROR has occurred: executor-cores is not a recognized option
    
    ***********************************************************************
    

    And more...

    Can you comment on that?

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    Oh, those are Spark configuration arguments that need to be specified in the block of arguments after the -- separator.

    Sorry for the confusion.
    Please see here.
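
    For example, something along these lines (untested by me in local mode; the values are the ones I suggested above):

    gatk StructuralVariationDiscoveryPipelineSpark \
        -I Wes738_final.bam -R hs37d5.2bit \
        --aligner-index-image hs37d5.fa.img \
        --kmers-to-ignore BAD_KMERS_B37.txt \
        --contig-sam-file Wes738_aligned_contigs.sam \
        -O Wes738_structural_variants.vcf \
        -- \
        --spark-runner LOCAL --spark-master local[8] \
        --num-executors 2 --executor-cores 1 \
        --executor-memory 40G --driver-memory 30G
    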

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭

    Interesting. Let me check that as well.

  • SkyWarriorSkyWarrior ✭✭✭ TurkeyMember ✭✭✭

    Hi

    I tried the suggestion to add -- before all Spark arguments as a separator, like the example given in the link; however, the tool still does not recognize anything that comes after it.

    -- --spark-master local --spark-runner LOCAL --num-executors 2 ....
    

    and the error message says num-executors is not a recognized option, so this is a no-go, I suppose?

    By the way, I tried a different bam file yesterday, and this time -Xmx100G and local[4] got much further than the previous bam file did. However, the execution then became unresponsive: it got stuck at 16/3000-something at job 8, stage 10, and multiple error messages were thrown mid-job about timeouts and re-registering at heartbeat. It felt like something about GC was interfering with the execution. I will try with more threads for GC.

    I am wondering if the size exceeds Integer.MAX_VALUE error is somehow related to the previous bam file. I checked that bam using ValidateSamFile and gatk reported no errors. Could it be that the code hits an unhandled edge case in the Wes738 sample but not in the Wes736 sample? I wish I could share the whole bam so the problem could be reproduced, but it is very large. Maybe I can strip the qual scores and compress it as CRAM? A lossless CRAM is about 45 GB, but maybe a lossy one would work; see the sketch below.
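
    Something like this is what I have in mind (a sketch; actual quality binning would probably need an external tool such as Crumble, which I haven't tried):

    # lossless CRAM (~45 GB for this bam)
    samtools view -C -T hs37d5.fa -o Wes738_final.cram Wes738_final.bam
    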

    What do you think?

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    Admittedly, I've never run the whole pipeline in local mode (I don't have access to a powerful machine locally), though I have run certain non-memory-hungry stages locally without problems.
    Now, for the first problem you encountered, I suspect it is because you also specified spark-master. We have been specifying num-executors without any problems, as you can see here (this is the script we use for running test jobs on Google Dataproc).

    It's nice to know that a different bam lets you advance further (though that MAX_VALUE problem is still puzzling). The heartbeat messages you saw can hopefully be tuned away by setting spark.executor.heartbeatInterval=120 (also see the link I shared above). But I suspect the message was caused by the executor being killed by YARN for memory reasons, so memory tuning is likely necessary to fully resolve this.
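
    For example, passed through the Spark arguments block (a sketch; newer Spark versions expect a time unit, and spark.network.timeout should stay larger than the heartbeat interval):

    -- \
    --spark-runner LOCAL --spark-master local[4] \
    --conf spark.executor.heartbeatInterval=120s \
    --conf spark.network.timeout=600s
    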

    It is possible that the file is triggering an edge case (not that we've never seen one of those ;) ). But I'd recommend trying to get Wes736 going before transferring such a large file.
