We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

MarkDuplicatesSpark Error

slubansluban IsraelMember
Running MarkDuplicatesSpark on a bam file (after converting bwa sam output to bam using samtools) with the following command:

gatk MarkDuplicatesSpark --TMP_DIR /home/path/tmp -I /home/path/gatk/S1_S1_001_bwa.bam -O /home/path/gatk/S1_S1_001_bwa_MarkedDups.bam 2> /home/path/gatk/S1_S1_001_bwa_MarkedDups.log &

It runs for about 3 minutes, creating tmp files in the tmp folder, then exits with the following error in the log:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 6.0 failed 1 times, most recent failure: Lost task 41.0 in stage 6.0 (TID 4148, localhost, executor driver): java.lang.NullPointerException

Might you please suggest a solution?

Best Answer

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @sluban

    Please post the version of GATK you are using and the entire error log.

  • slubansluban IsraelMember
    The version of GATK we are using is v4.0.8.1

    Here is the error log:

    19/11/04 11:40:22 ERROR Executor: Exception in task 41.0 in stage 6.0 (TID 4148)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 117.0 in stage 6.0 (TID 4224)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 20.0 in stage 6.0 (TID 4127)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 70.0 in stage 6.0 (TID 4177)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 109.0 in stage 6.0 (TID 4216)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 34.0 in stage 6.0 (TID 4141)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 18.0 in stage 6.0 (TID 4125)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 33.0 in stage 6.0 (TID 4140)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 124.0 in stage 6.0 (TID 4231)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 80.0 in stage 6.0 (TID 4187)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 ERROR Executor: Exception in task 104.0 in stage 6.0 (TID 4211)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 INFO Executor: Running task 129.0 in stage 6.0 (TID 4236)
    19/11/04 11:40:22 WARN TaskSetManager: Lost task 41.0 in stage 6.0 (TID 4148, localhost, executor driver): java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    19/11/04 11:40:22 INFO TaskSetManager: Lost task 117.0 in stage 6.0 (TID 4224) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 1]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 109.0 in stage 6.0 (TID 4216) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 2]
    19/11/04 11:40:22 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 790 blocks
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 20.0 in stage 6.0 (TID 4127) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 3]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 70.0 in stage 6.0 (TID 4177) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 4]
    19/11/04 11:40:22 INFO DAGScheduler: ShuffleMapStage 6 (flatMapToPair at MarkDuplicatesSparkUtils.java:103) failed in 4.066 s due to Job aborted due to stage failure: Task 41 in stage 6.0 failed 1 times, most recent failure: Lost task 41.0 in stage 6.0 (TID 4148, localhost, executor driver): java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace:
    19/11/04 11:40:22 INFO DAGScheduler: Job 3 failed: sortByKey at SparkUtils.java:164, took 4.144140 s
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 34.0 in stage 6.0 (TID 4141) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 5]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 124.0 in stage 6.0 (TID 4231) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 6]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 18.0 in stage 6.0 (TID 4125) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 7]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 80.0 in stage 6.0 (TID 4187) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 8]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 33.0 in stage 6.0 (TID 4140) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 9]
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 104.0 in stage 6.0 (TID 4211) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 10]
    19/11/04 11:40:22 ERROR Executor: Exception in task 44.0 in stage 6.0 (TID 4151)
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 790 blocks
    19/11/04 11:40:22 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 9 ms
    19/11/04 11:40:22 INFO TaskSetManager: Lost task 44.0 in stage 6.0 (TID 4151) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 11]
    19/11/04 11:40:22 INFO Executor: Executor killed task 129.0 in stage 6.0 (TID 4236), reason: stage cancelled
    19/11/04 11:40:22 ERROR TaskSchedulerImpl: Exception in statusUpdate
    java.util.concurrent.RejectedExecutionException: Task [email protected] rejected from [email protected][Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 4122]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:131)
    at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:424)
    at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:403)
    at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:67)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 INFO Executor: Executor interrupted and killed task 15.0 in stage 6.0 (TID 4122), reason: stage cancelled
    19/11/04 11:40:22 ERROR TaskSchedulerImpl: Exception in statusUpdate
    java.util.concurrent.RejectedExecutionException: Task [email protected] rejected from [email protected][Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 4122]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:131)
    at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:424)
    at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:403)
    at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:67)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    19/11/04 11:40:38 INFO MemoryStore: MemoryStore cleared
    19/11/04 11:40:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 3.07 minutes.
    Runtime.totalMemory()=28239200256
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 6.0 failed 1 times, most recent failure: Lost task 41.0 in stage 6.0 (TID 4148, localhost, executor driver): java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
    at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:266)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:128)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
    at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:936)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortUsingElementsAsKeys(SparkUtils.java:164)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.sortSamRecordsToMatchHeader(ReadsSparkSink.java:382)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReadsSingle(ReadsSparkSink.java:289)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:206)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:307)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:200)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:461)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.EmptyFragment.<init>(EmptyFragment.java:35)
    at org.broadinstitute.hellbender.utils.read.markduplicates.sparkrecords.MarkDuplicatesSparkRecord.newEmptyFragment(MarkDuplicatesSparkRecord.java:37)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$null$0(MarkDuplicatesSparkUtils.java:114)
    at java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:372)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$17d832cf$1(MarkDuplicatesSparkUtils.java:123)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    19/11/04 11:40:38 INFO ShutdownHookManager: Shutdown hook called
    19/11/04 11:40:38 INFO ShutdownHookManager: Deleting directory /home/albert_zeev/tmp/spark-9e981e67-11ac-402a-8b36-36b95cffc072
  • slubansluban IsraelMember
    Hi,

    We upgraded to the latest GATKv4.1.4.0 and re-ran MarkDuplicatesSpark. This time, it gave a more verbose error, that our bam files were missing read groups information. So we used picard AddOrReplaceReadGroups to add read groups information to the header of the bam files and re-ran MarkDuplicatesSpark, which worked this time around.

    Thank you very much,
    sluban
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Great! I am glad to hear it worked.

Sign In or Register to comment.