
MarkDuplicatesSpark got 'OutOfMemoryError: Java heap space'

adam_di PKU Member
edited August 2019 in Ask the GATK team
I'm following the GATK Best Practices (v4.1.2) and running the data-preprocessing step with MarkDuplicatesSpark on an HPC cluster. The Java version is 11.0.1 and the available RAM is 48 GB. The data is a paired-end targeted-panel DNA-seq BAM of about 13.32 GB. Four output files were generated: *_dedup_sorted.bam (7.61 GB), *_dedup_sorted.bam.bai (0 bytes), *_dedup_sorted.bam.sbi (312 KB), and a folder *_dedup_sort.bam.parts containing a _SUCCESS file (0 bytes).
Here are the exact command and the error I got:
[command]
gatk MarkDuplicatesSpark \
-I $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}.bam \
-O $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}_dedup_sort.bam \
--tmp-dir /gpfs/share/home/1801111726/TMP \
--conf 'spark.executor.cores=12' \
--conf 'spark.local.dir=/gpfs/share/home/1801111726/TMP' \
--create-output-bam-splitting-index false \
--create-output-variant-index false
[error]
[August 14, 2019 at 3:28:11 AM CST] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 43.62 minutes.
Runtime.totalMemory()=209715200
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:274)
at htsjdk.samtools.CachingBAMFileIndex.getQueryResults(CachingBAMFileIndex.java:159)
at htsjdk.samtools.BAMIndexMerger.processIndex(BAMIndexMerger.java:43)
at htsjdk.samtools.BAMIndexMerger.processIndex(BAMIndexMerger.java:16)
at org.disq_bio.disq.impl.file.IndexFileMerger.mergeParts(IndexFileMerger.java:90)
at org.disq_bio.disq.impl.formats.bam.BamSink.save(BamSink.java:132)
at org.disq_bio.disq.HtsjdkReadsRddStorage.write(HtsjdkReadsRddStorage.java:225)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:155)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:120)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:361)
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:325)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/08/14 03:28:11 INFO ShutdownHookManager: Shutdown hook called
19/08/14 03:28:11 INFO ShutdownHookManager: Deleting directory /gpfs/share/home/1801111726/TMP/spark-928668db-3104-4366-9942-f1bd4e17215d
Using GATK jar /gpfs/share/home/1801111726/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/share/home/1801111726/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar MarkDuplicatesSpark -I /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB.bam -O /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB_dedup_sort.bam --tmp-dir /gpfs/share/home/1801111726/TMP
/var/spool/slurmd/job141026/slurm_script: line 26: --conf: command not found
/var/spool/slurmd/job141026/slurm_script: line 27: --conf: command not found
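Two details in this log point at the problem. `Runtime.totalMemory()=209715200` (about 200 MB) means the JVM ran with its default heap; the `Running:` line confirms no `-Xmx` flag reached the java invocation. And the `--conf: command not found` messages mean the shell executed those lines as separate commands, which typically happens when a line-continuation backslash is missing or is followed by trailing whitespace. A minimal rerun sketch, assuming a 42g heap suits the 48 GB node (input/output names here are placeholders, not the original paths):

```shell
# Hedged sketch: pass an explicit heap via --java-options, and keep each
# continuation backslash as the very last character on its line so the
# --conf flags stay part of the same command.
gatk --java-options "-Xmx42g" MarkDuplicatesSpark \
  -I input.bam \
  -O output_dedup_sort.bam \
  --tmp-dir /gpfs/share/home/1801111726/TMP \
  --conf 'spark.local.dir=/gpfs/share/home/1801111726/TMP'
```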

Answers

  • adam_di PKU Member
    Runtime.totalMemory()=209715200
  • adam_di PKU Member
    Thank you @bshifaw
    I tried that and updated GATK to v4.1.3.0, but this time it failed with a different error right at the beginning. Is it because the Java version is incorrect?
    [error message]
    Runtime.totalMemory()=2105540608
    java.lang.IllegalArgumentException: Unsupported class file major version 55
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
    at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
    at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:309)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:171)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
    at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:936)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortUsingElementsAsKeys(SparkUtils.java:164)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortReadsAccordingToHeader(SparkUtils.java:142)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.querynameSortReadsIfNecessary(SparkUtils.java:293)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:205)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:269)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:353)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:533)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    19/08/14 21:35:23 INFO ShutdownHookManager: Shutdown hook called
    19/08/14 21:35:23 INFO ShutdownHookManager: Deleting directory /gpfs/share/home/1801111726/TMP/spark-a662738a-9551-4447-82e2-87fd5155814c
    Using GATK jar /gpfs/share/home/1801111726/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx42g -jar /gpfs/share/home/1801111726/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar MarkDuplicatesSpark -I /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB.bam -O /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB_dedup_sort.bam --conf spark.local.dir=/gpfs/share/home/1801111726/TMP
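`Unsupported class file major version 55` is Spark's bundled ASM bytecode reader failing on Java 11 class files: class-file major version 55 corresponds to Java 11, while 52 corresponds to Java 8, which is the runtime GATK 4.1.x's local Spark runner targets. So yes, this points at the Java version. A sketch of the usual workaround, where the JDK path is a placeholder for your cluster's Java 8 install:

```shell
# Class-file major versions: 52 = Java 8, 55 = Java 11.
# Point the local Spark runner at a Java 8 JVM:
export JAVA_HOME=/path/to/jdk1.8.0   # placeholder path; adjust for your cluster
export PATH="$JAVA_HOME/bin:$PATH"
java -version                        # should now report 1.8.0_xxx
```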