
MarkDuplicatesSpark got 'OutOfMemoryError: Java heap space'

adam_di PKU Member
edited August 14 in Ask the GATK team
I'm following the GATK Best Practices (v4.1.2) and running the data pre-processing step with MarkDuplicatesSpark on an HPC cluster. The Java version is 11.0.1 and the available RAM is 48 GB. The data is paired-end targeted-panel DNA-seq, about 13.32 GB in size. Four outputs were generated: *_dedup_sort.bam (7.61 GB), *_dedup_sort.bam.bai (0 bytes), *_dedup_sort.bam.sbi (312 KB), and a folder *_dedup_sort.bam.parts containing a _SUCCESS file of 0 bytes.
Here are the exact command and the error I got:
[command]
gatk MarkDuplicatesSpark \
-I $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}.bam \
-O $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}_dedup_sort.bam \
--tmp-dir /gpfs/share/home/1801111726/TMP \
--conf 'spark.executor.cores=12' \
--conf 'spark.local.dir=/gpfs/share/home/1801111726/TMP' \
--create-output-bam-splitting-index false \
--create-output-variant-index false
[error]
[August 14, 2019 at 3:28:11 AM CST] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 43.62 minutes.
Runtime.totalMemory()=209715200
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:274)
at htsjdk.samtools.CachingBAMFileIndex.getQueryResults(CachingBAMFileIndex.java:159)
at htsjdk.samtools.BAMIndexMerger.processIndex(BAMIndexMerger.java:43)
at htsjdk.samtools.BAMIndexMerger.processIndex(BAMIndexMerger.java:16)
at org.disq_bio.disq.impl.file.IndexFileMerger.mergeParts(IndexFileMerger.java:90)
at org.disq_bio.disq.impl.formats.bam.BamSink.save(BamSink.java:132)
at org.disq_bio.disq.HtsjdkReadsRddStorage.write(HtsjdkReadsRddStorage.java:225)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:155)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:120)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:361)
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:325)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/08/14 03:28:11 INFO ShutdownHookManager: Shutdown hook called
19/08/14 03:28:11 INFO ShutdownHookManager: Deleting directory /gpfs/share/home/1801111726/TMP/spark-928668db-3104-4366-9942-f1bd4e17215d
Using GATK jar /gpfs/share/home/1801111726/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/share/home/1801111726/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar MarkDuplicatesSpark -I /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB.bam -O /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB_dedup_sort.bam --tmp-dir /gpfs/share/home/1801111726/TMP
/var/spool/slurmd/job141026/slurm_script: line 26: --conf: command not found
/var/spool/slurmd/job141026/slurm_script: line 27: --conf: command not found
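
The two "--conf: command not found" messages from the Slurm script, together with the "Running:" line above (which shows neither the --conf flags nor any -Xmx setting actually reaching the JVM), suggest the backslash line continuations were broken in the submitted script (for example by trailing whitespace after a backslash), so the shell executed the --conf lines as separate commands and the tool ran on the JVM's small default heap; Runtime.totalMemory()=209715200 bytes is exactly 200 MiB (209715200 = 200 × 1024 × 1024). A minimal sketch of a corrected invocation, assuming the same paths as above; the -Xmx40g value is an illustrative choice for a 48 GB node, leaving headroom for off-heap use:
[corrected command sketch]
# Each continued line must end with a backslash and no trailing spaces;
# otherwise the shell runs the remainder as separate commands.
# --java-options passes JVM flags through the gatk wrapper script.
gatk --java-options "-Xmx40g" MarkDuplicatesSpark \
-I $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}.bam \
-O $bam_path/breast_tissue+ptc_gatk_bundle/${sample_name}_dedup_sort.bam \
--tmp-dir /gpfs/share/home/1801111726/TMP \
--conf 'spark.executor.cores=12' \
--conf 'spark.local.dir=/gpfs/share/home/1801111726/TMP' \
--create-output-bam-splitting-index false \
--create-output-variant-index false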


Answers

  • adam_di PKU Member
    Runtime.totalMemory()=209715200
  • adam_di PKU Member
    Thank you @bshifaw
    I tried and updated GATK to v4.1.3.0, but this time I got a different error at the very beginning. Could it be because the Java version is incorrect?
    [error message]
    Runtime.totalMemory()=2105540608
    java.lang.IllegalArgumentException: Unsupported class file major version 55
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
    at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
    at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:309)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:171)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
    at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:936)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortUsingElementsAsKeys(SparkUtils.java:164)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortReadsAccordingToHeader(SparkUtils.java:142)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.querynameSortReadsIfNecessary(SparkUtils.java:293)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:205)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:269)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:353)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:533)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    19/08/14 21:35:23 INFO ShutdownHookManager: Shutdown hook called
    19/08/14 21:35:23 INFO ShutdownHookManager: Deleting directory /gpfs/share/home/1801111726/TMP/spark-a662738a-9551-4447-82e2-87fd5155814c
    Using GATK jar /gpfs/share/home/1801111726/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx42g -jar /gpfs/share/home/1801111726/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar MarkDuplicatesSpark -I /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB.bam -O /gpfs/share/home/1801111726/sequencing_file/mapped_bam/breast_tissue+ptc_gatk_bundle/TN190626063-TMB_dedup_sort.bam --conf spark.local.dir=/gpfs/share/home/1801111726/TMP
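
    For context on the Java question above: "Unsupported class file major version 55" means Spark's bytecode reader (ASM 6) was handed a class compiled for Java 11 (class-file major version 55), which it cannot parse. GATK 4.1.x Spark tools expect Java 8 (class-file major version 52), so the likely fix is to run the tool under a Java 8 JVM rather than Java 11. A minimal sketch, assuming a module-managed cluster; the module names are illustrative, so check what the cluster actually provides:
    [sketch: switching to Java 8 before running gatk]
    # Hypothetical module names; list the real ones with `module avail java`.
    module unload java           # drop the Java 11 module, if one is loaded
    module load java/1.8.0       # load a Java 8 runtime instead
    java -version                # should now report version 1.8.x
    gatk MarkDuplicatesSpark ... # rerun the same command under Java 8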