Spark-related heap error in GATK4 PathSeqBuildKmers
I am encountering a Java heap space error when trying to generate the host k-mer library from the PathSeq resource bundle, and I am at a loss as to how to troubleshoot it. Oddly, the error appears only after the tool reports that it has completed its run:
org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildKmers done. Elapsed time: 12.53 minutes.
Runtime.totalMemory()=68719476736
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.broadinstitute.hellbender.tools.spark.utils.LongHopscotchSet.<init>(LongHopscotchSet.java:59)
    at org.broadinstitute.hellbender.tools.spark.utils.LargeLongHopscotchSet.<init>(LargeLongHopscotchSet.java:42)
    at org.broadinstitute.hellbender.tools.spark.pathseq.PSKmerUtils.longArrayCollectionToSet(PSKmerUtils.java:82)
    at org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildKmers.doWork(PathSeqBuildKmers.java:171)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Despite the "done" message, no host.hss file is generated. I am invoking the program with what should be more than enough heap space:
./gatk/gatk --java-options "-Xms72G -Xmx72G" PathSeqBuildKmersSpark --reference pathseq_host.fa -O host.hss
I've watched top during a run and can confirm that the process never exceeds roughly 69 GB of memory usage. I've also tried playing with the Spark options in case the heap space issue is occurring there, but passing --spark-master local[*] throws "A USER ERROR has occurred: spark-master is not a recognized option" whether or not I also include --spark-runner LOCAL, so I'm not sure how I'm supposed to configure Spark given that issue. My GATK version is 220.127.116.11-local and I'm using OpenJDK 1.8.0_131 as the JVM. Thanks for any help you can provide, and please let me know if you need any additional information from my end.
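For completeness, here is the invocation pattern I have been attempting. My reading of the GATK docs, which may well be where I'm going wrong, is that Spark-specific arguments such as --spark-runner and --spark-master are supposed to come after a bare "--" separator rather than be mixed in with the tool arguments:

```shell
# Sketch of my attempted invocation; paths and the local[*] setting
# are from my own setup, and the placement of "--" before the Spark
# arguments is my interpretation of the GATK Spark documentation.
./gatk/gatk --java-options "-Xms72G -Xmx72G" \
    PathSeqBuildKmersSpark \
    --reference pathseq_host.fa \
    -O host.hss \
    -- \
    --spark-runner LOCAL \
    --spark-master 'local[*]'
```

Note that local[*] is quoted to keep the shell from attempting glob expansion; with or without the quoting I still hit the "spark-master is not a recognized option" error described above.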
Hollis Wright, PhD
Assistant Staff Scientist
Oregon Health And Science University