Spark-related heap error in GATK4 PathSeqBuildKmers
I am encountering a Java heap space error when trying to generate the host k-mer library from the PathSeq resource bundle, and I am at a loss as to how to troubleshoot it. The error appears to occur after the tool has reported completing its run:
org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildKmers done. Elapsed time: 12.53 minutes.
Runtime.totalMemory()=68719476736
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.broadinstitute.hellbender.tools.spark.utils.LongHopscotchSet.<init>(LongHopscotchSet.java:59)
    at org.broadinstitute.hellbender.tools.spark.utils.LargeLongHopscotchSet.<init>(LargeLongHopscotchSet.java:42)
    at org.broadinstitute.hellbender.tools.spark.pathseq.PSKmerUtils.longArrayCollectionToSet(PSKmerUtils.java:82)
    at org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildKmers.doWork(PathSeqBuildKmers.java:171)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
However, no host.hss file is generated. I am invoking the program with what should be more than enough heap space:
./gatk/gatk --java-options "-Xms72G -Xmx72G" PathSeqBuildKmersSpark --reference pathseq_host.fa -O host.hss
I've watched top during a run and can confirm that the process never exceeds roughly 69GB of memory usage. I've tried adjusting the Spark options in case the heap space issue is occurring there, but setting "--spark-master local[*]" throws "A USER ERROR has occurred: spark-master is not a recognized option", whether or not I include --spark-runner LOCAL, so I'm not sure how I'm supposed to configure Spark. My GATK version is 188.8.131.52-local and I'm using OpenJDK 1.8.0_131 as the JVM. Thanks for any help you can provide, and please let me know if you need any additional information on my end.
Hollis Wright, PhD
Assistant Staff Scientist
Oregon Health And Science University
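For reference, the byte count in the log and the heap size on the command line can be put in the same units. The following is a minimal sketch in Python, not part of GATK; the helper names are hypothetical, and it assumes only that the JVM reads the "G" suffix in -Xmx72G as 2^30 bytes. It restates the numbers from the post and does not by itself explain the error.

```python
# Unit conversion for the figures quoted in the post (illustrative only).

GIB = 2 ** 30      # bytes per binary gigabyte (GiB), the unit behind -Xmx72G
GB = 10 ** 9       # bytes per decimal gigabyte (GB)


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB)."""
    return n_bytes / GIB


def bytes_to_gb(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes (GB)."""
    return n_bytes / GB


# Value logged by the tool as Runtime.totalMemory()
reported_total_bytes = 68719476736
# Heap size requested on the command line via -Xms72G -Xmx72G
requested_heap_bytes = 72 * GIB

reported_gib = bytes_to_gib(reported_total_bytes)
reported_gb = bytes_to_gb(reported_total_bytes)
shortfall_gib = bytes_to_gib(requested_heap_bytes - reported_total_bytes)

print(f"reported total: {reported_gib:.1f} GiB ({reported_gb:.1f} GB)")
print(f"requested heap: {bytes_to_gib(requested_heap_bytes):.1f} GiB")
print(f"difference:     {shortfall_gib:.1f} GiB")
```

Under these assumptions, the logged total works out to exactly 64 GiB (about 68.7 decimal GB), which is 8 GiB less than the 72 GiB requested.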