
GC overhead Java error at high shard count with Cromwell for LearnReadOrientationModel

SHollizeck Member
edited October 2019 in Ask the GATK team

Hey,

I am back with another issue that occurs when running multi-sample somatic variant calling with Mutect2 (10 tumor samples, WGS at 130x).
Currently my Cromwell workflow definition splits the calling regions into 7-million-base chunks, which leads to a scatter across 515 shards. (The region size is chosen so that each shard can finish within 24 h.)
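For context, the shard count follows roughly from dividing the genome by the region size. A back-of-envelope sketch (the ~3.1 Gbp genome size is an assumption; my actual run produced 515 shards because intervals are also split at contig boundaries):

```shell
# Back-of-envelope shard count: genome size / region size, rounded up.
# GENOME_BP is an assumed approximate human genome size; the real
# scatter gives more shards since intervals are split at contig ends.
GENOME_BP=3100000000
REGION_BP=7000000
SHARDS=$(( (GENOME_BP + REGION_BP - 1) / REGION_BP ))
echo "$SHARDS"   # roughly 443 with these assumptions
```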

workflow is basically:

  • scatter mutect2
  • concat vcfs
  • combine stats
  • run pileup
  • estimate contamination
  • learn read orientation model

And this is where it fails.
I have already allowed for a 32 GB heap, and I will test how much memory I have to request to make it work.
I understand that my use case might not be a common one, but this could affect other people as well.
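For reference, the workflow steps above map onto GATK 4.1 tools roughly as follows. All file names are placeholders, not my actual paths, and `run` just echoes each command so the sketch can be inspected without a GATK installation:

```shell
# Sketch of the scatter/gather workflow as GATK 4.1 command lines.
# File names are placeholders; 'run' echoes instead of executing.
run() { echo "$@"; }

# 1. Scatter: Mutect2 per interval shard, writing per-shard F1R2 counts
run gatk Mutect2 -R ref.fasta -I tumor.bam -L shard-0.interval_list \
    --f1r2-tar-gz shard-0.f1r2.tar.gz -O shard-0.vcf.gz
# 2. Gather: concatenate the per-shard VCFs
run gatk MergeVcfs -I shard-0.vcf.gz -I shard-1.vcf.gz -O merged.vcf.gz
# 3. Combine the per-shard Mutect2 stats
run gatk MergeMutectStats -stats shard-0.vcf.gz.stats \
    -stats shard-1.vcf.gz.stats -O merged.stats
# 4. Pileups and contamination estimate
run gatk GetPileupSummaries -I tumor.bam -V common.vcf.gz \
    -L common.vcf.gz -O pileups.table
run gatk CalculateContamination -I pileups.table -O contamination.table
# 5. Learn the read-orientation model from ALL per-shard F1R2 tarballs;
#    with 515 shards, this is the step that runs out of heap
run gatk LearnReadOrientationModel -I shard-0.f1r2.tar.gz \
    -I shard-1.f1r2.tar.gz -O artifact-priors.tar.gz
```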

Runtime.totalMemory()=30542397440
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1875)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at htsjdk.samtools.util.FormatUtil.parseDouble(FormatUtil.java:141)
        at htsjdk.samtools.metrics.MetricsFile.read(MetricsFile.java:434)
        at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.readMetricsFile(LearnReadOrientationModel.java:296)
        at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.lambda$doWork$7(LearnReadOrientationModel.java:96)
        at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel$$Lambda$52/825496893.apply(Unknown Source)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.doWork(LearnReadOrientationModel.java:97)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar

As always, thanks for your help

EDIT:
I have tried a few more runs, and it seems that in my case the required heap space is 62 GB, which is quite substantial.
So my problem is more or less solved, but I think it would be worth looking into the code for possible optimizations.
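For anyone hitting the same wall, the workaround is simply to request a larger JVM heap via GATK's `--java-options`. A sketch (the 64g value gives a little headroom over the ~62 GB I measured; file names are placeholders, and the command is echoed so the sketch runs without GATK installed):

```shell
# Raise the JVM heap for LearnReadOrientationModel via --java-options.
# 64g is illustrative headroom over the ~62 GB measured for 515 shards;
# file names are placeholders. Echoed rather than executed here.
HEAP="-Xmx64g"
CMD="gatk --java-options $HEAP LearnReadOrientationModel -I shard-0.f1r2.tar.gz -O artifact-priors.tar.gz"
echo "$CMD"
```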

