GenotypeGVCFs (V4.0) memory error

Hi all,

I recently upgraded to GATK 4 and have been having problems with GenotypeGVCFs. The problem only appears as I add more samples. I also noticed that other users have been reporting memory errors, so I commented on issue #4454 to report that I am seeing the same thing. GATK 3.8 finishes the job in 36 seconds from a list of GVCFs. But when I run CombineGVCFs -> GenotypeGVCFs on the same 108 samples, it takes over three hours, writes a severely truncated VCF, and produces many memory errors. On that issue, I attached several error logs showing what happened as I progressively increased the number of samples I supplied. For redundancy, here is the bottom of one error log that shows the memory issue I'm having:

14:13:53.736 INFO  GenotypeGVCFs - Shutting down engine
[March 6, 2018 2:13:53 PST PM] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 195.26 minutes.
Runtime.totalMemory()=28631367680
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Spliterators.spliterator(Spliterators.java:240)
        at java.util.Arrays.spliterator(Arrays.java:4911)
        at java.util.Arrays.stream(Arrays.java:5053)
        at java.util.Arrays.stream(Arrays.java:5035)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.isValidConformation(GeneralPloidyExactAFCalculator.java:315)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.computeLofK(GeneralPloidyExactAFCalculator.java:280)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.calculateACConformationAndUpdateQueue(GeneralPloidyExactAFCalculator.java:187)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.fastCombineMultiallelicPool(GeneralPloidyExactAFCalculator.java:148)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.combineSinglePools(GeneralPloidyExactAFCalculator.java:112)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.computeLog10PNonRef(GeneralPloidyExactAFCalculator.java:25)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.AFCalculator.getLog10PNonRef(AFCalculator.java:33)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:255)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:210)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.calculateGenotypes(GenotypeGVCFs.java:265)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:221)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:200)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase$$Lambda$89/1692317071.accept(Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)

Let me know if this is a known issue, or if it has already been fixed and I should try a different build, or if you have any questions.

Thank you!

Answers

  • SkyWarrior, Turkey, Member ✭✭✭

    How much heap space do you provide to your java vm instance for GATK4.0?

    Are you using --java-options?

  • I am running with the default memory options. In the log file, GenotypeGVCFs said it ended up using 28631367680 bytes (about 26.7 GiB). It also doesn't error out partway through: the program reports that it ran to completion for the full elapsed time. It just never prints the "xxx total variants processed..." line and the other messages at the bottom. Those are replaced by the exception/memory error I pasted above.

    How much heap space should I try to give it? I can try that and run it again.

  • neato_nick Member
    Accepted Answer

    UPDATE:

    A collaborator just came in and gave me the solution. He had the same problem, but only when calling haploid data. Adding the parameter "--new-qual" to his GenotypeGVCFs command solved the memory issue for him, and it just did for me as well: GATK 4 GenotypeGVCFs ran cleanly in less than half a minute. It may also be worth noting that GATK 3.8 genotyping gave me 2750 variants across all my samples, while GATK 4 genotyping gave me 2790.
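
    For anyone who wants to try the same thing, here is a rough sketch of how the command might be assembled, written as a small Python helper that only builds and prints the command line without running it. The file names and the 8g heap value are placeholders that do not come from this thread, and the helper function itself is hypothetical. The "--new-qual" flag is spelled exactly as quoted above, so it is worth checking `gatk GenotypeGVCFs --help` for the spelling your build accepts.

```python
import shlex


def genotype_gvcfs_cmd(reference, gvcf, output, heap=None, new_qual=True):
    """Build (but do not run) a gatk GenotypeGVCFs command line.

    All paths are caller-supplied placeholders; nothing is executed here.
    """
    cmd = ["gatk"]
    if heap is not None:
        # --java-options forwards JVM flags, e.g. the maximum heap size.
        cmd += ["--java-options", f"-Xmx{heap}"]
    cmd += ["GenotypeGVCFs", "-R", reference, "-V", gvcf, "-O", output]
    if new_qual:
        # Flag spelled as quoted in the accepted answer (assumption:
        # your GATK 4 build accepts this spelling).
        cmd.append("--new-qual")
    return cmd


# Placeholder inputs; substitute your own reference and combined GVCF.
cmd = genotype_gvcfs_cmd(
    "reference.fasta", "combined.g.vcf.gz", "genotyped.vcf.gz", heap="8g"
)
print(shlex.join(cmd))
```

    Printing the command first makes it easy to compare a run with and without the flag (pass new_qual=False) before committing to a multi-hour job.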
