GenotypeGVCFs (V4.0) memory error

Hi all,

I recently upgraded to GATK 4 and have been having problems with GenotypeGVCFs. The problem only appears as I introduce more samples, and I noticed that other users have been reporting memory errors too. I commented on issue #4454 to report that I've been having these issues. GATK 3.8 finishes the job in 36 seconds from a list of GVCFs. But when I run CombineGVCFs -> GenotypeGVCFs in GATK 4 on the same 108 samples, it takes over three hours, spits out a severely truncated VCF, and produces many memory errors. At the above issue link, I attached several error logs showing what happened as I progressively increased the number of samples I supplied. For redundancy, here is the bottom of one error log that shows the memory issue I'm having:

14:13:53.736 INFO  GenotypeGVCFs - Shutting down engine
[March 6, 2018 2:13:53 PST PM] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 195.26 minutes.
Runtime.totalMemory()=28631367680
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Spliterators.spliterator(Spliterators.java:240)
        at java.util.Arrays.spliterator(Arrays.java:4911)
        at java.util.Arrays.stream(Arrays.java:5053)
        at java.util.Arrays.stream(Arrays.java:5035)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.isValidConformation(GeneralPloidyExactAFCalculator.java:315)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.computeLofK(GeneralPloidyExactAFCalculator.java:280)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.calculateACConformationAndUpdateQueue(GeneralPloidyExactAFCalculator.java:187)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.fastCombineMultiallelicPool(GeneralPloidyExactAFCalculator.java:148)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.combineSinglePools(GeneralPloidyExactAFCalculator.java:112)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.GeneralPloidyExactAFCalculator.computeLog10PNonRef(GeneralPloidyExactAFCalculator.java:25)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.AFCalculator.getLog10PNonRef(AFCalculator.java:33)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:255)
        at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:210)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.calculateGenotypes(GenotypeGVCFs.java:265)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:221)
        at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:200)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase$$Lambda$89/1692317071.accept(Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)

Let me know if this is a known issue, or if it has already been fixed and I should try a different build, or if you have any questions.

Thank you!
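
A note on reading the log above: the Runtime.totalMemory() line is reported in bytes, which is hard to compare against a heap setting at a glance. The small sketch below converts the figure from the log into MiB. The variable names are illustrative only and are not part of GATK.

```shell
# Value copied from the Runtime.totalMemory() line in the log (bytes).
HEAP_BYTES=28631367680

# Integer division: bytes -> KiB -> MiB.
HEAP_MIB=$((HEAP_BYTES / 1024 / 1024))

echo "JVM total memory at shutdown: ${HEAP_MIB} MiB"
```

This works out to 27305 MiB, or roughly 26.7 GiB, so the JVM had already grown quite large by the time the "GC overhead limit exceeded" error was thrown.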

Answers

  • SkyWarrior Turkey Member ✭✭✭

    How much heap space do you provide to your java vm instance for GATK4.0?

    Are you using --java-options?

  • I am running on default memory options. In the log file, GenotypeGVCFs said it ended up using 28631367680 bytes. It also doesn't error out - the program says that it ran completely for the elapsed time. It just doesn't then give the "xxx total variants processed..." and other messages at the bottom. That is replaced by the exception/memory error I pasted above.

    How much heap space should I try to give it? I can try that and run it again.

  • neato_nick Member
    Accepted Answer

    UPDATE:

    A collaborator just came in and gave me the solution. He said that he also had this problem, but only when calling haploids. When he added the parameter "--new-qual" to his GenotypeGVCFs command, the memory issue went away. It did for me just now as well: GATK 4 GenotypeGVCFs executed perfectly in less than half a minute. It might also be worth noting that GATK 3.8 genotyping gave me 2750 variants across all my samples, while GATK 4 genotyping gave 2790.
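
For readers who want to try the two suggestions from this thread, here is a minimal sketch of how such a command might be assembled. It combines the --java-options heap setting asked about above with the "--new-qual" parameter from the accepted answer. The file names and the 32g heap size are placeholders assumed for illustration; none of them come from the thread. The flag is spelled as quoted in the accepted answer, so it is worth checking it against `gatk GenotypeGVCFs --help` for the installed version.

```shell
# Placeholder inputs (assumptions, not taken from the thread).
REF="reference.fasta"
GVCF="combined.g.vcf.gz"
OUT="genotyped.vcf.gz"
HEAP="32g"   # assumed heap size; adjust to the memory available on the machine

# Assemble the command as a string so it can be inspected before running.
# --java-options passes -Xmx to the JVM; --new-qual is the parameter
# reported in the accepted answer.
CMD="gatk --java-options -Xmx${HEAP} GenotypeGVCFs -R ${REF} -V ${GVCF} -O ${OUT} --new-qual"
echo "$CMD"

# Run only when gatk is on the PATH and the input file exists.
if command -v gatk >/dev/null 2>&1 && [ -f "$GVCF" ]; then
    $CMD
fi
```

In the thread, the memory error was resolved by "--new-qual" alone while running with default memory options, so the explicit heap setting here is optional.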
