HaplotypeCaller error "java.lang.OutOfMemoryError" even when a short interval is specified

KUBN Member
edited May 3 in Ask the GATK team
Hello GATK community,

Running HaplotypeCaller with only the required arguments gives me a Java OutOfMemoryError. The same happens with Mutect2. I run:

./gatk Mutect2 -R /hg19.fa -I /xx.bam -O /output.vcf.gz

I am running the command with --java-options "-Xmx4g" and a 10 kb interval (-L chr6:31130114-31140470). Is it really possible, or likely, that I don't have enough memory to run this analysis? The laptop I am using is a MacBook Pro with 8 GB of 1867 MHz LPDDR3 memory.
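As a quick check on the "10 kb" figure: GATK intervals written as contig:start-stop are 1-based and include both endpoints, so the span is stop - start + 1. A minimal Python sketch (the helper `interval_length` is hypothetical, not a GATK function):

```python
def interval_length(interval: str) -> int:
    """Length in bp of a 1-based, inclusive 'contig:start-stop' interval string."""
    # rpartition splits on the last ':' so the contig name itself is kept intact.
    contig, _, coords = interval.rpartition(":")
    start_text, stop_text = coords.split("-")
    start, stop = int(start_text), int(stop_text)
    if not contig or start < 1 or stop < start:
        raise ValueError(f"malformed interval: {interval!r}")
    return stop - start + 1  # both endpoints are included


print(interval_length("chr6:31130114-31140470"))  # 10357
```

For the interval above this gives 10,357 bp, so the requested region really is small.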

The full error message is: "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space".

Any insights would be greatly appreciated.

Best wishes,
Nada

Answers

  • bshifaw Member, Broadie, Moderator admin

    Hi @KUBN

    When a "Java heap space" message pops up, it's best to validate the input file with ValidateSamFile to make sure it is in line with GATK's requirements, and to increase the memory. Using intervals also helps, and it sounds like you've already added an interval to the command, though some parts of a chromosome can be trickier to process than others.
    The contents of the BAM file can affect the amount of memory HaplotypeCaller requires. What is the background of your input sample?
    What is the actual HaplotypeCaller command being used?

  • KUBN Member
    Hi @bshifaw

    Thanks a lot for your response.

    I should also add that I ran the command with --java-options "-Xmx4g" and a 10 kb interval (-L chr6:31130114-31140470). Is it really possible, or likely, that I don't have enough memory to run the analysis? Shouldn't specifying the interval greatly speed up the analysis? My machine has 8 GB of physical memory, and I made sure to free up 5 GB before the analysis. Still no luck: I always get the same error, no matter which -Xmx value I set. The BAM file is not a large one, and ValidateSamFile found no errors in it.

    I tried --help, --version and the CountReads tool, and they all work fine. CountAlleles, however, gives me the same OutOfMemoryError, and so does Mutect2.

    The GATK version is 4.0.8.1-4 and the Java version is 1.8.0_181. The entire command and log are:

    ```
    Nadas-MacBook-Pro:gatk nadakubikova$ ./gatk HaplotypeCaller -R /Users/nadakubikova/Desktop/HumanGenome/hg19.fa -I /Users/nadakubikova/Downloads/wetransfer-7475fd/C13K_S21.bam -O /Users/nadakubikova/Desktop/HumanGenome/output.g.vcf.gz --java-options "-Xmx4g" -L chr6:31130114-31140470
    Using GATK jar /Users/nadakubikova/Fish/basic/gatk/build/libs/gatk-package-4.0.8.1-4-g1dbd042-SNAPSHOT-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /Users/nadakubikova/Fish/basic/gatk/build/libs/gatk-package-4.0.8.1-4-g1dbd042-SNAPSHOT-local.jar HaplotypeCaller -R /Users/nadakubikova/Desktop/HumanGenome/hg19.fa -I /Users/nadakubikova/Downloads/wetransfer-7475fd/C13K_S21.bam -O /Users/nadakubikova/Desktop/HumanGenome/output.g.vcf.gz -L chr6:31130114-31140470
    14:56:17.074 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/nadakubikova/Fish/basic/gatk/build/libs/gatk-package-4.0.8.1-4-g1dbd042-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
    14:56:17.249 INFO HaplotypeCaller - ------------------------------------------------------------
    14:56:17.250 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.8.1-4-g1dbd042-SNAPSHOT
    14:56:17.250 INFO HaplotypeCaller - For support and documentation go to
    14:56:17.250 INFO HaplotypeCaller - Executing as [email protected] on Mac OS X v10.14.4 x86_64
    14:56:17.250 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_181-b13
    14:56:17.250 INFO HaplotypeCaller - Start Date/Time: 04 May 2019 14:56:17 BST
    14:56:17.250 INFO HaplotypeCaller - ------------------------------------------------------------
    14:56:17.250 INFO HaplotypeCaller - ------------------------------------------------------------
    14:56:17.251 INFO HaplotypeCaller - HTSJDK Version: 2.16.0
    14:56:17.251 INFO HaplotypeCaller - Picard Version: 2.18.7
    14:56:17.251 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:56:17.251 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:56:17.251 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:56:17.251 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:56:17.251 INFO HaplotypeCaller - Deflater: IntelDeflater
    14:56:17.251 INFO HaplotypeCaller - Inflater: IntelInflater
    14:56:17.251 INFO HaplotypeCaller - GCS max retries/reopens: 20
    14:56:17.251 INFO HaplotypeCaller - Using google-cloud-java fork github.com/broadinstitute/google-cloud-java/releases/tag/0.20.5-alpha-GCS-RETRY-FIX
    14:56:17.251 INFO HaplotypeCaller - Initializing engine
    15:00:21.031 INFO HaplotypeCaller - Shutting down engine
    [04 May 2019 15:00:21 BST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 4.07 minutes.
    Runtime.totalMemory()=3817865216
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.<init>(SAMTextHeaderCodec.java:287)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:95)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.loadDictionary(ReferenceSequenceFileFactory.java:232)
    at htsjdk.samtools.reference.AbstractFastaSequenceFile.<init>(AbstractFastaSequenceFile.java:68)
    at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.<init>(AbstractIndexedFastaSequenceFile.java:60)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:80)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:98)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.checkAndCreate(CachingIndexedFastaSequenceFile.java:205)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.checkAndCreate(CachingIndexedFastaSequenceFile.java:183)
    at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:37)
    at org.broadinstitute.hellbender.engine.ReferenceDataSource.of(ReferenceDataSource.java:27)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeReference(GATKTool.java:364)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:634)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:156)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Nadas-MacBook-Pro:gatk nadakubikova$
    ```

    Any insights would be greatly appreciated.
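One detail in the stack trace above is worth noting: the OutOfMemoryError is thrown inside ReferenceSequenceFileFactory.loadDictionary, called from GATKTool.initializeReference while the log still shows "Initializing engine". In other words, GATK was still reading the reference's sequence dictionary and had not yet processed any reads from the -L interval. That would be consistent with a problem in the .dict file that GATK expects next to hg19.fa (hg19.dict), although this is only an inference from the trace. A minimal Python sketch that tallies the header lines of such a file; the function names are hypothetical, not part of GATK or htsjdk:

```python
import sys
from typing import Dict, Iterable


def summarize_dict_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Tally the header lines of a SAM-style sequence dictionary (.dict).

    A valid dictionary is expected to hold an @HD line plus one @SQ line per
    contig, e.g. "@SQ<TAB>SN:<name><TAB>LN:<length>...".
    """
    counts = {"HD": 0, "SQ": 0, "other": 0}
    contigs = []
    longest_line = 0
    for line in lines:
        longest_line = max(longest_line, len(line))
        if line.startswith("@HD"):
            counts["HD"] += 1
        elif line.startswith("@SQ"):
            counts["SQ"] += 1
            # Tags after "@SQ" are TAB-separated KEY:VALUE pairs; keep SN (the name).
            tags = dict(
                field.split(":", 1)
                for field in line.rstrip("\n").split("\t")[1:]
                if ":" in field
            )
            contigs.append(tags.get("SN"))
        else:
            counts["other"] += 1
    return {"counts": counts, "longest_line": longest_line, "first_contigs": contigs[:5]}


def summarize_dict(path: str) -> Dict[str, object]:
    """Stream a .dict file line by line rather than reading it whole."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return summarize_dict_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(summarize_dict(sys.argv[1]))
```

Run it with the path to the .dict as its only argument. A sequence dictionary normally contains nothing but header lines, so a large "other" count or an unusually long line would suggest the file is not a valid dictionary, in which case regenerating it with CreateSequenceDictionary would be a reasonable next step.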
  • bshifaw Member, Broadie, Moderator admin

    Would you mind trying the command with the latest version of the tool, GATK 4.1.2.0?

  • KUBN Member
    @bshifaw

    GATK 4.1.2.0 gives me the same errors, I'm afraid :-(

    Do you think I should just give up on using my local computer for this analysis?
  • bshifaw Member, Broadie, Moderator admin

    Yes, if you have access to a system with more resources, definitely give it a go.
    It looks like the error message changed from "Java heap space" to "GC overhead limit exceeded". Did you do anything differently to get the last logs?
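Both messages are java.lang.OutOfMemoryError. "GC overhead limit exceeded" is the HotSpot JVM's variant for when it spends nearly all of its time in garbage collection while freeing very little heap. The log line Runtime.totalMemory()=3817865216 gives the heap the JVM had allocated when it stopped, and converting it shows how close that was to the -Xmx4g ceiling (the JVM reads "4g" as 4 * 1024^3 bytes). A small Python sketch of that arithmetic; the helper name `heap_fraction` is hypothetical:

```python
GIB = 1024 ** 3  # the JVM's "g" suffix in -Xmx means GiB


def heap_fraction(total_memory_bytes: int, xmx_gib: float) -> float:
    """Fraction of the -Xmx ceiling reached by Runtime.totalMemory()."""
    return total_memory_bytes / (xmx_gib * GIB)


total = 3817865216  # Runtime.totalMemory() value from the log above
print(f"{total / GIB:.2f} GiB, {heap_fraction(total, 4):.0%} of -Xmx4g")
```

This prints "3.56 GiB, 89% of -Xmx4g": the heap had already grown to most of its 4 GiB limit.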

  • shuangBroad Broad75 Member, Broadie, Dev

    @KUBN
    I've also noticed that you seem to be using the hg19 reference with an interval whose contig name starts with "chr".
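Whatever naming the reference uses, the contig given to -L has to match a name in the reference's dictionary exactly: UCSC hg19 names chromosomes chr1, chr2, ..., whereas Broad's b37 (GRCh37) uses 1, 2, .... A minimal Python sketch that checks an interval against the lines of a samtools-style .fai index (column 1 is the contig name, column 2 its length); `check_interval` is a hypothetical helper:

```python
from typing import Dict, Iterable


def check_interval(interval: str, fai_lines: Iterable[str]) -> str:
    """Check a 1-based 'contig:start-stop' interval against .fai index lines."""
    contig, _, coords = interval.rpartition(":")
    start, stop = (int(x) for x in coords.split("-"))
    lengths: Dict[str, int] = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])  # contig name -> length
    if contig not in lengths:
        # Try the other naming convention (chr6 <-> 6) before giving up.
        alt = contig[3:] if contig.startswith("chr") else "chr" + contig
        hint = f"; did you mean {alt!r}?" if alt in lengths else ""
        return f"contig {contig!r} not in reference{hint}"
    if start < 1 or stop < start or stop > lengths[contig]:
        return f"interval outside {contig} (length {lengths[contig]})"
    return "ok"
```

Passing the lines of hg19.fa.fai (for example an open file handle) returns "ok" only if the contig is present and long enough; otherwise the message says which check failed and, when the other naming convention matches, suggests it.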
