
GenomicsDBImport on very large WES cohort : java.lang.OutOfMemoryError

emixaMemixaM · France · Member


I am running GenomicsDBImport on a very large WES cohort (more than 27,000 samples).
A test run with 1,000 samples worked fine, but now the tool just initializes the engine and shuts it down, leaving only the out-of-memory error. I tried raising the memory and specifying a small-ish interval as a test (for the 1,000-sample WES run I used whole chromosomes), but I still get the same error.

The script:

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g" \
--variant gvcf.list \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
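As an aside, GenomicsDBImport also accepts a two-column `--sample-name-map` file (sample name, tab, GVCF path) in place of `--variant`. Below is a small sketch of how a one-path-per-line `gvcf.list` could be converted into such a map; the assumption that each file is named `<sample>.g.vcf.gz` is hypothetical and would need to match your actual naming convention.

```python
import os

def gvcf_list_to_sample_map(list_path, map_path):
    """Convert a one-GVCF-path-per-line list into the tab-separated
    "sample<TAB>path" map accepted by --sample-name-map.

    Assumes (hypothetically) that each file is named <sample>.g.vcf[.gz],
    so the sample name can be recovered from the basename.
    """
    with open(list_path) as src, open(map_path, "w") as dst:
        for line in src:
            path = line.strip()
            if not path:
                continue  # skip blank lines in the list
            name = os.path.basename(path)
            # Strip the common GVCF/VCF suffixes to recover the sample name
            for suffix in (".g.vcf.gz", ".g.vcf", ".vcf.gz", ".vcf"):
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
                    break
            dst.write(f"{name}\t{path}\n")
```

The resulting file would then be passed as `--sample-name-map cohort.sample_map` instead of `--variant gvcf.list`.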

The log:

16:44:37.783 INFO  NativeLibraryLoader - Loading from jar:file:/gatk4-!/com/intel/gkl/native/
May 06, 2019 4:44:39 PM runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:44:39.811 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.811 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.2.0
16:44:39.811 INFO  GenomicsDBImport - For support and documentation go to
16:44:39.812 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
16:44:39.812 INFO  GenomicsDBImport - Start Date/Time: May 6, 2019 4:44:37 PM CEST
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - Picard Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:44:39.813 INFO  GenomicsDBImport - Deflater: IntelDeflater
16:44:39.813 INFO  GenomicsDBImport - Inflater: IntelInflater
16:44:39.813 INFO  GenomicsDBImport - GCS max retries/reopens: 20
16:44:39.813 INFO  GenomicsDBImport - Requester pays: disabled
16:44:39.814 INFO  GenomicsDBImport - Initializing engine
18:59:35.663 INFO  GenomicsDBImport - Shutting down engine
[May 6, 2019 6:59:35 PM CEST] done. Elapsed time: 134.97 minutes.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.LinkedHashMap.newNode(
        at java.util.HashMap.putVal(
        at java.util.HashMap.put(
        at htsjdk.variant.vcf.VCF4Parser.parseLine(
        at htsjdk.variant.vcf.VCFHeaderLineTranslator.parseLine(
        at htsjdk.variant.vcf.VCFSimpleHeaderLine.<init>(
        at htsjdk.variant.vcf.VCFContigHeaderLine.<init>(
        at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(
        at htsjdk.tribble.TabixFeatureReader.readHeader(
        at htsjdk.tribble.TabixFeatureReader.<init>(
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(
        at org.broadinstitute.hellbender.Main.mainEntry(
        at org.broadinstitute.hellbender.Main.main(
Using GATK jar /gatk4-
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2
-Xmx64g -Xms64g -jar /gatk4- GenomicsDBImport --variant gvcf.list -L chr1:750000-1000000 --batch-size 50 --tmp-dir=/tmp --genomicsdb-workspace-path wholeCohort.chr1:750000-1000000

Thanks for any ideas on how to proceed!


