GenomicsDBImport on very large WES cohort: java.lang.OutOfMemoryError

emixaM (France, Member)


I am running GenomicsDBImport on a very large WES cohort (more than 27k samples).
A run with 1,000 samples worked fine, but with the full cohort the tool just initializes the engine and then shuts it down, leaving only the out-of-memory error. I tried raising the memory and specifying a small-ish interval as a test (for the 1,000 WES samples, I used whole chromosomes). Still the same error.

The script:

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g" \
--variant gvcf.list \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
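
To compare cohort sizes (like the 1,000-sample run that worked versus the full list), a smaller test list can be cut from gvcf.list. This is only a minimal sketch, assuming gvcf.list holds one GVCF path per line; the function name and the output file name are hypothetical and not part of GATK.

```python
from pathlib import Path


def write_subset_list(full_list: str, subset_list: str, n_samples: int) -> int:
    """Copy the first n_samples non-empty lines of full_list into subset_list.

    Assumes one GVCF path per line. Returns the number of paths written,
    which is smaller than n_samples if the input list is shorter.
    """
    # Read every path, skipping blank lines and stripping whitespace.
    paths = [
        line.strip()
        for line in Path(full_list).read_text().splitlines()
        if line.strip()
    ]
    subset = paths[:n_samples]
    # Write one path per line, each terminated by a newline.
    Path(subset_list).write_text("".join(p + "\n" for p in subset))
    return len(subset)
```

For example, write_subset_list("gvcf.list", "gvcf.first1000.list", 1000) would produce a list that can be passed to --variant in place of gvcf.list, so the same command can be retried at increasing sample counts.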

The log:

16:44:37.783 INFO  NativeLibraryLoader - Loading from jar:file:/gatk4-!/com/intel/gkl/native/
May 06, 2019 4:44:39 PM runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:44:39.811 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.811 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.2.0
16:44:39.811 INFO  GenomicsDBImport - For support and documentation go to
16:44:39.812 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
16:44:39.812 INFO  GenomicsDBImport - Start Date/Time: May 6, 2019 4:44:37 PM CEST
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - Picard Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:44:39.813 INFO  GenomicsDBImport - Deflater: IntelDeflater
16:44:39.813 INFO  GenomicsDBImport - Inflater: IntelInflater
16:44:39.813 INFO  GenomicsDBImport - GCS max retries/reopens: 20
16:44:39.813 INFO  GenomicsDBImport - Requester pays: disabled
16:44:39.814 INFO  GenomicsDBImport - Initializing engine
18:59:35.663 INFO  GenomicsDBImport - Shutting down engine
[May 6, 2019 6:59:35 PM CEST] done. Elapsed time: 134.97 minutes.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.LinkedHashMap.newNode(
        at java.util.HashMap.putVal(
        at java.util.HashMap.put(
        at htsjdk.variant.vcf.VCF4Parser.parseLine(
        at htsjdk.variant.vcf.VCFHeaderLineTranslator.parseLine(
        at htsjdk.variant.vcf.VCFSimpleHeaderLine.<init>(
        at htsjdk.variant.vcf.VCFContigHeaderLine.<init>(
        at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(
        at htsjdk.tribble.TabixFeatureReader.readHeader(
        at htsjdk.tribble.TabixFeatureReader.<init>(
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(
        at org.broadinstitute.hellbender.Main.mainEntry(
        at org.broadinstitute.hellbender.Main.main(
Using GATK jar /gatk4-
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2
-Xmx64g -Xms64g -jar /gatk4- GenomicsDBImport --variant gvcf.list -L chr1:750000-1000000 --batch-size 50 --tmp-dir=/tmp --genomicsdb-workspace-path wholeCohort.chr1:750000-1000000

Thanks for any ideas on how to process this!



