
GenomicsDBImport 4.1.2.0 on very large WES cohort : java.lang.OutOfMemoryError

emixaMemixaM (France)

Hello,

I am running GenomicsDBImport on a very large WES cohort (more than 27k samples).
A run with 1,000 samples worked fine; now the tool just initializes the engine and shuts it down, leaving only the out-of-memory error below. I tried raising the memory and specifying a small-ish interval as a test (for the 1,000-sample WES run I had used whole chromosomes). Still the same error.
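One data point that may help: judging from the stack trace below, the failure happens while the headers of the input GVCFs are being parsed, so the aggregate header size across the cohort is probably the number to look at. A minimal sketch to measure it, assuming bgzipped GVCFs that zcat can read (everything in this snippet is illustrative):

# Rough, illustrative check: total bytes of header text across the cohort.
# Assumes each GVCF is bgzipped; sed quits at the #CHROM line, so zcat
# stops early via SIGPIPE instead of decompressing the whole file.
total=0
while read -r gvcf; do
  bytes=$(zcat "$gvcf" | sed -n '/^#CHROM/q;p' | wc -c)
  total=$((total + bytes))
done < gvcf.list
echo "aggregate header size: ${total} bytes"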

The script:

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp" \
--variant gvcf.list \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
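
Possibly relevant: GenomicsDBImport also accepts its inputs via --sample-name-map, a tab-separated file of sample name and GVCF path; per the tool documentation, the header of the first file is then treated as the merged header and sample names are taken from the map, which should avoid parsing every header just to discover sample names. A hedged sketch of that variant (deriving the sample name from the file name is an assumption; adapt it to your naming scheme):

# Build a sample<TAB>path map from the existing list; the sample-name
# derivation from the file name is a guess, not part of the original setup.
while read -r gvcf; do
  printf '%s\t%s\n' "$(basename "$gvcf" .g.vcf.gz)" "$gvcf"
done < gvcf.list > cohort.sample_map

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp" \
--sample-name-map cohort.sample_map \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000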

The log:

16:44:37.783 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 06, 2019 4:44:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:44:39.811 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.811 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.2.0
16:44:39.811 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:44:39.812 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
16:44:39.812 INFO  GenomicsDBImport - Start Date/Time: May 6, 2019 4:44:37 PM CEST
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - Picard Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:44:39.813 INFO  GenomicsDBImport - Deflater: IntelDeflater
16:44:39.813 INFO  GenomicsDBImport - Inflater: IntelInflater
16:44:39.813 INFO  GenomicsDBImport - GCS max retries/reopens: 20
16:44:39.813 INFO  GenomicsDBImport - Requester pays: disabled
16:44:39.814 INFO  GenomicsDBImport - Initializing engine
18:59:35.663 INFO  GenomicsDBImport - Shutting down engine
[May 6, 2019 6:59:35 PM CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 134.97 minutes.
Runtime.totalMemory()=66190835712
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
        at java.util.HashMap.putVal(HashMap.java:631)
        at java.util.HashMap.put(HashMap.java:612)
        at htsjdk.variant.vcf.VCF4Parser.parseLine(VCFHeaderLineTranslator.java:146)
        at htsjdk.variant.vcf.VCFHeaderLineTranslator.parseLine(VCFHeaderLineTranslator.java:56)
        at htsjdk.variant.vcf.VCFSimpleHeaderLine.<init>(VCFSimpleHeaderLine.java:82)
        at htsjdk.variant.vcf.VCFContigHeaderLine.<init>(VCFContigHeaderLine.java:53)
        at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:212)
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
        at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:116)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:687)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:411)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:369)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:347)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2
-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp -jar /gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar GenomicsDBImport --variant gvcf.list -L chr1:750000-1000000 --batch-size 50 --tmp-dir=/tmp --genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
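
Reading the trace: the error is thrown from initializeHeaderAndSampleMappings during engine initialization, i.e. while the tool is still parsing the VCF headers of the input GVCFs and before any variant data is imported, which would explain why a smaller -L interval changes nothing. "GC overhead limit exceeded" means the JVM spent more than 98% of its time in garbage collection while recovering less than 2% of the heap; that particular guard can be switched off with the standard HotSpot flag -XX:-UseGCOverheadLimit, though this usually only turns the failure into a plain heap-space OutOfMemoryError rather than fixing it. A diagnostic variant of the command, not a fix:

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g -XX:-UseGCOverheadLimit -Djava.io.tmpdir=/tmp" \
--variant gvcf.list \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000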

Thanks for any ideas on how to proceed with this!

Cheers,

--Maxime
