
GenomicsDBImport 4.1.2.0 on very large WES cohort : java.lang.OutOfMemoryError

emixaMemixaM (France, Member)

Hello,

I am running GenomicsDBImport on a very large WES cohort (more than 27k samples).
A run with 1,000 samples worked fine, but now the tool just initializes the engine and shuts it down, leaving only the out-of-memory error. I tried raising the memory, and also specifying a small-ish interval as a test (for the 1,000-sample WES run, I had used whole chromosomes). Still the same error.

The script:

gatk GenomicsDBImport \
--java-options "-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp" \
--variant gvcf.list \
-L chr1:750000-1000000 \
--batch-size 50 \
--tmp-dir=/tmp \
--genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
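
One thing I am considering trying, in case it is relevant: the stack trace below dies while parsing a gVCF header (initializeHeaderAndSampleMappings), which I understand happens once per input file when samples are passed via --variant. A rough sketch of building a --sample-name-map input instead (the filename-to-sample-ID rule here is just an assumption; adjust it to your own naming scheme):

```shell
# Demo input: two hypothetical gVCF paths (replace with your real gvcf.list).
printf '/data/sampleA.g.vcf.gz\n/data/sampleB.g.vcf.gz\n' > gvcf.list

# Build a tab-separated "sample<TAB>path" map, deriving the sample ID
# from the file basename (an assumption -- adjust to your naming).
while read -r gvcf; do
    sample=$(basename "$gvcf" .g.vcf.gz)
    printf '%s\t%s\n' "$sample" "$gvcf"
done < gvcf.list > sample_map.tsv

# Then pass the map instead of --variant gvcf.list, e.g.:
# gatk GenomicsDBImport \
#   --java-options "-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp" \
#   --sample-name-map sample_map.tsv \
#   -L chr1:750000-1000000 \
#   --batch-size 50 \
#   --tmp-dir=/tmp \
#   --genomicsdb-workspace-path wholeCohort.chr1:750000-1000000
```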

The log:

16:44:37.783 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 06, 2019 4:44:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:44:39.811 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.811 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.2.0
16:44:39.811 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:44:39.812 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
16:44:39.812 INFO  GenomicsDBImport - Start Date/Time: May 6, 2019 4:44:37 PM CEST
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.812 INFO  GenomicsDBImport - ------------------------------------------------------------
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - Picard Version: 2.19.0
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:44:39.813 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:44:39.813 INFO  GenomicsDBImport - Deflater: IntelDeflater
16:44:39.813 INFO  GenomicsDBImport - Inflater: IntelInflater
16:44:39.813 INFO  GenomicsDBImport - GCS max retries/reopens: 20
16:44:39.813 INFO  GenomicsDBImport - Requester pays: disabled
16:44:39.814 INFO  GenomicsDBImport - Initializing engine
18:59:35.663 INFO  GenomicsDBImport - Shutting down engine
[May 6, 2019 6:59:35 PM CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 134.97 minutes.
Runtime.totalMemory()=66190835712
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.LinkedHashMap.newNode(LinkedHashMap.java:256)
        at java.util.HashMap.putVal(HashMap.java:631)
        at java.util.HashMap.put(HashMap.java:612)
        at htsjdk.variant.vcf.VCF4Parser.parseLine(VCFHeaderLineTranslator.java:146)
        at htsjdk.variant.vcf.VCFHeaderLineTranslator.parseLine(VCFHeaderLineTranslator.java:56)
        at htsjdk.variant.vcf.VCFSimpleHeaderLine.<init>(VCFSimpleHeaderLine.java:82)
        at htsjdk.variant.vcf.VCFContigHeaderLine.<init>(VCFContigHeaderLine.java:53)
        at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:212)
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
        at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:116)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:687)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:411)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:369)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:347)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2
-Xmx64g -Xms64g -Djava.io.tmpdir=/tmp -jar /gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar GenomicsDBImport --variant gvcf.list -L chr1:750000-1000000 --batch-size 50 --tmp-dir=/tmp --genomicsdb-workspace-path wholeCohort.chr1:750000-1000000

Thanks for any ideas on how to get past this!

Cheers,

--Maxime
