GenomicsDBImport java process is using excessive vmem

afarlow (melbourne, Member)

I'm running GenomicsDBImport on 900 intervals of ~4 Mb each with 1200 samples (I know this is below the recommended 2.5:1 ratio of intervals to samples, but I'm limited to 1000 jobs in the queue). I'm working on a Linux HPC with PBSpro and give each job ncpus=1 and -l mem=4GB, then attempt to limit the Java heap (-Xmx and -Xms) using mem=$(echo "$PBS_VMEM * 0.4" | bc | cut -d"." -f1). The command is:

export TILEDB_DISABLE_FILE_LOCKING=1
gatk --java-options "-Xmx$mem -Xms$mem -Djava.io.tmpdir=$PBS_JOBFS -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" GenomicsDBImport \
    --sample-name-map $PBS_JOBFS/input.short.GVCF.list \
    --batch-size 50 \
    --consolidate TRUE \
    --reader-threads 5 \
    -ip 500 \
    --genomicsdb-workspace-path $PBS_JOBFS/${shortInt}.DBI \
    --intervals ${shortInt} 
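
For reference, the heap-size calculation described above can also be written with plain shell integer arithmetic instead of bc and cut. This is only a sketch: heap_bytes is a hypothetical helper name, the 4 GiB fallback is an illustrative assumption, and it assumes PBS_VMEM holds a byte count.

```shell
#!/usr/bin/env bash
# Sketch: derive a JVM heap size as a percentage of the job's memory limit.
# heap_bytes is a hypothetical helper, not part of GATK or PBS.
heap_bytes() {
    # $1 = memory limit in bytes, $2 = integer percentage for the Java heap
    echo $(( $1 * $2 / 100 ))
}

# Assumption: PBS_VMEM is a byte count; fall back to 4 GiB when it is unset.
limit=${PBS_VMEM:-4294967296}
mem=$(heap_bytes "$limit" 40)

# These are the options that would be passed via --java-options.
echo "-Xmx$mem -Xms$mem"
```

With a 4 GiB limit, 40% gives 1717986918 bytes. The -Xmx2576980377 that appears in the job log further down matches 60% of 4 GiB (or, equally, 40% of 6 GiB), so the limit or factor in effect for that job may differ from the 0.4 x 4 GB described above.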

However, about 50% of jobs abort in the final batch after the java process attempts to use about 2x the available vmem.

Job 1385103.r-man2 has exceeded memory allocation on node r626
Process "python", pid 1205, rss 4743168, vmem 22040576
Process "java", pid 1206, rss 4291076096, vmem 9346064384
Process "bash", pid 18974, rss 1486848, vmem 9687040
Process "1385103.r-man2.", pid 19018, rss 1630208, vmem 9740288
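
To put the numbers in this report in context, the sketch below subtracts the configured heap from what the java process actually used. The values are copied from the report and from the -Xmx in the job log; the variable names are illustrative. One possible reading, offered as an assumption rather than a diagnosis, is that the difference is memory allocated outside the Java heap (GenomicsDB runs as a native library), which -Xmx does not limit.

```shell
#!/usr/bin/env bash
# Values taken from the PBS report and the java command line in the job log.
xmx=2576980377        # -Xmx/-Xms passed to java, in bytes
java_rss=4291076096   # rss reported for the java process, in bytes
java_vmem=9346064384  # vmem reported for the java process, in bytes

# Resident memory that cannot be accounted for by the Java heap alone.
non_heap_rss=$(( java_rss - xmx ))

# Virtual memory as a multiple of the heap, in tenths (integer arithmetic only).
vmem_ratio_tenths=$(( java_vmem * 10 / xmx ))

echo "rss beyond the heap: $non_heap_rss bytes"
echo "vmem is about $(( vmem_ratio_tenths / 10 )).$(( vmem_ratio_tenths % 10 ))x the heap"
```

If that reading is right, lowering the heap fraction only helps when the remaining headroom covers what the native side needs in the final batch.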

Some of the failed intervals overlap repeats, but some do not.

Using GATK jar /short/XXX/software/gatk4/4.1.2.0/gatk-package-4.1.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx2576980377 -Xms2576980377 -Djava.io.tmpdir=/jobfs/local/1385103.r-man2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /short/XXX/software/gatk4/4.1.2.0/gatk-package-4.1.2.0-local.jar GenomicsDBImport --sample-name-map /jobfs/local/1385103.r-man2/input.short.GVCF.list --batch-size 50 --consolidate TRUE --reader-threads 5 -ip 500 --genomicsdb-workspace-path /jobfs/local/1385103.r-man2/chr4:49204507-51418580.DBI --intervals chr4:49204507-51418580
21:33:33.539 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/short/XXX/software/gatk4/4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 21, 2019 9:33:36 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
21:33:36.938 INFO  GenomicsDBImport - ------------------------------------------------------------
21:33:36.939 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.2.0
21:33:36.939 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
21:33:36.939 INFO  GenomicsDBImport - Executing as [email protected] on Linux v3.10.0-957.21.3.el6.x86_64 amd64
21:33:36.939 INFO  GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_60-b27
21:33:36.939 INFO  GenomicsDBImport - Start Date/Time: August 21, 2019 9:33:33 PM AEST
21:33:36.939 INFO  GenomicsDBImport - ------------------------------------------------------------
21:33:36.939 INFO  GenomicsDBImport - ------------------------------------------------------------
21:33:36.940 INFO  GenomicsDBImport - HTSJDK Version: 2.19.0
21:33:36.940 INFO  GenomicsDBImport - Picard Version: 2.19.0
21:33:36.940 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:33:36.940 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:33:36.940 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:33:36.940 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:33:36.940 INFO  GenomicsDBImport - Deflater: IntelDeflater
21:33:36.940 INFO  GenomicsDBImport - Inflater: IntelInflater
21:33:36.940 INFO  GenomicsDBImport - GCS max retries/reopens: 20
21:33:36.941 INFO  GenomicsDBImport - Requester pays: disabled
21:33:36.941 INFO  GenomicsDBImport - Initializing engine
21:33:37.886 INFO  IntervalArgumentCollection - Processing 2215074 bp from intervals
21:33:37.892 INFO  GenomicsDBImport - Done initializing engine
21:33:38.653 INFO  GenomicsDBImport - Vid Map JSON file will be written to /jobfs/local/1385103.r-man2/chr4:49204507-51418580.DBI/vidmap.json
21:33:38.653 INFO  GenomicsDBImport - Callset Map JSON file will be written to /jobfs/local/1385103.r-man2/chr4:49204507-51418580.DBI/callset.json
21:33:38.653 INFO  GenomicsDBImport - Complete VCF Header will be written to /jobfs/local/1385103.r-man2/chr4:49204507-51418580.DBI/vcfheader.vcf
21:33:38.653 INFO  GenomicsDBImport - Importing to array - /jobfs/local/1385103.r-man2/chr4:49204507-51418580.DBI/genomicsdb_array
21:33:38.662 INFO  ProgressMeter - Starting traversal
21:33:38.662 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
21:33:39.706 INFO  GenomicsDBImport - Starting batch input file preload
21:33:42.244 INFO  GenomicsDBImport - Finished batch preload
21:33:42.244 INFO  GenomicsDBImport - Importing batch 1 with 50 samples
21:35:31.079 INFO  ProgressMeter -        chr4:49204007              1.9                     1              0.5
21:35:31.079 INFO  GenomicsDBImport - Done importing batch 1/24
21:35:31.086 INFO  GenomicsDBImport - Starting batch input file preload
21:35:32.206 INFO  GenomicsDBImport - Finished batch preload
21:35:32.206 INFO  GenomicsDBImport - Importing batch 2 with 50 samples
21:37:00.494 INFO  ProgressMeter -        chr4:49204007              3.4                     2              0.6
21:37:00.498 INFO  GenomicsDBImport - Done importing batch 2/24
21:37:00.499 INFO  GenomicsDBImport - Starting batch input file preload
21:37:01.474 INFO  GenomicsDBImport - Finished batch preload
21:37:01.474 INFO  GenomicsDBImport - Importing batch 3 with 50 samples
21:38:23.254 INFO  ProgressMeter -        chr4:49204007              4.7                     3              0.6
21:38:23.256 INFO  GenomicsDBImport - Done importing batch 3/24

...

22:20:10.721 INFO  GenomicsDBImport - Importing batch 23 with 50 samples
22:22:36.575 INFO  ProgressMeter -        chr4:49204007             49.0                    23              0.5
22:22:36.575 INFO  GenomicsDBImport - Done importing batch 23/24
22:22:36.576 INFO  GenomicsDBImport - Starting batch input file preload
22:22:37.741 INFO  GenomicsDBImport - Finished batch preload
22:22:37.742 INFO  GenomicsDBImport - Importing batch 24 with 50 samples