Invalid gzip header in genomicsdbimport

NicoBxl Member
edited April 23 in Ask the GATK team

Hi,

I'm hitting a very strange bug when running GenomicsDBImport on 498 WES samples. To speed up the GenomicsDBImport + GenotypeGVCFs step, I split my interval list into single intervals that are run in parallel on our local cluster.
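For readers who want to reproduce the setup, the splitting of an interval list into single intervals can be sketched as follows. This is only an illustration: the function name `split_bed` and the `interval_NNNNN.bed` output naming are hypothetical and not taken from the actual pipeline.

```shell
# Hypothetical helper: write each interval of a BED file to its own
# single-interval BED file so that every interval can be run as a separate job.
# Usage: split_bed <input.bed> <output_dir>
split_bed() {
  local bed="$1" outdir="$2"
  mkdir -p "$outdir"
  # Skip blank lines and BED header lines; number the outputs in input order.
  awk -v dir="$outdir" '
    /^[[:space:]]*$/ || /^(#|track|browser)/ { next }
    { n++; out = sprintf("%s/interval_%05d.bed", dir, n); print > out; close(out) }
  ' "$bed"
}
```

Each resulting file can then be passed with -L to its own GenomicsDBImport job.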

Strangely, it only happens for a subset of intervals, for no apparent reason. One important thing to note is that it always stops at batch 8/10!

Here's the command:

gatk --java-options "-Xmx13g -Xms5g" \
 GenomicsDBImport \
 -R $ref_hg38 \
 --sample-name-map $list_samples \
 --genomicsdb-workspace-path $DB_WGS_hg38 \
 --reader-threads 1 --batch-size 50 \
 --overwrite-existing-genomicsdb-workspace \
 -L $tmpdir/interval/tmp_interval.bed

tmp_interval.bed contains a single line with one interval, e.g.:

chr1    155746  155851
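As a sanity check on the interval itself: BED coordinates are 0-based with an exclusive end, so this line covers 155851 - 155746 = 105 bases, and its first base is chr1:155747 in 1-based coordinates. A minimal sketch of that arithmetic (the function name `bed_bp` is hypothetical, and overlapping intervals would be counted twice):

```shell
# Hypothetical helper: sum the bases covered by a BED file.
# BED starts are 0-based and ends are exclusive, so each length is end - start.
bed_bp() {
  awk '!/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 } END { print total + 0 }' "$1"
}
```

Running `bed_bp tmp_interval.bed` on the file above prints 105.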

Here is the error I get. It stops during the batch import:

Using GATK jar /gpfsuser/home/users/n/r/test/tools/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx13g -Xms5g -jar /gpfsuser/home/users/n/r/test/tools/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar GenomicsDBImport -R /home/uni/lab/test/genomes/hg38/Homo_sapiens_assembly38.fasta --sample-name-map /home/uni/lab/test/projects/test/data/sample_list_WES_hg38_20190419.txt --genomicsdb-workspace-path /tmp/test/test_hg38_1/gatk_genomicsdbimport_workspace --reader-threads 1 --batch-size 50 --overwrite-existing-genomicsdb-workspace -L /tmp/test/test_hg38_1/interval/tmp_interval.bed
19:46:59.791 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfsuser/home/users/n/r/test/tools/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 19, 2019 7:47:01 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:47:01.895 INFO GenomicsDBImport - ------------------------------------------------------------
19:47:01.909 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.1.0
19:47:01.909 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
19:47:01.910 INFO GenomicsDBImport - Executing as [email protected] on Linux v2.6.32-696.30.1.el6.x86_64 amd64
19:47:01.911 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_201-b09
19:47:01.911 INFO GenomicsDBImport - Start Date/Time: 19 avril 2019 19:46:59 CEST
19:47:01.911 INFO GenomicsDBImport - ------------------------------------------------------------
19:47:01.911 INFO GenomicsDBImport - ------------------------------------------------------------
19:47:01.913 INFO GenomicsDBImport - HTSJDK Version: 2.19.0
19:47:01.913 INFO GenomicsDBImport - Picard Version: 2.19.0
19:47:01.913 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:47:01.913 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:47:01.913 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:47:01.913 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:47:01.914 INFO GenomicsDBImport - Deflater: IntelDeflater
19:47:01.914 INFO GenomicsDBImport - Inflater: IntelInflater
19:47:01.914 INFO GenomicsDBImport - GCS max retries/reopens: 20
19:47:01.914 INFO GenomicsDBImport - Requester pays: disabled
19:47:01.914 INFO GenomicsDBImport - Initializing engine
19:47:04.184 INFO FeatureManager - Using codec BEDCodec to read file file:///tmp/test/test_hg38_1/interval/tmp_interval.bed
19:47:04.201 INFO IntervalArgumentCollection - Processing 105 bp from intervals
19:47:05.065 INFO GenomicsDBImport - Done initializing engine
19:47:05.573 INFO GenomicsDBImport - Vid Map JSON file will be written to /tmp/test/test_hg38_1/gatk_genomicsdbimport_workspace/vidmap.json
19:47:05.574 INFO GenomicsDBImport - Callset Map JSON file will be written to /tmp/test/test_hg38_1/gatk_genomicsdbimport_workspace/callset.json
19:47:05.574 INFO GenomicsDBImport - Complete VCF Header will be written to /tmp/test/test_hg38_1/gatk_genomicsdbimport_workspace/vcfheader.vcf
19:47:05.574 INFO GenomicsDBImport - Importing to array - /tmp/test/test_hg38_1/gatk_genomicsdbimport_workspace/genomicsdb_array
19:47:05.575 INFO ProgressMeter - Starting traversal
19:47:05.576 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
19:47:40.549 INFO GenomicsDBImport - Importing batch 1 with 50 samples
19:47:52.178 INFO ProgressMeter - chr1:155747 0.8 1 1.3
19:47:52.178 INFO GenomicsDBImport - Done importing batch 1/10
19:48:11.993 INFO GenomicsDBImport - Importing batch 2 with 50 samples
19:48:13.721 INFO ProgressMeter - chr1:155747 1.1 2 1.8
19:48:13.721 INFO GenomicsDBImport - Done importing batch 2/10
19:48:57.553 INFO GenomicsDBImport - Importing batch 3 with 50 samples
19:49:04.988 INFO ProgressMeter - chr1:155747 2.0 3 1.5
19:49:04.988 INFO GenomicsDBImport - Done importing batch 3/10
19:49:42.441 INFO GenomicsDBImport - Importing batch 4 with 50 samples
19:49:45.154 INFO ProgressMeter - chr1:155747 2.7 4 1.5
19:49:45.154 INFO GenomicsDBImport - Done importing batch 4/10
19:50:21.063 INFO GenomicsDBImport - Importing batch 5 with 50 samples
19:50:25.656 INFO ProgressMeter - chr1:155747 3.3 5 1.5
19:50:25.657 INFO GenomicsDBImport - Done importing batch 5/10
19:50:55.764 INFO GenomicsDBImport - Importing batch 6 with 50 samples
19:50:57.722 INFO ProgressMeter - chr1:155747 3.9 6 1.6
19:50:57.722 INFO GenomicsDBImport - Done importing batch 6/10
19:51:42.971 INFO GenomicsDBImport - Importing batch 7 with 50 samples
19:51:47.041 INFO ProgressMeter - chr1:155747 4.7 7 1.5
19:51:47.041 INFO GenomicsDBImport - Done importing batch 7/10
19:52:04.977 INFO GenomicsDBImport - Importing batch 8 with 50 samples
19:52:06.476 INFO GenomicsDBImport - Shutting down engine
[19 avril 2019 19:52:06 CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 5.11 minutes.
Runtime.totalMemory()=5189926912
htsjdk.samtools.SAMFormatException: Invalid GZIP header
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1.query(GenomicsDBImport.java:724)
at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:146)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:573)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) 

Any ideas on how to solve this issue?

Thank you
