GenomicsDBImport

Hi everyone!

I am trying to build a GenomicsDB workspace from more than 100 samples, using as many intervals as possible. Following the manual, I wrote the following command:


java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /LUSTRE/HOME/dpaule_1/COMUNES/WGS_software/gatk/4.0.8.1/gatk-package-4.0.8.1-local.jar GenomicsDBImport -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0001/gVCF/0890N0001.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0002/gVCF/0890N0002.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0003/gVCF/0890N0003.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0004/gVCF/0890N0004.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0005/gVCF/0890N0005.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0006/gVCF/0890N0006.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0007/gVCF/0890N0007.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0008/gVCF/0890N0008.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0009/gVCF/0890N0009.g.vcf.gz
[....] -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/ZD11_SRR501845/gVCF/ZD11_SRR501845.g.vcf.gz --genomicsdb-workspace-path /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB --overwrite-existing-genomicsdb-workspace -L 1 -L 10 -L 11 -L 12 -L 13 -L 14 -L 15 -L 16 -L 17 -L 18 -L 19 -L 2 -L 20 -L 21 -L 22 -L 23 -L 24 -L 25 -L 26 -L 3 -L 4 -L 5 -L 6 -L 7 -L 8 -L 9 -L MT -L X -L JH923329.1 [....] --reader-threads 20
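A command of this shape, with one -V per gVCF and one -L per interval, is tedious to type by hand. Below is a minimal Python sketch that assembles the same kind of argument list from two lists. The function name `build_import_command` and the example file names are hypothetical; only flags that already appear in the command above are used, and the `-Dsamjdk...` JVM options are left out for brevity. It only builds and prints the command; it does not run GATK.

```python
import shlex
from typing import List


def build_import_command(
    gatk_jar: str,
    gvcfs: List[str],
    intervals: List[str],
    workspace: str,
    reader_threads: int = 1,
) -> List[str]:
    """Assemble a GenomicsDBImport argument list (hypothetical helper)."""
    cmd = ["java", "-jar", gatk_jar, "GenomicsDBImport"]
    # One -V flag per input gVCF.
    for path in gvcfs:
        cmd += ["-V", path]
    cmd += [
        "--genomicsdb-workspace-path", workspace,
        "--overwrite-existing-genomicsdb-workspace",
    ]
    # One -L flag per interval (here, whole contigs).
    for name in intervals:
        cmd += ["-L", name]
    cmd += ["--reader-threads", str(reader_threads)]
    return cmd


# Hypothetical inputs, for illustration only.
example = build_import_command(
    gatk_jar="gatk-package-4.0.8.1-local.jar",
    gvcfs=["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"],
    intervals=["1", "10", "MT"],
    workspace="gatkDB",
    reader_threads=20,
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in example))
```

Passing a one-element `intervals` list gives a single-interval import, which may be handy for checking whether a problem depends on the number of intervals.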

Everything seemed to be working until it wrote the following in the output file:

16:46:56.945 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/LUSTRE/HOME/dpaule_1/COMUNES/WGS_software/gatk/4.0.8.1/gatk-package-4.0.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:46:57.095 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.095 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.8.1
16:46:57.096 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:46:57.096 INFO GenomicsDBImport - Executing as [email protected] on Linux v2.6.32-573.12.1.el6.x86_64 amd64
16:46:57.096 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_73-b02
16:46:57.096 INFO GenomicsDBImport - Start Date/Time: 20 de septiembre de 2018 16:46:56 CEST
16:46:57.096 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.097 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.097 INFO GenomicsDBImport - HTSJDK Version: 2.16.0
16:46:57.097 INFO GenomicsDBImport - Picard Version: 2.18.7
16:46:57.097 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:46:57.097 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:46:57.098 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:46:57.098 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:46:57.098 INFO GenomicsDBImport - Deflater: IntelDeflater
16:46:57.098 INFO GenomicsDBImport - Inflater: IntelInflater
16:46:57.098 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:46:57.098 INFO GenomicsDBImport - Using google-cloud-java fork https://github.com/broadinstitute/google-cloud-java/releases/tag/0.20.5-alpha-GCS-RETRY-FIX
16:46:57.098 INFO GenomicsDBImport - Initializing engine
16:47:24.023 INFO IntervalArgumentCollection - Processing 2619054388 bp from intervals
16:47:24.027 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. It is recommended that intervals be aggregated together.
16:47:24.042 INFO GenomicsDBImport - Done initializing engine
Created workspace /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB
16:47:24.224 INFO GenomicsDBImport - Vid Map JSON file will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/vidmap.json
16:47:24.224 INFO GenomicsDBImport - Callset Map JSON file will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/callset.json
16:47:24.224 INFO GenomicsDBImport - Complete VCF Header will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/vcfheader.vcf
16:47:24.225 INFO GenomicsDBImport - Importing to array - /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/genomicsdb_array
16:47:24.232 INFO ProgressMeter - Starting traversal
16:47:24.233 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
16:47:25.239 INFO GenomicsDBImport - Starting batch input file preload
16:47:27.512 INFO GenomicsDBImport - Finished batch preload
16:47:27.513 INFO GenomicsDBImport - Importing batch 1 with 144 samples
Buffer resized from 231936bytes to 262156
[...]
Buffer resized from 262157bytes to 262159
06:56:45.926 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.472 INFO GenomicsDBImport - Finished batch preload
06:56:50.473 INFO GenomicsDBImport - Importing batch 1 with 144 samples
06:56:50.519 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.519 INFO GenomicsDBImport - Shutting down engine
[25 de septiembre de 2018 6:56:50 CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 6,609.89 minutes.
Runtime.totalMemory()=9480175616
java.util.concurrent.CompletionException: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:1:1-275612895 queried with: 10:1-86447213
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:1:1-275612895 queried with: 10:1-86447213
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:766)
at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:165)
at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:604)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 3 more

06:56:50.532 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload


After that I could run GenotypeGVCFs on this database.
I am trying to figure out what this error means. Can you help me?
Best regards,
Héctor.
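One detail of the exception can be read directly from its text: it names two intervals, and these match the first two -L arguments of the command (`-L 1 -L 10`). The same pattern shows up in the second report in this thread (`-L chr1 -L chr2`, expected chr1, queried chr2). This suggests, without proving it, that the failure is tied to importing several intervals in one run. The following Python sketch pulls the two intervals out of such a message; the function names are hypothetical, and the pattern is based only on the two messages quoted in this thread.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the exception text quoted in this thread.
_MISMATCH = re.compile(
    r"expected:\s*(?P<expected>\S+)\s+queried with:\s*(?P<queried>\S+)"
)


def parse_interval_mismatch(line: str) -> Optional[Tuple[str, str]]:
    """Return (expected, queried) intervals, or None if the line does not match."""
    match = _MISMATCH.search(line)
    if match is None:
        return None
    return match.group("expected"), match.group("queried")


def contig_of(interval: str) -> str:
    """Return the contig part of a 'contig:start-end' interval string."""
    return interval.rsplit(":", 1)[0]


message = (
    "org.broadinstitute.hellbender.exceptions.GATKException: "
    "Cannot call query with different interval, "
    "expected:1:1-275612895 queried with: 10:1-86447213"
)
expected, queried = parse_interval_mismatch(message)
print(contig_of(expected), contig_of(queried))  # prints: 1 10
```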

Answers

  • manolismanolis Member ✭✭

    Hi, I had the same error some time ago with this code:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10g -jar /home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/manolis/GATK4/3.WES_Illumina/germSNV/3.genomicsDB/triplo_L/3L --sample-name-map gVCF.list --batch-size 50 --reader-threads 5 -L chr1 -L chr2 -L chr3

    and the error:

    org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:chr1:1-248956422 queried with: chr2:1-242193529
            at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:828)
            at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:136)
            at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:563)
            at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
  • manolismanolis Member ✭✭

    Hi @bhanuGandham, thanks a lot. Regards

  • Hello @bhanuGandham, thank you for your reply.
