GenomicsDBImport

Hi everyone!

I am trying to build a GenomicsDB workspace from more than 100 samples, using as many intervals as possible. Following the manual, I wrote the following command:


java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /LUSTRE/HOME/dpaule_1/COMUNES/WGS_software/gatk/4.0.8.1/gatk-package-4.0.8.1-local.jar GenomicsDBImport -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0001/gVCF/0890N0001.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0002/gVCF/0890N0002.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0003/gVCF/0890N0003.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0004/gVCF/0890N0004.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0005/gVCF/0890N0005.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0006/gVCF/0890N0006.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0007/gVCF/0890N0007.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0008/gVCF/0890N0008.g.vcf.gz -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/0890N0009/gVCF/0890N0009.g.vcf.gz
[....] -V /LUSTRE/HOME/dpaule_1/dpaule_1_3/HECTOR/Analysis_WGseqv4_180727/ZD11_SRR501845/gVCF/ZD11_SRR501845.g.vcf.gz --genomicsdb-workspace-path /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB --overwrite-existing-genomicsdb-workspace -L 1 -L 10 -L 11 -L 12 -L 13 -L 14 -L 15 -L 16 -L 17 -L 18 -L 19 -L 2 -L 20 -L 21 -L 22 -L 23 -L 24 -L 25 -L 26 -L 3 -L 4 -L 5 -L 6 -L 7 -L 8 -L 9 -L MT -L X -L JH923329.1 [....] --reader-threads 20
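
A command with this many -V and -L flags is hard to read and easy to mistype. As a rough sketch only (not something confirmed in this thread), the inputs could be moved into two small files: a tab-separated sample map for --sample-name-map (the option used in the reply further down) and a plain list of contig names passed once with -L. The helper names, the output file names, and the rule that the sample name equals the file name without ".g.vcf.gz" are all assumptions made for illustration.

```shell
# Hypothetical helper: read gVCF paths (one per line) on stdin and print
# "sample<TAB>path" lines suitable for --sample-name-map.
# Assumption: the sample name is the file name minus ".g.vcf.gz".
make_sample_map() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue            # skip blank lines
        name=$(basename "$path" .g.vcf.gz)    # e.g. 0890N0001
        printf '%s\t%s\n' "$name" "$path"
    done
}

# Hypothetical helper: print one interval (here, a contig name) per line,
# the layout GATK expects in a ".list" interval file.
make_interval_list() {
    printf '%s\n' "$@"
}

# Example use (file names are placeholders):
#   ls /path/to/*/gVCF/*.g.vcf.gz | make_sample_map > gVCF.sample_map
#   make_interval_list 1 10 11 12 MT X > contigs.list
#   java -jar "$GATK_JAR" GenomicsDBImport --sample-name-map gVCF.sample_map \
#       -L contigs.list --genomicsdb-workspace-path /path/to/gatkDB
```

This only tidies the inputs; by itself it does not change how many intervals the tool has to process.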

Everything seemed to be working until it wrote the following in the output file:

16:46:56.945 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/LUSTRE/HOME/dpaule_1/COMUNES/WGS_software/gatk/4.0.8.1/gatk-package-4.0.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:46:57.095 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.095 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.8.1
16:46:57.096 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:46:57.096 INFO GenomicsDBImport - Executing as [email protected] on Linux v2.6.32-573.12.1.el6.x86_64 amd64
16:46:57.096 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_73-b02
16:46:57.096 INFO GenomicsDBImport - Start Date/Time: 20 de septiembre de 2018 16:46:56 CEST
16:46:57.096 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.097 INFO GenomicsDBImport - ------------------------------------------------------------
16:46:57.097 INFO GenomicsDBImport - HTSJDK Version: 2.16.0
16:46:57.097 INFO GenomicsDBImport - Picard Version: 2.18.7
16:46:57.097 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:46:57.097 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:46:57.098 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:46:57.098 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:46:57.098 INFO GenomicsDBImport - Deflater: IntelDeflater
16:46:57.098 INFO GenomicsDBImport - Inflater: IntelInflater
16:46:57.098 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:46:57.098 INFO GenomicsDBImport - Using google-cloud-java fork https://github.com/broadinstitute/google-cloud-java/releases/tag/0.20.5-alpha-GCS-RETRY-FIX
16:46:57.098 INFO GenomicsDBImport - Initializing engine
16:47:24.023 INFO IntervalArgumentCollection - Processing 2619054388 bp from intervals
16:47:24.027 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. It is recommended that intervals be aggregated together.
16:47:24.042 INFO GenomicsDBImport - Done initializing engine
Created workspace /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB
16:47:24.224 INFO GenomicsDBImport - Vid Map JSON file will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/vidmap.json
16:47:24.224 INFO GenomicsDBImport - Callset Map JSON file will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/callset.json
16:47:24.224 INFO GenomicsDBImport - Complete VCF Header will be written to /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/vcfheader.vcf
16:47:24.225 INFO GenomicsDBImport - Importing to array - /scratch/dpaule_1/dpaule_1_3/Analysis_WGSeqv4_180727/gatkDB/genomicsdb_array
16:47:24.232 INFO ProgressMeter - Starting traversal
16:47:24.233 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
16:47:25.239 INFO GenomicsDBImport - Starting batch input file preload
16:47:27.512 INFO GenomicsDBImport - Finished batch preload
16:47:27.513 INFO GenomicsDBImport - Importing batch 1 with 144 samples
Buffer resized from 231936bytes to 262156
[...]
Buffer resized from 262157bytes to 262159
06:56:45.926 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.472 INFO GenomicsDBImport - Finished batch preload
06:56:50.473 INFO GenomicsDBImport - Importing batch 1 with 144 samples
06:56:50.519 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.519 INFO GenomicsDBImport - Shutting down engine
[25 de septiembre de 2018 6:56:50 CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 6,609.89 minutes.
Runtime.totalMemory()=9480175616
java.util.concurrent.CompletionException: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:1:1-275612895 queried with: 10:1-86447213
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:1:1-275612895 queried with: 10:1-86447213
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:766)
at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:165)
at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:604)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 3 more

06:56:50.532 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload
06:56:50.534 INFO GenomicsDBImport - Starting batch input file preload
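
The exception above complains that a query for interval 10:1-86447213 arrived where 1:1-275612895 was expected, so the failure is tied to having several intervals in a single import. One possible workaround, offered as an unverified sketch rather than the fix given in this thread, is to run one import per interval, each into its own workspace. The helper below only prints the commands (a dry run); its name, the argument order, and the one-directory-per-interval layout are assumptions.

```shell
# Hypothetical dry-run helper: print one GenomicsDBImport command per interval,
# each writing to its own workspace directory under a common root.
# Usage: print_per_interval_imports <gatk_jar> <sample_map> <workspace_root> <interval>...
print_per_interval_imports() {
    jar=$1
    map=$2
    root=$3
    shift 3
    for interval in "$@"; do
        # Nothing is executed here; review the printed lines before running them.
        printf 'java -jar %s GenomicsDBImport --sample-name-map %s --genomicsdb-workspace-path %s/%s -L %s\n' \
            "$jar" "$map" "$root" "$interval" "$interval"
    done
}

# Example use (paths are placeholders):
#   print_per_interval_imports gatk-package-4.0.8.1-local.jar gVCF.sample_map /scratch/gatkDB 1 10 11
```

With this layout, GenotypeGVCFs would then have to be run once per workspace, which is a trade-off to weigh against a single combined import.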


After that, I could run GenotypeGVCFs on this database.
I am trying to figure out what this error means. Can you help me?
Best regards,
Héctor.

Best Answer

Answers

  • manolis ✭✭ Member

    Hi, I had the same error some time ago with this code:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10g -jar /home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/manolis/GATK4/3.WES_Illumina/germSNV/3.genomicsDB/triplo_L/3L --sample-name-map gVCF.list --batch-size 50 --reader-threads 5 -L chr1 -L chr2 -L chr3

    and the error:

    org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:chr1:1-248956422 queried with: chr2:1-242193529
            at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:828)
            at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:136)
            at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:563)
            at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    (END)
    
  • manolis ✭✭ Member

    Hi @bhanuGandham , thanks a lot. Regards

  • HectorMarina Member

    Hello @bhanuGandham, thank you for your reply.
