GenomicsDBImport terminates after Overlapping contigs found error
My original query was about batching and making intervals for GenomicsDBImport, but I have run into a new problem. I am using version 184.108.40.206. I tried the following:
gatk GenomicsDBImport \
    --java-options "-Xmx250G -XX:+UseParallelGC -XX:ParallelGCThreads=24" \
    -V input.list \
    --genomicsdb-workspace-path 5sp_45ind_assmb_00 \
    --intervals interval.00.list \
    --batch-size 9
where I have split my list of contigs into 50 lists and set the batch size to 9 (instead of reading in all 45 g.vcf files at once), for a total of 5 batches. The run appeared to start, but it terminated quickly with an error.
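For reference, the splitting and batching described above can be sketched roughly as follows. This is a minimal illustration, not the exact script I used: the function names, the contiguous-chunk strategy, and the `interval` prefix argument are assumptions made for the example. It relies only on the fact that an interval `.list` file can hold one contig name per line.

```python
import math


def split_contigs(contigs, n_lists):
    """Split contig names into n_lists contiguous chunks of near-equal size.

    Assumption: contiguous chunks; any other partitioning would work too.
    """
    base, extra = divmod(len(contigs), n_lists)
    chunks, start = [], 0
    for i in range(n_lists):
        # The first `extra` chunks each take one additional contig.
        size = base + (1 if i < extra else 0)
        chunks.append(contigs[start:start + size])
        start += size
    return chunks


def write_interval_lists(chunks, prefix="interval"):
    """Write each chunk to <prefix>.NN.list, one contig name per line."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = f"{prefix}.{i:02d}.list"
        with open(path, "w") as fh:
            fh.writelines(name + "\n" for name in chunk)
        paths.append(path)
    return paths


def n_batches(n_samples, batch_size):
    """Number of import batches needed for n_samples at a given batch size."""
    return math.ceil(n_samples / batch_size)
```

With 45 g.vcf files and a batch size of 9, `n_batches(45, 9)` gives the 5 batches mentioned above; `split_contigs(all_contigs, 50)` followed by `write_interval_lists(...)` would produce files named `interval.00.list` through `interval.49.list`.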
The resulting log output is:
00:53:23.869 INFO GenomicsDBImport - HTSJDK Version: 2.16.0
00:53:23.869 INFO GenomicsDBImport - Picard Version: 2.18.7
00:53:23.869 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:53:23.869 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:53:23.869 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:53:23.869 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:53:23.869 INFO GenomicsDBImport - Deflater: IntelDeflater
00:53:23.869 INFO GenomicsDBImport - Inflater: IntelInflater
00:53:23.869 INFO GenomicsDBImport - GCS max retries/reopens: 20
00:53:23.869 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
00:53:23.869 INFO GenomicsDBImport - Initializing engine
01:26:13.490 INFO IntervalArgumentCollection - Processing 58057410 bp from intervals
01:26:13.517 INFO GenomicsDBImport - Done initializing engine
Created workspace /home/leq/gvcfs/5sp_45ind_assmb_00
01:26:13.655 INFO GenomicsDBImport - Vid Map JSON file will be written to 5sp_45ind_assmb_00/vidmap.json
01:26:13.655 INFO GenomicsDBImport - Callset Map JSON file will be written to 5sp_45ind_assmb_00/callset.json
01:26:13.655 INFO GenomicsDBImport - Complete VCF Header will be written to 5sp_45ind_assmb_00/vcfheader.vcf
01:26:13.655 INFO GenomicsDBImport - Importing to array - 5sp_45ind_assmb_00/genomicsdb_array
01:26:13.656 INFO ProgressMeter - Starting traversal
01:26:13.656 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
01:33:16.970 INFO GenomicsDBImport - Importing batch 1 with 9 samples
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Contig/chromosome ctg7180018354961 begins at TileDB column 0 and intersects with contig/chromosome ctg7180018354960 that spans columns [1380207667, 1380207970]
terminate called after throwing an instance of 'ProtoBufBasedVidMapperException'
what(): ProtoBufBasedVidMapperException : Overlapping contigs found
How do I overcome this issue of 'overlapping contigs found'? Is there a problem with my set of contigs? Also, is the warning about protocol messages something to worry about?