Ambiguous error in GenomicsDBImport

manolis Member ✭✭
edited March 2018 in Ask the GATK team

Hi,

In the HaplotypeCaller step I use 404 intervals with 100 bp padding.

Then, when I create the database with GenomicsDBImport using the same set of intervals, I get the following error for some intervals but not for all of them!

Interval 001 without error:

... ...
14:03:59.867 INFO ProgressMeter - Traversal complete. Processed 6 total batches in 0.2 minutes.
14:03:59.867 INFO GenomicsDBImport - Import of all batches to GenomicsDB completed!
14:03:59.956 INFO GenomicsDBImport - Shutting down engine
[March 27, 2018 2:03:59 PM CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.24 minutes.
Runtime.totalMemory()=3493855232

Interval 007 with error:

...
terminate called after throwing an instance of 'VCF2TileDBException'
what(): VCF2TileDBException : Incorrect cell order found - cells must be in column major order. Previous cell: [ 1, 117999971 ] current cell: [ 1, 11799997
The most likely cause is unexpected data in the input file:
(a) A VCF file has two lines with the same genomic position
(b) An unsorted CSV file
(c) Malformed VCF file (or malformed index)
See point 2 at: https://github.com/Intel-HLS/GenomicsDB/wiki/Importing-VCF-data-into-GenomicsDB#organizing-your-data

I don't understand why some intervals import without errors while others fail during the same analysis, considering that I start from the same gVCF files! Is it related to the padding regions used during HaplotypeCaller?

Best
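
Cause (a) in the error message, two records at the same genomic position, can be checked directly on a gVCF. Below is a minimal Python sketch, not part of GATK; the function names `open_vcf` and `find_position_problems` and the sample records are made up for illustration. It only compares the CHROM and POS columns of consecutive records and reports duplicates and records that go backwards within a contig.

```python
import gzip
from typing import Iterable, List, Tuple


def open_vcf(path: str):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_position_problems(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (contig, position, kind) for duplicate or out-of-order records."""
    problems = []
    prev_contig, prev_pos = None, None
    for line in lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        contig, pos_text = line.split("\t", 2)[:2]
        pos = int(pos_text)
        if contig == prev_contig:
            if pos == prev_pos:
                problems.append((contig, pos, "duplicate"))
            elif pos < prev_pos:
                problems.append((contig, pos, "out-of-order"))
        prev_contig, prev_pos = contig, pos
    return problems


# Hypothetical records, only to show the behaviour.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199",
    "chr1\t200\t.\tC\tT,<NON_REF>\t50\t.\t.",
    "chr1\t200\t.\tC\tCT,<NON_REF>\t40\t.\t.",
    "chr1\t150\t.\tG\t<NON_REF>\t.\t.\tEND=160",
    "chr2\t10\t.\tT\t<NON_REF>\t.\t.\tEND=20",
]
problems = find_position_problems(sample)
print(problems)  # [('chr1', 200, 'duplicate'), ('chr1', 150, 'out-of-order')]
```

For a real file, something like `find_position_problems(open_vcf("sample.g.vcf.gz"))` should work, where the file name is a placeholder.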

Answers

  • SkyWarrior Turkey Member ✭✭✭

    Padding regions have nothing to do with this error. Overlapping regions are already discounted in bed codec. The problem may be with your import intervals.
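
To see how a padded interval list collapses once overlaps are merged, here is a small Python sketch. It is a generic pad-and-merge routine written for illustration, not GATK's own interval handling; the name `pad_and_merge` and the example coordinates are assumptions. It treats intervals as 1-based and inclusive, merges overlapping or abutting intervals on the same contig, and does not clip at contig ends.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def pad_and_merge(intervals: List[Interval], padding: int) -> List[Interval]:
    """Pad each interval on both sides, then merge overlapping or abutting ones."""
    padded = sorted((c, max(1, s - padding), e + padding) for c, s, e in intervals)
    merged: List[Interval] = []
    for contig, start, end in padded:
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            # Extend the previous interval instead of starting a new one.
            last = merged[-1]
            merged[-1] = (contig, last[1], max(last[2], end))
        else:
            merged.append((contig, start, end))
    return merged


# Hypothetical targets: the first two overlap once 100 bp of padding is added.
targets = [("chr1", 1000, 1200), ("chr1", 1350, 1500), ("chr1", 5000, 5100)]
merged = pad_and_merge(targets, 100)
print(merged)  # [('chr1', 900, 1600), ('chr1', 4900, 5200)]
```

Comparing the merged list with the intervals actually passed to the import step is one way to spot an interval that differs from what was used for calling.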

  • manolismanolis Member ✭✭

    I tried with the whole chromosomes... in some chromosomes the analysis is stuck, in other cases I have the above reported error ...

  • SkyWarrior Turkey Member ✭✭✭

    Do you get an error message when it is stuck?

    Could the import just be taking a long time?

  • manolis Member ✭✭
    edited March 2018

    at the end yes...

  • SkyWarrior Turkey Member ✭✭✭

    ValidateVCF may be used but I am not sure if that works on GVCFs.

  • manolis Member ✭✭
    edited March 2018

    I will try ValidateVariants (it works for gVCFs), and I will also check causes (a) and (c) listed in the error message...

  • manolis Member ✭✭

    Nothing works... I will rebuild my gVCFs...

    Thank you SkyW

  • SkyWarrior Turkey Member ✭✭✭

    Hmm. I am not sure if rebuilding is required. Does the ValidateVariants tool flag any of your gVCFs as invalid?

  • manolis Member ✭✭
    edited March 2018

    No, everything is OK according to ValidateVariants! I also created a new index, but got the same result. I ran GenomicsDBImport on 3 gVCFs and hit the same error at that step. I ran 2 other gVCFs and GenomicsDBImport finished without errors, but at the next step, GenotypeGVCFs, I got a long list of errors...
    In my case, GenomicsDBImport works on some days and fails on others... Of course I'm doing something wrong, but I don't know what!

  • manolis Member ✭✭

    I created 2 new gVCFs:

    1) GenomicsDBImport

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx8g -jar /share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1 --batch-size 50 -L chr1:10001-207666 --sample-name-map gVCF.list --reader-threads 5 -ip 500
    21:01:50.946 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:01:51.161 INFO GenomicsDBImport - ------------------------------------------------------------
    21:01:51.161 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.2.1
    21:01:51.161 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:01:51.163 INFO GenomicsDBImport - Executing as [email protected] on Linux v3.5.0-36-generic amd64
    21:01:51.163 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
    21:01:51.164 INFO GenomicsDBImport - Start Date/Time: March 27, 2018 9:01:50 PM CEST
    21:01:51.164 INFO GenomicsDBImport - ------------------------------------------------------------
    21:01:51.164 INFO GenomicsDBImport - ------------------------------------------------------------
    21:01:51.165 INFO GenomicsDBImport - HTSJDK Version: 2.14.3
    21:01:51.166 INFO GenomicsDBImport - Picard Version: 2.17.2
    21:01:51.166 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    21:01:51.166 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:01:51.166 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:01:51.166 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:01:51.167 INFO GenomicsDBImport - Deflater: IntelDeflater
    21:01:51.167 INFO GenomicsDBImport - Inflater: IntelInflater
    21:01:51.167 INFO GenomicsDBImport - GCS max retries/reopens: 20
    21:01:51.167 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    21:01:51.167 INFO GenomicsDBImport - Initializing engine
    21:01:52.268 INFO IntervalArgumentCollection - Processing 198666 bp from intervals
    21:01:52.277 INFO GenomicsDBImport - Done initializing engine
    Created workspace /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1
    21:01:52.580 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1/vidmap.json
    21:01:52.581 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1/callset.json
    21:01:52.581 INFO GenomicsDBImport - Complete VCF Header will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1/vcfheader.vcf
    21:01:52.581 INFO GenomicsDBImport - Importing to array - /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/001chr1/genomicsdb_array
    21:01:52.617 INFO ProgressMeter - Starting traversal
    21:01:52.618 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
    21:01:52.619 INFO GenomicsDBImport - Starting batch input file preload
    21:01:52.924 INFO GenomicsDBImport - Finished batch preload
    21:01:52.925 INFO GenomicsDBImport - Importing batch 1 with 2 samples
    21:01:54.355 INFO GenomicsDBImport - Done importing batch 1/1
    21:01:54.356 INFO ProgressMeter - chr1:9501 0.0 1 34.6
    21:01:54.356 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 0.0 minutes.
    21:01:54.356 INFO GenomicsDBImport - Import of all batches to GenomicsDB completed!
    21:01:54.493 INFO GenomicsDBImport - Shutting down engine
    [March 27, 2018 9:01:54 PM CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=2529689600
    Tool returned:
    true

    2) GenotypeGVCFs

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx8g -jar /share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar GenotypeGVCFs -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -O /home/manolis/GATK4/IlluminaExomePairEnd/6.vcf/processing/WES_prova_001chr1.vcf -D /home/shared/resources/gatk4hg38db/Homo_sapiens_assembly38.dbsnp138.vcf -G StandardAnnotation --only-output-calls-starting-in-intervals -new-qual -V gendb://WES_prova/001chr1 -L chr1:10001-207666
    21:02:22.069 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:02:22.350 INFO GenotypeGVCFs - ------------------------------------------------------------
    21:02:22.351 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.2.1
    21:02:22.351 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:02:22.352 INFO GenotypeGVCFs - Executing as [email protected] on Linux v3.5.0-36-generic amd64
    21:02:22.353 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
    21:02:22.353 INFO GenotypeGVCFs - Start Date/Time: March 27, 2018 9:02:22 PM CEST
    21:02:22.353 INFO GenotypeGVCFs - ------------------------------------------------------------
    21:02:22.353 INFO GenotypeGVCFs - ------------------------------------------------------------
    21:02:22.354 INFO GenotypeGVCFs - HTSJDK Version: 2.14.3
    21:02:22.354 INFO GenotypeGVCFs - Picard Version: 2.17.2
    21:02:22.355 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    21:02:22.355 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:02:22.355 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:02:22.355 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:02:22.355 INFO GenotypeGVCFs - Deflater: IntelDeflater
    21:02:22.355 INFO GenotypeGVCFs - Inflater: IntelInflater
    21:02:22.355 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    21:02:22.355 INFO GenotypeGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    21:02:22.355 INFO GenotypeGVCFs - Initializing engine
    21:02:23.592 INFO FeatureManager - Using codec VCFCodec to read file file:///home/shared/resources/gatk4hg38db/Homo_sapiens_assembly38.dbsnp138.vcf
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    21:02:24.828 INFO IntervalArgumentCollection - Processing 197666 bp from intervals
    21:02:24.929 INFO GenotypeGVCFs - Done initializing engine
    21:02:26.254 INFO ProgressMeter - Starting traversal
    21:02:26.255 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),0.12603564300000003,Cpu time(s),0.11598068300000029
    21:02:28.382 INFO ProgressMeter - chr1:194713 0.0 4118 116218.3
    21:02:28.382 INFO ProgressMeter - Traversal complete. Processed 4118 total variants in 0.0 minutes.
    21:02:28.407 INFO GenotypeGVCFs - Shutting down engine
    [March 27, 2018 9:02:28 PM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=2347237376

    Please, any help!
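
The two "Processing ... bp from intervals" figures in the logs above differ by exactly the padding: GenomicsDBImport was run with `-ip 500` and GenotypeGVCFs without it. The short Python sketch below only reproduces that arithmetic for `chr1:10001-207666`; the helper name `padded_span` is made up, and clipping at the contig end is ignored.

```python
def padded_span(start: int, end: int, padding: int = 0):
    """Return (padded_start, padded_end, length) for a 1-based inclusive interval."""
    padded_start = max(1, start - padding)
    padded_end = end + padding
    return padded_start, padded_end, padded_end - padded_start + 1


with_padding = padded_span(10001, 207666, 500)
without_padding = padded_span(10001, 207666)
print(with_padding)     # (9501, 208166, 198666)
print(without_padding)  # (10001, 207666, 197666)
```

The padded start of 9501 matches the `chr1:9501` locus reported by the import's ProgressMeter, and the lengths match the 198666 bp and 197666 bp reported by the two tools.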

  • manolis Member ✭✭
    edited March 2018

    Here are the logs when I run GenomicsDBImport on a whole chromosome with the same 2 gVCFs...

    Using GATK jar /share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx8g -jar /share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01 --batch-size 50 -L chr1 --sample-name-map gVCF.list --reader-threads 5 -ip 500
    21:20:02.939 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.2.1/gatk-package-4.0.2.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:20:03.112 INFO GenomicsDBImport - ------------------------------------------------------------
    21:20:03.113 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.2.1
    21:20:03.113 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:20:03.114 INFO GenomicsDBImport - Executing as [email protected] on Linux v3.5.0-36-generic amd64
    21:20:03.114 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
    21:20:03.115 INFO GenomicsDBImport - Start Date/Time: March 27, 2018 9:20:02 PM CEST
    21:20:03.115 INFO GenomicsDBImport - ------------------------------------------------------------
    21:20:03.115 INFO GenomicsDBImport - ------------------------------------------------------------
    21:20:03.116 INFO GenomicsDBImport - HTSJDK Version: 2.14.3
    21:20:03.116 INFO GenomicsDBImport - Picard Version: 2.17.2
    21:20:03.116 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    21:20:03.116 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:20:03.116 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:20:03.117 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:20:03.117 INFO GenomicsDBImport - Deflater: IntelDeflater
    21:20:03.117 INFO GenomicsDBImport - Inflater: IntelInflater
    21:20:03.117 INFO GenomicsDBImport - GCS max retries/reopens: 20
    21:20:03.117 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    21:20:03.117 INFO GenomicsDBImport - Initializing engine
    21:20:04.000 INFO IntervalArgumentCollection - Processing 248956422 bp from intervals
    21:20:04.005 INFO GenomicsDBImport - Done initializing engine
    Created workspace /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01
    21:20:04.218 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01/vidmap.json
    21:20:04.218 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01/callset.json
    21:20:04.218 INFO GenomicsDBImport - Complete VCF Header will be written to /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01/vcfheader.vcf
    21:20:04.219 INFO GenomicsDBImport - Importing to array - /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/mergedGVCFdb/WES_prova/01/genomicsdb_array
    21:20:04.234 INFO ProgressMeter - Starting traversal
    21:20:04.235 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
    21:20:04.235 INFO GenomicsDBImport - Starting batch input file preload
    21:20:04.441 INFO GenomicsDBImport - Finished batch preload
    21:20:04.442 INFO GenomicsDBImport - Importing batch 1 with 2 samples
    Buffer resized from 219175bytes to 262148
    Buffer resized from 219175bytes to 262081
    Buffer resized from 262081bytes to 262116
    Buffer resized from 262116bytes to 262117
    Buffer resized from 262148bytes to 262149
    Buffer resized from 262117bytes to 262151
    Buffer resized from 262149bytes to 262150
    Buffer resized from 262150bytes to 262151
    terminate called after throwing an instance of 'VCF2TileDBException'
    what(): VCF2TileDBException : Incorrect cell order found - cells must be in column major order. Previous cell: [ 0, 58999902 ] current cell: [ 0, 58999902 ].
    The most likely cause is unexpected data in the input file:
    (a) A VCF file has two lines with the same genomic position
    (b) An unsorted CSV file
    (c) Malformed VCF file (or malformed index)
    See point 2 at: https://github.com/Intel-HLS/GenomicsDB/wiki/Importing-VCF-data-into-GenomicsDB#organizing-your-data

    Any suggestions?
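
The message above reports a previous and a current cell that are identical, which points towards a repeated position rather than a sorting problem. The Python sketch below makes that reading explicit. It assumes each pair is printed as [row, column], with the row being the sample index and the column the position in GenomicsDB's coordinate space, and that column-major order means sorting by column first and row second. The helper names are made up for illustration.

```python
import re
from typing import Tuple

Cell = Tuple[int, int]  # (row, column) as printed in the error message (assumed)

CELL_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_cells(message: str) -> Tuple[Cell, Cell]:
    """Extract the previous and current cell from the error text."""
    found = CELL_PATTERN.findall(message)
    if len(found) != 2:
        raise ValueError("expected exactly two complete cells in the message")
    previous, current = ((int(r), int(c)) for r, c in found)
    return previous, current


def column_major_key(cell: Cell) -> Tuple[int, int]:
    """Sort key for column-major order: column first, then row."""
    row, column = cell
    return (column, row)


def classify(previous: Cell, current: Cell) -> str:
    """Say whether the current cell repeats, precedes, or follows the previous one."""
    prev_key, cur_key = column_major_key(previous), column_major_key(current)
    if cur_key == prev_key:
        return "duplicate cell"
    if cur_key < prev_key:
        return "out of order"
    return "in order"


message = ("Incorrect cell order found - cells must be in column major order. "
           "Previous cell: [ 0, 58999902 ] current cell: [ 0, 58999902 ].")
previous, current = parse_cells(message)
result = classify(previous, current)
print(result)  # duplicate cell
```

A "duplicate cell" result is consistent with cause (a) in the message, so the gVCF for the sample in that row is the first place to look.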

  • manolis Member ✭✭

    About using whole chromosomes as intervals: I fixed points (a) and (c; the index), but now I get the same warnings in the GenotypeGVCFs step... I found some threads about -new-qual, but this option is already included in my command...

  • Sheila Broad Institute Member, Broadie, Moderator admin
    Accepted Answer

    @manolis
    Hi,

    Have a look at this thread.

    -Sheila

  • manolis Member ✭✭
    Accepted Answer

    perfect! Thank you
