TileDB GZIP error when using GenotypeGVCFs on GenomicsDB

jamilla_az (Harvard University, Member)
I have 273 Drosophila genomic sequencing samples from a NovaSeq run that I want to use for variant calling. I created a GenomicsDB workspace for each chromosome from all 273 GVCFs with the following command:

```
~/gatk-4.1.3.0/gatk GenomicsDBImport \
--java-options "-Xmx8g" \
--genomicsdb-workspace-path my_database_${SAMPLE} \
--batch-size 50 \
-L $SAMPLE \
--tmp-dir=tmp \
--sample-name-map ../gvcf.sample_map
```
Here $SAMPLE is the chromosome being processed.

When I run GenotypeGVCFs on these GenomicsDB workspaces, the job stops with the errors shown below. They appear at a different, seemingly random point in the chromosome traversal for each job.

My command:
```
~/gatk-4.1.3.0/gatk GenotypeGVCFs \
--java-options "-Xmx8g" \
-R ../00_genome/dmel-all-chromosome-r6.28.fasta \
-V gendb://../04_genDB/my_database_${SAMPLE} \
-O wildFlies_${SAMPLE}.vcf.gz \
--tmp-dir=tmp
```
My error:
```
[TileDB::utils] Error: (gunzip) Cannot decompress with GZIP
[TileDB::ReadState] Error: Cannot decompress tile.
terminate called after throwing an instance of 'VariantStorageManagerException'
what(): VariantStorageManagerException exception : VariantArrayCellIterator increment failed
TileDB error message : [TileDB::ReadState] Error: Cannot decompress tile
```


One chromosome ran all the way through the GenotypeGVCFs step, but so far 3 others have failed. I ran the GenomicsDBImport step for all chromosomes on a Lustre file system without setting TILEDB_DISABLE_FILE_LOCKING=1. Even so, every GenomicsDBImport run completed without any error messages.
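
To see which per-chromosome jobs hit this error, one option is to scan the job logs for the TileDB message. This is only a sketch: it assumes each GenotypeGVCFs run wrote its output to a file named like `genotype_<chromosome>.log` in a single directory, which is a hypothetical layout and not part of the setup described above. The function name `list_tiledb_failures` is also made up for illustration.

```shell
#!/bin/sh
# Sketch: list GenotypeGVCFs logs that contain the TileDB decompression error.
# Assumption: logs are plain-text files named genotype_<chromosome>.log in
# one directory (hypothetical naming, not taken from the original post).

list_tiledb_failures() {
    log_dir="$1"
    # -l prints only the names of files that contain a match;
    # -F treats the pattern as a fixed string rather than a regex.
    grep -lF "Cannot decompress tile" "$log_dir"/genotype_*.log 2>/dev/null
}
```

Calling `list_tiledb_failures logs` would print one log path per failed chromosome, which gives the list of workspaces worth re-checking.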

What might be the cause of this TileDB decompression error?

Best Answer

  • jamilla_az (Harvard University)
    Accepted Answer
    I re-ran GenomicsDBImport (specifying TILEDB_DISABLE_FILE_LOCKING=1) for the chromosomes that had failed, then re-ran GenotypeGVCFs on the new databases. This time GenotypeGVCFs completed successfully for all my chromosomes.

    My best explanation is that something got corrupted the first time around, and re-running the scripts starting from GenomicsDBImport fixed it. Thank you for your help!
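
For readers on a similar shared file system, the re-import can be sketched as below. The answer does not show how the variable was set, so exporting it in the shell (or job script) that launches GATK is an assumption here. `GATK_BIN` and `import_chromosome` are illustrative names; the GATK options are copied from the original import command in the question.

```shell
#!/bin/sh
# Sketch: re-run GenomicsDBImport for one chromosome with TileDB file locking
# disabled, as described in the accepted answer.
# Assumption: the variable is exported in the launching shell; the post does
# not show the exact invocation.
export TILEDB_DISABLE_FILE_LOCKING=1

# Path taken from the commands in this thread; override GATK_BIN if yours differs.
GATK_BIN="${GATK_BIN:-$HOME/gatk-4.1.3.0/gatk}"

import_chromosome() {
    chrom="$1"
    # Same options as the original import, with the chromosome passed as an argument.
    "$GATK_BIN" GenomicsDBImport \
        --java-options "-Xmx8g" \
        --genomicsdb-workspace-path "my_database_${chrom}" \
        --batch-size 50 \
        -L "$chrom" \
        --tmp-dir=tmp \
        --sample-name-map ../gvcf.sample_map
}
```

A call such as `import_chromosome 2R` would then rebuild that chromosome's workspace. Since the earlier workspace may be corrupted, this sketch assumes the old `my_database_<chromosome>` directory has been moved aside or removed beforehand.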

Answers

  • bshifaw (Member, Broadie, Moderator, admin)

    Hi @jamilla_az,

    I'll need to ask the dev team, but in the meantime would you please send over the full stack trace of the failed job?

  • jamilla_az (Harvard University, Member)
    What is the best way to send the full stack trace?
  • bshifaw (Member, Broadie, Moderator, admin)

    You should be able to attach it to a post on this thread.

  • jamilla_az (Harvard University, Member)
    This is the full output of the failed GenotypeGVCFs run.
  • bshifaw (Member, Broadie, Moderator, admin)
    edited September 30

    Hi @jamilla_az,
    The dev team asked for a bit more info:

    1. Would you check that your system has sufficient memory and free disk space (particularly in the temp directory) during the run?
    2. TileDB should work fine on a Lustre system, but if the file system isn't configured correctly it can cause the tool to misbehave. Have you faced any problems with the Lustre system previously? If you can test the tool with the same input on a regular file system, that would be great.
    3. Try to query one of the databases with SelectVariants around the location where GenotypeGVCFs failed, and let us know if you come across the same error.

    If you have a sample test file that we could use to replicate the error, it would help in determining the cause.
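
The disk-space part of check 1 can be scripted as a quick pre-flight test before launching a job. The following is a minimal sketch using standard `df` and `awk`; the helper names `free_kb` and `has_free_space` are made up for illustration, and any threshold passed in is an arbitrary choice, not a GATK requirement.

```shell
#!/bin/sh
# Sketch: pre-flight check that a directory has at least a minimum amount
# of free space before a long GATK job starts.

# Print the free space of the file system holding $1, in kilobytes.
free_kb() {
    # -P forces the POSIX one-line-per-file-system layout;
    # -k reports sizes in 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if $1 has at least $2 kilobytes free.
has_free_space() {
    avail="$(free_kb "$1")"
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `has_free_space tmp 104857600 || echo "less than 100 GB free in tmp" >&2` would warn when the temp directory is low. Note that `df` reports the file system as a whole, so per-user or per-group quotas on shared scratch space may not be reflected. On Linux, `free -h` gives a similar snapshot for memory.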

  • jamilla_az (Harvard University, Member)
    Thank you for getting back to me:

    1. The tmp directory is in a 50TB lab scratch space - I have only just started using this temporary scratch space, so it should still be mostly free (>90%). For the chromosomes that did finish the run (3 out of the 6), the memory usage was below the requested amount of 12GB.

    2. I haven't faced any problems with any other GATK tools on this file system, but I will look into other options.

    3. I actually successfully ran SelectVariants on a GenomicsDB for a chromosome (2R) that did not work previously! I ran it on the region where GenotypeGVCFs previously hit the TileDB error.
    ```
    ~/gatk-4.1.3.0/gatk SelectVariants \
    --java-options "-Xmx8g" \
    -R ../00_genome/dmel-all-chromosome-r6.28.fasta \
    -V gendb://../04_genDB/my_database_2R \
    -L 2R:1323000-1333000 \
    -O wildFlies_2R.vcf.gz
    ```
    One thing to note is that I re-ran GenomicsDBImport for this chromosome a few days ago, specifying TILEDB_DISABLE_FILE_LOCKING=1, prior to running SelectVariants today.

    I only saw this error with the GenomicsDB workspaces built from all 273 samples (a test run with 2 samples on these chromosomes completed successfully). The GVCFs and GenomicsDB workspaces are quite large, and I'm not sure how to transfer them.
  • bshifaw (Member, Broadie, Moderator, admin)
    edited October 3

    Happy it's working for you!! Thanks for sharing what happened.
