
GenomicsDBImport too slow on local server

dbecker · Munich · Member ✭✭


I tried using GenomicsDBImport on our data. As a test case, I tried importing chromosome 1 for 223 samples. Since most samples are panels and we have only a few genomes and exomes, I thought it would be best to always call everything together.
My command line:

opt/gatk/ --java-options "-Xmx8G -Xms8G" GenomicsDBImport
--genomicsdb-workspace-path [...]/germline_snp_database_1
--batch-size 50
-L NC_000001
--reader-threads 5

I only use 5 reader threads because I plan to parallelize with scatter-gather later on. The command has been running for 14 hours on a local server. Is something wrong, or is there something I can do to make it reasonably fast? So far the GATK 3.8 pipeline is much faster.

Thanks & best regards,
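For readers following the scatter-gather idea in the question, here is a minimal sketch of how one contig could be cut into contiguous shards, each imported into its own GenomicsDB workspace. The contig length (1000), the workspace names, and the `samples.map` file are hypothetical placeholders, and passing inputs through `--sample-name-map` is an assumption, since the original command does not show how the gVCFs were supplied. The script only prints the commands; it does not run GATK.

```python
import shlex
from typing import List


def split_interval(contig: str, length: int, n_shards: int) -> List[str]:
    """Split positions 1..length of a contig into contiguous, non-overlapping
    intervals in GATK's 1-based inclusive "contig:start-end" notation."""
    if not 1 <= n_shards <= length:
        raise ValueError("n_shards must be between 1 and length")
    base, extra = divmod(length, n_shards)
    intervals = []
    start = 1
    for i in range(n_shards):
        # The first `extra` shards get one additional base so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        intervals.append(f"{contig}:{start}-{end}")
        start = end + 1
    return intervals


def import_command(interval: str, workspace: str, sample_map: str) -> List[str]:
    """Build (but do not run) one GenomicsDBImport command for a single shard.
    Batch size and reader threads are copied from the question."""
    return [
        "gatk", "--java-options", "-Xmx8G -Xms8G", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--sample-name-map", sample_map,  # hypothetical input file
        "--batch-size", "50",
        "-L", interval,
        "--reader-threads", "5",
    ]


if __name__ == "__main__":
    # Hypothetical contig length; substitute the real length from your reference.
    for i, interval in enumerate(split_interval("NC_000001", 1000, 4)):
        print(shlex.join(import_command(interval, f"workspace_shard_{i}", "samples.map")))
```

Each shard needs its own workspace path, so the per-shard outputs still have to be gathered after genotyping; that step is not covered here.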

Best Answer


  • dbecker · Munich · Member ✭✭


    That seems like a lot of effort. I still don't really know how to put those intervals back together in the end. I think I'll stick with CombineGVCFs for now. I can do it stepwise, and for our ~4000 samples overall in ~200 runs it seems like the way to go. I'll try GenomicsDB again once it is possible or recommended to use a single one for all intervals at once, and once you can add to it.

    Thanks for the help,

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    Hi Daniel,

    You may also find Geraldine's response here helpful.
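The stepwise CombineGVCFs plan from the best answer (~4000 samples in ~200 runs) can be sketched as below, assuming it means roughly 20 gVCFs per run. The file names and reference path are hypothetical placeholders, and the script only builds the command lines without running GATK.

```python
from typing import List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a list of gVCF paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def combine_command(gvcfs: Sequence[str], reference: str, output: str) -> List[str]:
    """Build (but do not run) one CombineGVCFs command for a batch of gVCFs."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs: 4000 per-sample gVCFs and a reference FASTA.
    samples = [f"sample_{i:04d}.g.vcf.gz" for i in range(4000)]
    batches = make_batches(samples, 20)
    print(len(batches), "CombineGVCFs runs")
    # Show only the first command; the rest follow the same pattern.
    print(" ".join(combine_command(batches[0], "reference.fasta", "combined_000.g.vcf.gz")))
```

The combined gVCFs from each run would still need to be merged or genotyped together in a later step, which this sketch leaves out.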

