GenomicsDBImport too slow on local server

dbecker, Munich, Member ✭✭


I tried using GenomicsDBImport on our data. As a test case, I imported chromosome 1 for 223 samples. Since most samples are panels and we have only a few genomes and exomes, I thought it would be best to always call everything together.
My commandline:

opt/gatk/ --java-options "-Xmx8G -Xms8G" GenomicsDBImport
--genomicsdb-workspace-path [...]/germline_snp_database_1
--batch-size 50
-L NC_000001 
--reader-threads 5

I only use 5 reader threads because I plan to parallelize with scatter-gather later on. The command has been running for 14 hours on a local server. Is something wrong, or is there something I can do to make it reasonably fast? So far the GATK 3.8 pipeline is much faster.
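
For illustration, the scatter step mentioned above could look roughly like the following dry-run sketch. It only prints one GenomicsDBImport command per interval and does not run GATK. The function name `scatter_import_cmds`, the sample map `samples.map`, the `workspace_*` directory names, and the interval coordinates are all hypothetical placeholders, and the input GVCFs are assumed to be listed in a sample map passed via `--sample-name-map`.

```shell
#!/bin/sh
# Dry-run sketch: print one GenomicsDBImport command per interval (scatter step).
# The sample map, workspace names, and intervals are hypothetical placeholders.

scatter_import_cmds() {
  sample_map="$1"; shift
  for interval in "$@"; do
    # One workspace per interval, so each job can run independently.
    # ':' and '-' are replaced to get a filesystem-friendly directory name.
    ws="workspace_$(printf '%s' "$interval" | tr ':-' '__')"
    printf 'gatk --java-options "-Xmx8G -Xms8G" GenomicsDBImport --sample-name-map %s --genomicsdb-workspace-path %s --batch-size 50 -L %s --reader-threads 5\n' \
      "$sample_map" "$ws" "$interval"
  done
}

# Example: two sub-intervals of chromosome 1 (coordinates are illustrative only).
scatter_import_cmds samples.map "NC_000001:1-1000000" "NC_000001:1000001-2000000"
```

Smaller intervals keep each import short and allow the jobs to run side by side, at the cost of having one workspace per interval to genotype afterwards.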

Thanks & best regards,

Best Answer


  • dbecker, Munich, Member ✭✭


    That seems like a lot of effort, and I still don't really know how to put those intervals back together in the end. I think I'll stick with CombineGVCFs for now. I can run it stepwise, and for our ~4000 samples overall, in ~200 runs, it seems like the way to go. I'll try GenomicsDB again once it is possible or recommended to use a single database for all intervals at once, and once samples can be added to an existing one.
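
    On putting the intervals back together: one common approach, sketched here under assumptions, is to genotype each interval workspace separately (for example with GenotypeGVCFs reading `gendb://<workspace>`) and then concatenate the per-interval VCFs in genomic order with GatherVcfs. The dry-run helper below only prints the gather command. The function name `gather_cmd` and all file names are hypothetical, and the inputs are assumed to be non-overlapping and already sorted by position.

```shell
#!/bin/sh
# Dry-run sketch: print a GatherVcfs command that concatenates per-interval VCFs.
# File names are hypothetical; inputs must be given in genomic order.

gather_cmd() {
  output="$1"; shift
  cmd="gatk GatherVcfs"
  for vcf in "$@"; do
    # One -I argument per per-interval VCF, in the order given.
    cmd="$cmd -I $vcf"
  done
  printf '%s -O %s\n' "$cmd" "$output"
}

# Example: gather two per-interval call sets into one chromosome-level VCF.
gather_cmd chr1.vcf part_0001.vcf part_0002.vcf
```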

    Thanks for the help,

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    Hi Daniel,

    You may also find Geraldine's response here helpful.

