
GenomicsDBImport too slow on local server


I tried using GenomicsDBImport on our data. As a test case, I imported chromosome 1 for 223 samples. Since most of our samples are panels and we have only a few genomes and exomes, I thought it would be best to always call everything together.
My command line:

opt/gatk/ --java-options "-Xmx8G -Xms8G" GenomicsDBImport
--genomicsdb-workspace-path [...]/germline_snp_database_1
 --batch-size 50 
-L NC_000001 
--reader-threads 5

I only use 5 reader threads because I plan to parallelize with scatter-gather later on. The command has been running for 14 hours on a local server. Is something wrong, or is there something I can do to make it reasonably fast? So far the GATK 3.8 pipeline is much faster.

Thanks & best regards,
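
For anyone who wants to script the scatter step mentioned above, here is a minimal sketch of one way to do it: split the contig into fixed-size intervals and build one GenomicsDBImport command per interval, each with its own workspace. The flags come from the command in the question, plus `--sample-name-map` as an assumed way of passing the inputs (the original command does not show how the GVCFs were supplied). The `gatk` executable name, the workspace directory, the `samples.map` file, the contig length, and the chunk size are all placeholders for illustration, not values from this thread.

```python
import shlex


def split_contig(contig, length, chunk_size):
    """Return 1-based, inclusive intervals 'contig:start-end' covering the contig."""
    intervals = []
    start = 1
    while start <= length:
        end = min(start + chunk_size - 1, length)
        intervals.append(f"{contig}:{start}-{end}")
        start = end + 1
    return intervals


def import_command(interval, workspace_root, sample_map,
                   batch_size=50, reader_threads=5):
    """Build the argument list for one GenomicsDBImport run on one interval."""
    # One workspace per interval; the naming scheme is a hypothetical choice.
    workspace = f"{workspace_root}/{interval.replace(':', '_')}"
    return [
        "gatk", "--java-options", "-Xmx8G -Xms8G", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--sample-name-map", sample_map,  # assumption: inputs given as a map file
        "--batch-size", str(batch_size),
        "-L", interval,
        "--reader-threads", str(reader_threads),
    ]


if __name__ == "__main__":
    # Placeholder length and chunk size; use the real contig length in practice.
    for interval in split_contig("NC_000001", 1_000_000, 250_000):
        print(shlex.join(import_command(interval, "workspaces", "samples.map")))
```

The script only prints the commands, so they can be handed to whatever job scheduler or parallel runner is available on the server.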

Best Answer


  • dbecker (Munich; Member)


    That seems like a lot of effort. I still don't really know how to put those intervals back together in the end. I think I'll stick with CombineGVCFs for now. I can do it stepwise, and for our overall ~4000 samples in ~200 runs it seems like the way to go. I'll try GenomicsDB again once it is possible or recommended to use a single database for all intervals at once, and once samples can be added to an existing one.

    Thanks for the help,
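
The stepwise CombineGVCFs approach described above (~4000 samples in ~200 runs, so about 20 samples per run) can be sketched roughly as follows. This is only an illustration under assumptions: the sample file names, the reference path, and the output names are invented, and the `gatk` executable name is assumed. The tool flags used are `-R`, `-V`, and `-O`.

```python
def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def combine_command(gvcfs, reference, output):
    """Build the argument list for one CombineGVCFs run over a batch of GVCFs."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names standing in for the ~4000 real samples.
    samples = [f"sample_{i:04d}.g.vcf.gz" for i in range(4000)]
    groups = batches(samples, 20)
    print(len(groups))  # 4000 samples / 20 per run = 200 runs
    for n, group in enumerate(groups[:2]):
        print(" ".join(combine_command(group, "reference.fasta",
                                       f"combined_{n:03d}.g.vcf.gz")))
```

Each combined GVCF from the first round can then be treated as a single input in a later round, which is what makes the stepwise approach manageable.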

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    Hi Daniel,

    You may also find Geraldine's response here helpful.

