GenomicsDBImport too slow on local server

Hi,

I tried using GenomicsDBImport on our data. In my test case I imported chromosome 1 for 223 samples. Since most of our samples are panels and we have only a few genomes and exomes, I thought it would be best to always call everything together.
My command line:

opt/gatk/4.0.0.0/gatk --java-options "-Xmx8G -Xms8G" GenomicsDBImport \
    --sample-name-map [...]/all_samples.sample_map \
    --genomicsdb-workspace-path [...]/germline_snp_database_1 \
    --batch-size 50 \
    -L NC_000001 \
    --reader-threads 5

I only use 5 reader threads because I plan to parallelize with scatter-gather later on. The command has been running for 14 hours on a local server. Is something wrong, or is there something I can do to make it reasonably fast? So far the GATK 3.8 pipeline is much faster.

Thanks & best regards,
Daniel
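
Since the plan is to parallelize later with scatter-gather, here is a minimal dry-run sketch of what that could look like, under stated assumptions: chromosome 1 is split into a few roughly equal intervals, and one GenomicsDBImport run is issued per interval, each with its own `-L` interval and its own workspace. It only prints the commands. The function names `split_contig` and `print_import_commands`, the `_N` workspace suffix and the bare `gatk` launcher are hypothetical placeholders; the real contig length should be taken from the reference's sequence dictionary, and how the per-interval results are merged afterwards is not covered here.

```shell
#!/usr/bin/env bash
# Dry-run sketch: split one contig into N roughly equal intervals and print
# (not run) one GenomicsDBImport command per interval, each with its own
# workspace. Function names and the "_N" workspace suffix are hypothetical.
set -euo pipefail

# split_contig CONTIG LENGTH N -> prints "CONTIG:start-end", one per line
split_contig() {
  local contig="$1" length="$2" n="$3"
  local step=$(( (length + n - 1) / n ))   # ceiling division
  local start end
  for ((start = 1; start <= length; start += step)); do
    end=$(( start + step - 1 ))
    # The last interval is clipped to the contig end.
    if (( end > length )); then end=$length; fi
    echo "${contig}:${start}-${end}"
  done
}

# print_import_commands SAMPLE_MAP WORKSPACE_PREFIX INTERVAL...
# Reuses the options from the original command; one workspace per interval.
print_import_commands() {
  local map="$1" prefix="$2"
  shift 2
  local i=0 interval
  for interval in "$@"; do
    echo "gatk --java-options \"-Xmx8G -Xms8G\" GenomicsDBImport" \
      "--sample-name-map $map" \
      "--genomicsdb-workspace-path ${prefix}_${i}" \
      "--batch-size 50 -L $interval --reader-threads 5"
    i=$((i + 1))
  done
}
```

For example, `print_import_commands all_samples.sample_map germline_snp_database $(split_contig NC_000001 LENGTH 4)`, with LENGTH replaced by the contig length from the `.dict` file, would print four independent commands that could be run in parallel.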

Answers

  • dbecker, Munich (Member)

    Hi,

    That seems like a lot of effort, and I still don't really know how to put those intervals back together at the end. I think I'll stick with CombineGVCFs for now: I can run it stepwise, and for our ~4000 samples in total, split into ~200 runs, it seems like the way to go. I'll try GenomicsDB again once it is possible (or recommended) to use a single workspace for all intervals at once and to add new samples to an existing one.

    Thanks for the help,
    Daniel

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dbecker
    Hi Daniel,

    You may also find Geraldine's response here helpful.

    -Sheila
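
For the CombineGVCFs route described above, ~4000 samples in ~200 runs works out to about 20 GVCFs per run (4000 / 200 = 20). The dry-run sketch below assumes one GVCF per sample and a fixed batch size; it splits a file list into batches and prints one CombineGVCFs command per batch. The function name `print_combine_batches`, the `batch_N.g.vcf.gz` output names and the reference path are hypothetical placeholders, and how the batch outputs are combined in later steps is left open.

```shell
#!/usr/bin/env bash
# Dry-run sketch: group GVCFs into fixed-size batches and print (not run)
# one CombineGVCFs command per batch. All names are hypothetical placeholders.
set -euo pipefail

# print_combine_batches REFERENCE BATCH_SIZE GVCF...
print_combine_batches() {
  local ref="$1" size="$2"
  shift 2
  local files=("$@")
  local batch=0 start f
  for ((start = 0; start < ${#files[@]}; start += size)); do
    local args=()
    # Take the next BATCH_SIZE files (the last batch may be smaller).
    for f in "${files[@]:start:size}"; do
      args+=(--variant "$f")
    done
    echo "gatk CombineGVCFs -R $ref ${args[*]} -O batch_${batch}.g.vcf.gz"
    batch=$((batch + 1))
  done
}
```

With a batch size of 20 and 4000 input files, this prints 200 commands, one per planned run.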
