Use GenomicsDBImport to extract coding regions from 88 human WGS GVCFs

I want to extract coding regions from 88 WGS GVCFs using GenomicsDBImport (followed by GenotypeGVCFs). I have a list of ~222,000 intervals and was thinking of using the --merge-input-intervals parameter (if appropriate for WGS) and scattering the work across 10 separate jobs. Is that a good way to speed up the process? I am using GATK ( but I can't use Cromwell on our local servers. Thanks!
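
For illustration, the scatter step described above could be prepared without Cromwell by splitting the interval list into 10 shards of roughly equal total size, keeping neighbouring intervals together. This is only a minimal sketch: the function names (`parse_interval`, `scatter`, `format_interval`) are hypothetical, the toy intervals are made up, and it assumes the list uses the `contig:start-end` form.

```python
from typing import List, Tuple

# (contig, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def parse_interval(text: str) -> Interval:
    """Parse a 'contig:start-end' string (assumed input format)."""
    contig, span = text.strip().rsplit(":", 1)
    start, end = span.split("-")
    return contig, int(start), int(end)


def format_interval(interval: Interval) -> str:
    """Turn an interval back into 'contig:start-end'."""
    contig, start, end = interval
    return f"{contig}:{start}-{end}"


def scatter(intervals: List[Interval], n_shards: int) -> List[List[Interval]]:
    """Split an ordered interval list into n_shards contiguous groups
    holding roughly the same number of bases each."""
    shards: List[List[Interval]] = [[] for _ in range(n_shards)]
    total = sum(end - start + 1 for _, start, end in intervals)
    if total == 0:
        return shards
    seen = 0  # bases assigned so far
    for contig, start, end in intervals:
        # Shard index that the running base count falls into.
        idx = min(seen * n_shards // total, n_shards - 1)
        shards[idx].append((contig, start, end))
        seen += end - start + 1
    return shards


# Toy example with made-up intervals; a real list would be read from a file.
toy = [parse_interval(s) for s in
       ["chr1:101-200", "chr1:301-400", "chr2:1-100", "chr2:501-600"]]
for i, shard in enumerate(scatter(toy, 2)):
    print(i, [format_interval(iv) for iv in shard])
```

Each shard could then be written to its own interval file and passed to a separate GenomicsDBImport job with -L. Keeping the shards contiguous matters if the intervals are merged, because each job then spans a limited stretch of the genome rather than everything between scattered targets.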


  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @etrepo

    I don't see why that approach shouldn't work, so yes, try it.

  • etrepo (Member)
    edited March 2019
    Hi @bhanuGandham

    Thanks! It's currently running, but I wasn't sure this was the best way to speed up the process. Since it is recommended to use "manageable" intervals for GenomicsDBImport with WGS (i.e. smaller than a whole chromosome), I was afraid that --merge-input-intervals would be equivalent to using a whole chromosome. For example, this is what I have inside one of the workspaces (--genomicsdb-workspace-path): chr1$28554$249214165. After more than 24 hours it is still importing batch 1, even for smaller chromosomes (chr14$19118515$107283791). Thanks!
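
To make the concern above concrete: the directory name chr1$28554$249214165 is consistent with the intervals on a contig being collapsed into one span running from the first start to the last end. The sketch below assumes that this is how --merge-input-intervals behaves and that workspace directories are named `contig$start$end`, as in the names quoted above. The helper names and the toy intervals are hypothetical, chosen only to reproduce that directory name.

```python
from typing import Dict, Iterable, Tuple

# (contig, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def merged_spans(intervals: Iterable[Interval]) -> Dict[str, Tuple[int, int]]:
    """Collapse all intervals on each contig into a single
    (first start, last end) span. Assumed merge behaviour."""
    spans: Dict[str, Tuple[int, int]] = {}
    for contig, start, end in intervals:
        if contig in spans:
            lo, hi = spans[contig]
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    return spans


def workspace_dir_name(contig: str, start: int, end: int) -> str:
    """Directory name in the 'contig$start$end' form seen in the thread."""
    return f"{contig}${start}${end}"


def covered_fraction(intervals: Iterable[Interval], contig: str) -> float:
    """Fraction of the merged span that the (non-overlapping) target
    intervals on this contig actually cover."""
    on_contig = [iv for iv in intervals if iv[0] == contig]
    lo, hi = merged_spans(on_contig)[contig]
    targeted = sum(end - start + 1 for _, start, end in on_contig)
    return targeted / (hi - lo + 1)


# Made-up toy intervals near both ends of chr1.
toy = [("chr1", 28554, 29553), ("chr1", 249213166, 249214165)]
lo, hi = merged_spans(toy)["chr1"]
print(workspace_dir_name("chr1", lo, hi))
print(covered_fraction(toy, "chr1"))
```

Under these assumptions, a job whose intervals sit at both ends of a chromosome imports nearly the whole chromosome, even though the targets themselves cover only a small fraction of it.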
