
Use GenomicsDBImport to extract coding regions from 88 human WGS GVCFs

I want to extract coding regions from 88 WGS GVCFs using GenomicsDBImport (followed by GenotypeGVCFs). I have a list of ~222,000 intervals and was thinking of using the --merge-input-intervals parameter (if appropriate for WGS) and scattering the work across 10 separate jobs. Is that a good way to speed up the process? I am using GATK 4.1.0.0, but I can't use Cromwell on our local servers. Thanks!
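
The scattering step described above can be sketched in Python. This is a minimal illustration, not a GATK feature: the function name `scatter_intervals` and the `(contig, start, end)` tuple layout are assumptions made for the example, and it simply cuts an ordered interval list into contiguous shards of near-equal interval count.

```python
from typing import List, Tuple

# Hypothetical in-memory representation: (contig, start, end), 1-based inclusive.
Interval = Tuple[str, int, int]


def scatter_intervals(intervals: List[Interval], n_jobs: int) -> List[List[Interval]]:
    """Split an ordered interval list into at most n_jobs contiguous shards.

    Shard sizes differ by at most one interval. Order is preserved, so
    concatenating the shards gives back the original list.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(intervals), n_jobs)
    shards: List[List[Interval]] = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` shards take one additional interval each.
        size = base + (1 if job < extra else 0)
        if size == 0:
            break  # fewer intervals than jobs: stop instead of emitting empty shards
        shards.append(intervals[start:start + size])
        start += size
    return shards
```

Each shard could then be written to its own interval file and passed with -L to a separate GenomicsDBImport run that has its own workspace. Splitting by interval count is only one possible choice; balancing shards by total bases or by contig may suit some data better.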

Answers

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @etrepo

    I don't see why that process shouldn't work, so yes, try it.

  • etrepo (Member)
    edited March 2019
    Hi @bhanuGandham

    Thanks! It's currently running, but I wasn't sure this was the best way to speed up the process. Since it is recommended to use "manageable" intervals for GenomicsDBImport with WGS (i.e. smaller than a whole chromosome), I was afraid that --merge-input-intervals would be equivalent to using a whole chromosome. For example, this is what I have inside one of the --genomicsdb-workspace directories: chr1$28554$249214165. After more than 24 hours, it's still importing batch 1, even for smaller chromosomes (chr14$19118515$107283791). Thanks!
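
The workspace names quoted above are consistent with each contig's intervals being collapsed into one span running from the first start to the last end. The Python sketch below illustrates that collapsing under that assumption; it is not GATK's code. The function name `spanning_intervals` is hypothetical, and in the usage example only the outer coordinates come from the post, while the inner interval boundaries are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representation: (contig, start, end), 1-based inclusive.
Interval = Tuple[str, int, int]


def spanning_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse all intervals on each contig into one (min start, max end) span.

    Contigs are returned in order of first appearance in the input.
    """
    spans: Dict[str, Tuple[int, int]] = {}
    for contig, start, end in intervals:
        if contig in spans:
            lo, hi = spans[contig]
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    return [(contig, lo, hi) for contig, (lo, hi) in spans.items()]


# Outer coordinates are taken from the workspace names in the post;
# the inner boundaries (29000, 249214000, ...) are invented for illustration.
example = [
    ("chr1", 28554, 29000),
    ("chr1", 249214000, 249214165),
    ("chr14", 19118515, 19119000),
    ("chr14", 107283000, 107283791),
]
merged = spanning_intervals(example)
```

Under this assumption, a shard containing coding intervals from both ends of chr1 would import nearly the whole chromosome, which would explain the long runtimes. Shards that cover shorter contiguous stretches would produce smaller spans.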