Use GenomicsDBImport to extract coding regions from 88 human WGS GVCFs

I want to extract coding regions from 88 WGS GVCFs using GenomicsDBImport (followed by GenotypeGVCFs). I have a list of ~222,000 intervals and was thinking of using the --merge-input-intervals parameter (if appropriate for WGS) and scattering the work across 10 separate jobs. Is that a good way to speed up the process? I am using GATK 4.1.0.0, but I can't use Cromwell on our local servers. Thanks!
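A minimal sketch of the scattering step described above, assuming the intervals are already sorted by contig and position and held as `(contig, start, end)` tuples. The function name `scatter_intervals` and the sample coordinates are hypothetical; each resulting shard would be written to its own interval list and passed to a separate GenomicsDBImport job.

```python
def scatter_intervals(intervals, n_jobs):
    """Split a sorted interval list into n_jobs contiguous, near-equal shards.

    Contiguous chunks keep neighbouring intervals in the same shard, so each
    job covers a compact stretch of the genome.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(intervals), n_jobs)
    shards = []
    start = 0
    for i in range(n_jobs):
        # The first `extra` shards each take one additional interval.
        size = base + (1 if i < extra else 0)
        shards.append(intervals[start:start + size])
        start += size
    return shards


# Hypothetical example: 25 made-up intervals on chr1 split across 10 jobs.
intervals = [("chr1", i * 100 + 1, i * 100 + 50) for i in range(25)]
shards = scatter_intervals(intervals, 10)
```

Splitting by interval count is only one option; balancing shards by total base pairs may give more even run times when interval sizes vary a lot.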

Answers

  • bhanuGandham (Cambridge, MA) Member, Administrator, Broadie, Moderator

    Hi @etrepo

    I don't see why that process shouldn't work, so yes, try it.

  • etrepo Member
    edited March 19
    Hi @bhanuGandham

    Thanks! It's currently running, but I wasn't sure this was the best way to speed up the process. Since it is recommended to use "manageable" intervals for GenomicsDBImport with WGS (i.e. smaller than a whole chromosome), I was afraid that --merge-input-intervals would be equivalent to using a whole chromosome. For example, this is what I have inside one of the workspaces (--genomicsdb-workspace-path): chr1$28554$249214165. After more than 24 hours, even the smaller chromosomes (chr14$19118515$107283791) are still importing batch 1. Thanks!
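To illustrate the concern above, here is a rough sketch of what merging would do if the option collapses each contig's intervals into one spanning interval, which is what the workspace directory names suggest. This is an assumption about the behaviour, not a statement of how GATK implements it. The helper `merged_span` is hypothetical, and the inner coordinates are made up; only the outer endpoints come from the directory names quoted in the post.

```python
def merged_span(intervals):
    """Return {contig: (min_start, max_end)} for a list of (contig, start, end)."""
    spans = {}
    for contig, start, end in intervals:
        if contig in spans:
            lo, hi = spans[contig]
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    return spans


# Made-up coding intervals; only the first start and last end on each contig
# are taken from the workspace names quoted above.
example = [
    ("chr1", 28554, 30000),
    ("chr1", 249214000, 249214165),
    ("chr14", 19118515, 19120000),
    ("chr14", 107283000, 107283791),
]
spans = merged_span(example)
for contig, (lo, hi) in spans.items():
    # Mirrors the contig$start$end naming seen inside the workspace.
    print(f"{contig}${lo}${hi}")
```

Under that assumption, a shard whose intervals run from one end of a chromosome to the other is imported as nearly the whole chromosome, so scattering into contiguous shards would keep each merged span shorter.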