GenomicsDBImport creates tons of files

JCGrenier, Montreal, QC, Member ✭✭

Hello dear GATK team!

Thanks for all the good work you are doing for the community! That's an awesome toolkit you're providing!
Today I was working with a reference genome that contains thousands of unplaced contigs. This caused trouble at the GenomicsDBImport step, which created ~200k files for every new sample imported.

Have you run into this issue? It is a problem for me because we work on a cluster with quotas on the number of files per user/group.

Thanks a lot for your help!

JC

Best Answer

Answers

  • bhanuGandham, Cambridge MA, Member, Administrator, Broadie, Moderator
    edited September 29

    Hi @JCGrenier

    Can you please explain what you mean by "unplaced contigs"?
    Also, please share the version of GATK you are using and the exact command. Thank you.

  • JCGrenier, Montreal, QC, Member ✭✭

    Hello,

    I'm using version 4.1.2.0. I can't give the exact command because I'm using a parser made for GPU computation (called Parabricks).

    The genome I'm working on is the horse genome EquCab3.0.

    There are 4701 scaffolds in this genome in the Ensembl reference.
    https://www.ncbi.nlm.nih.gov/assembly/GCF_002863925.1/

    They are certainly using GenomicsDBImport in the step that is causing me issues, since they follow the recommended pipeline. Anyhow, I guess I can either modify the reference or go by intervals, but do you think there is a way to create fewer files per chromosome?

    Thanks a lot for your help.

    Jean-Christophe

  • JCGrenier, Montreal, QC, Member ✭✭

    Hello!

    Thanks for your answer. I'll use your advice then!

    JC
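
For readers who want to try the "go by intervals" route mentioned earlier in the thread, here is a minimal sketch of one way to build an interval list from a FASTA index (.fai). It is not taken from the thread: the contig names and lengths are made up, and the length cutoff used to separate chromosomes from small scaffolds is an assumption you would need to adapt to your own reference.

```python
from typing import Iterable, List, Tuple


def parse_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (contig_name, length) pairs from the lines of a .fai index.

    A .fai file is tab-separated; the first column is the sequence name
    and the second column is its length in bases.
    """
    contigs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def split_by_length(
    contigs: List[Tuple[str, int]], min_length: int
) -> Tuple[List[str], List[str]]:
    """Split contig names into (long, short) using a length cutoff.

    The cutoff is an illustrative heuristic, not a GATK rule.
    """
    long_names = [name for name, length in contigs if length >= min_length]
    short_names = [name for name, length in contigs if length < min_length]
    return long_names, short_names


def interval_list_text(names: List[str]) -> str:
    """Format contig names one per line, as used in a plain .list file."""
    return "".join(name + "\n" for name in names)


# Hypothetical .fai excerpt: names and lengths are invented for illustration.
EXAMPLE_FAI = [
    "chr1\t1000000\t6\t60\t61",
    "chr2\t800000\t1016680\t60\t61",
    "scaffold_0001\t5000\t1830020\t60\t61",
    "scaffold_0002\t1200\t1835110\t60\t61",
]

contigs = parse_fai(EXAMPLE_FAI)
primary, scaffolds = split_by_length(contigs, min_length=100_000)
print(interval_list_text(primary), end="")
```

The resulting text can be saved to a file such as `primary.list` (a hypothetical name) and passed to GATK with `-L`, so that only the listed contigs are imported. Whether the small scaffolds can be left out depends on your analysis.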
