Using GenomicsDBImport for combining gvcf in gatk-4.1.2.0 version

Alva, Switzerland, Member ✭✭

Dear All,
I have 1814 contigs and I need to combine 320 gvcf files. The CombineGVCFs tool keeps throwing an error, even though I give it enough memory.
For this case, one of the users suggested the GenomicsDBImport tool. I would like to know whether, in my case, the --intervals would be all 1814 contigs. Would that be possible with this tool?
Or did I misunderstand the definition of the --intervals parameter?

Answers

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    Hi @Alva

    For GenomicsDBImport, the interval is the genomic region you want to operate over, as this doc describes.
    The doc shows examples of how you can specify one or multiple intervals to use with the GenomicsDBImport tool.

    So for 1814 contigs, at what position does this occur? Put that as the interval.

    Does this help?

  • Alva, Switzerland, Member ✭✭
    edited July 23

    Hello @Tiffany_at_Broad ,

    Thanks for the reply!

    The case is that I have 22 chromosomes in the vcf files and 1814 contigs in the reference genome.
    I need to operate over all 22 chromosomes within the vcf files, as I have no specific list of regions as input and we are not looking at any particular list. In this case, does the GenomicsDBImport tool help me consolidate all my 320 gvcfs? After that, I need to run GenotypeGVCFs on the consolidated gvcf.

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    Hi Alva,

    If I understand your question correctly, yes, the GenomicsDBImport tool will help you do joint-calling across your gvcfs. Because it's not trivial to examine the data within the GenomicsDB database, you can extract the combined data from the database using SelectVariants. Then run joint genotyping using GenotypeGVCFs. This tutorial explains the steps.
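
For readers following along, the two downstream steps mentioned above could be sketched roughly as below. This only builds the command lines (it does not run GATK), and the file names `reference.fasta`, `my_database`, `combined.g.vcf.gz` and `joint_calls.vcf.gz` are hypothetical placeholders, not anything from this thread. The `gendb://` prefix is how a GenomicsDB workspace is passed to `-V`; please check the flags against the documentation for your GATK version before relying on them.

```python
import shlex


def select_variants_cmd(reference, workspace, output):
    """Build a SelectVariants call that extracts the combined records
    from a GenomicsDB workspace (useful for inspecting the data)."""
    return [
        "gatk", "SelectVariants",
        "-R", reference,
        "-V", f"gendb://{workspace}",  # workspace created by GenomicsDBImport
        "-O", output,
    ]


def genotype_gvcfs_cmd(reference, workspace, output):
    """Build a GenotypeGVCFs call that joint-genotypes directly
    from the same GenomicsDB workspace."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-O", output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in (
        select_variants_cmd("reference.fasta", "my_database", "combined.g.vcf.gz"),
        genotype_gvcfs_cmd("reference.fasta", "my_database", "joint_calls.vcf.gz"),
    ):
        print(shlex.join(cmd))
```

The commands are kept as argument lists so they can be handed to `subprocess.run` unchanged once the paths are real.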

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    If you are operating on just one chromosome, chr22, then list that as the interval. If you are operating across all chromosomes, list them as multiple intervals as the doc describes.

    Are you able to get it running?
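
As a rough illustration of "list them as multiple intervals": one possible approach, assuming the reference has a samtools-style `.fai` index (whose first tab-separated column is the contig name), is to write the contig names into an intervals file, one per line, and pass that file to `-L`. The sketch below only prepares the file and the command line. The helper names, the optional `wanted` filter, and all paths are hypothetical, and the behaviour with many intervals should be checked against the GenomicsDBImport documentation for gatk-4.1.2.0.

```python
import os
import tempfile


def contig_names_from_fai(fai_path, wanted=None):
    """Read contig names (first column) from a FASTA index.
    If `wanted` is given, keep only names in that set (e.g. the 22 chromosomes)."""
    names = []
    with open(fai_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            name = line.split("\t")[0]
            if wanted is None or name in wanted:
                names.append(name)
    return names


def write_intervals_list(names, out_path):
    """Write one contig name per line; GATK reads files ending in .list as intervals."""
    with open(out_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return out_path


def genomicsdb_import_cmd(gvcfs, workspace, intervals):
    """Build a GenomicsDBImport call with one -V per gvcf and a single -L."""
    cmd = ["gatk", "GenomicsDBImport"]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    cmd += ["--genomicsdb-workspace-path", workspace, "-L", intervals]
    return cmd


if __name__ == "__main__":
    # Tiny made-up index, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        fai = os.path.join(tmp, "reference.fasta.fai")
        with open(fai, "w") as handle:
            handle.write("chr1\t1000\t6\t60\t61\nscaffold_7\t500\t1030\t60\t61\n")
        intervals = write_intervals_list(
            contig_names_from_fai(fai), os.path.join(tmp, "contigs.list")
        )
        print(genomicsdb_import_cmd(["s1.g.vcf.gz", "s2.g.vcf.gz"], "my_database", intervals))
```

Passing `wanted={"chr22"}` would restrict the list to a single chromosome, matching the one-chromosome case described above.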
