Combine already merged gVCFs

dbecker (Munich, Member)

Hi,

We have two NextSeq runs per week and end up with a gVCF file of around 40 GB per run. A few of those are easy to handle, but at some point it will become too much, not only in terms of disk space: the time for the GenotypeGVCFs step might increase, too. But since joint genotyping is, as I understand it, most effective with as many samples as possible, we don't want to miss out on information and quality. We also plan to import the whole callset (after filtering) into a database in one go, so the filtered VCF should include all samples.

Is it a good idea to use CombineGVCFs to merge the cohort.g.vcf from the last run with all per-sample g.vcfs of the current one? In my mind this would produce a slightly bigger new cohort.g.vcf than the one from the last run, and we could delete the old one.
Or is it possible to run GenotypeGVCFs on our new cohort.g.vcf together with the result VCF from the last time we ran GenotypeGVCFs? This way we could do without any of the "used" gVCFs.
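
To make the first option concrete, here is a minimal sketch of the command it would amount to, assuming GATK4-style syntax (`gatk CombineGVCFs -R ... --variant ... -O ...`). The helper name `build_combine_gvcfs_cmd` and all file names are hypothetical placeholders, and the script only assembles the command line without running anything.

```python
from typing import List, Sequence


def build_combine_gvcfs_cmd(
    reference: str,
    previous_cohort: str,
    new_sample_gvcfs: Sequence[str],
    output: str,
) -> List[str]:
    """Assemble a GATK4-style CombineGVCFs command line (nothing is executed).

    Hypothetical helper for illustration; all paths are placeholders.
    """
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    # The previous cohort gVCF is passed as just another --variant input,
    # followed by the per-sample gVCFs of the current run.
    for gvcf in [previous_cohort, *new_sample_gvcfs]:
        cmd += ["--variant", gvcf]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, assumed for illustration only.
    cmd = build_combine_gvcfs_cmd(
        reference="reference.fasta",
        previous_cohort="cohort_previous.g.vcf.gz",
        new_sample_gvcfs=["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"],
        output="cohort_new.g.vcf.gz",
    )
    print(" ".join(cmd))
```

Whether feeding an already combined cohort gVCF back into CombineGVCFs like this is advisable is exactly what I am asking; the sketch only shows what I have in mind.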

Maybe this is a stupid question but I couldn't find a solid answer yet. I apologize if this was already answered somewhere.
