Combine already merged gVCFs

Hi,

We have two NextSeq runs per week and end up with a gVCF file of around 40 GB per run. A few of those are easy to handle, but at some point it will become too much, not only in terms of disk space: the time for the GenotypeGVCFs step will probably increase, too. But since joint genotyping is, as I understand it, most effective with as many samples as possible, we don't want to miss out on information and quality. We also plan to import the whole callset (after filtering) into a database in one go, so the filtered VCF should include all samples.

Is it a good idea to use CombineGVCFs to merge the cohort.g.vcf of the last run with all per-sample g.vcfs of the current one? In my mind this would produce a new cohort.g.vcf slightly bigger than the one from the last run, and we could then delete the old one.
Or is it possible to run GenotypeGVCFs on our new cohort.g.vcf together with the output VCF from the last time we ran GenotypeGVCFs? This way we could do without any of the "used" gVCFs.
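
To make the first option concrete, here is a rough sketch of the CombineGVCFs call I have in mind, written as a small Python helper that only assembles the command line. All file names (reference.fasta, cohort_run1.g.vcf.gz, and so on) and the helper name are placeholders I made up for illustration; the sketch assumes the GATK4 `gatk` wrapper with the `-R`, `--variant` and `-O` arguments, and it does not say anything about whether this approach is actually recommended.

```python
from typing import List


def combine_gvcfs_cmd(reference: str,
                      previous_cohort: str,
                      new_sample_gvcfs: List[str],
                      output: str) -> List[str]:
    """Build a GATK4 CombineGVCFs command (hypothetical helper).

    The previously merged cohort gVCF is passed as just another
    --variant input next to the per-sample gVCFs of the current run.
    """
    cmd = ["gatk", "CombineGVCFs", "-R", reference,
           "--variant", previous_cohort]
    for gvcf in new_sample_gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-O", output]
    return cmd


# Placeholder file names, not our real paths.
cmd = combine_gvcfs_cmd(
    reference="reference.fasta",
    previous_cohort="cohort_run1.g.vcf.gz",
    new_sample_gvcfs=["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"],
    output="cohort_run2.g.vcf.gz",
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`; afterwards cohort_run1.g.vcf.gz would be the file we would like to delete.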

Maybe this is a stupid question but I couldn't find a solid answer yet. I apologize if this was already answered somewhere.
