Combine multi-sample GVCFs

Hi GATK experts,

I have 6144 single-sample GVCFs with different ploidies, so I can't use GenomicsDBImport to generate a single GVCF to pass to GenotypeGVCFs. I tried running all 6144 GVCFs through CombineGVCFs but got stuck on ulimit constraints, which could not be resolved even after raising the ulimit 'nproc' and 'nofile' settings to the required values. I think this is due to a conflict with the SGE environment or some other aspect of our cluster setup. I have previously run 384 GVCFs through CombineGVCFs and on to the final steps successfully, so I have now divided the 6144 GVCFs into 16 batches of 384 each. I am running these sixteen 384-GVCF batches through CombineGVCFs separately for each chromosome (12 chromosomes in total), which will generate 192 multi-sample GVCFs. My question is: can CombineGVCFs be used to merge multi-sample GVCFs as well as single-sample GVCFs and, if so, will all the annotation fields still be meaningful?

Regards,
Sanjeev
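
For readers who want to reproduce the batching plan described above, here is a minimal Python sketch of the arithmetic (16 batches x 12 chromosomes = 192 CombineGVCFs jobs). It only builds the command lines; it does not run GATK. The sample file names, chromosome names, reference path and output naming scheme are all hypothetical placeholders, and the sketch assumes the GATK4-style `-R`, `-L`, `--variant` and `-O` arguments.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def combine_commands(gvcfs, chromosomes, batch_size=384,
                     reference="reference.fasta"):
    """Build one CombineGVCFs command per (batch, chromosome) pair.

    `reference` and the output file names are illustrative assumptions,
    not values taken from the original post.
    """
    commands = []
    for batch_no, batch in enumerate(chunk(gvcfs, batch_size), start=1):
        for chrom in chromosomes:
            cmd = ["gatk", "CombineGVCFs", "-R", reference, "-L", chrom]
            for gvcf in batch:
                cmd += ["--variant", gvcf]
            cmd += ["-O", f"batch{batch_no:02d}.{chrom}.g.vcf.gz"]
            commands.append(cmd)
    return commands


# Hypothetical inputs: 6144 single-sample GVCFs and 12 chromosomes.
gvcfs = [f"sample{i:04d}.g.vcf.gz" for i in range(1, 6145)]
chromosomes = [f"chr{i:02d}" for i in range(1, 13)]

commands = combine_commands(gvcfs, chromosomes)
print(len(commands))  # 16 batches x 12 chromosomes = 192
```

Each entry in `commands` is an argument list that could be written into a scheduler job script or passed to `subprocess.run`; how the jobs are submitted depends on the cluster setup.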

Answers

  • sanjeevksh Member

    Hi Shlee (@shlee),

    Thank you for your very prompt and helpful reply. I have previously combined a set of 384 GVCFs into a single GVCF and processed it further through GenotypeGVCFs without problems. The current sixteen 384-GVCF batches are running, although very slowly; they will likely take about two weeks.

    I am not using GenomicsDBImport as it is only suitable for diploid samples, and my samples have a range of ploidies from 2 to 6.

    I am running CombineGVCFs on each chromosome separately and not on the entire genome.

    Regards,
    Sanjeev

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @sanjeevksh,

    @bhanuGandham pinged me about your reply. It seems I only answered your most immediate question on CombineGVCFs and failed to see your statement that GenomicsDBImport can only handle diploid samples. I apologize. Recent releases of GenomicsDBImport should support mixed ploidies. Did you run into an error when trying to import your samples? If so, we would love to get a snippet of data towards testing this.

  • sanjeevksh Member

    Hi @shlee & @bhanuGandham,

    Many thanks for pointing me to this update, very helpful indeed! On this occasion I have gone ahead with CombineGVCFs, but I will definitely try GenomicsDBImport at the next opportunity.

    Regards,
    Sanjeev
