How to split gVCF per chromosome

sabq, Uppsala, Member

I have 130 large (per-sample) gVCF files, and to avoid memory problems I want to run GenotypeGVCFs per chromosome. What is the easiest way to split a gVCF per chromosome? I don't want to go back and rerun HaplotypeCaller with the -L flag.
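
The per-sample split can be scripted with GATK3's SelectVariants restricted to one contig via the engine's -L argument (the tool named later in this thread). The sketch below only builds and prints one command per sample and chromosome; the jar name, reference path, memory setting, output naming, and the "chr1"-style contig names are assumptions, so adjust them to match your reference. It is a sketch, not a tested pipeline: check that the resulting files still contain the gVCF reference blocks before genotyping.

```python
import shlex
from pathlib import Path

# Assumed names: point these at your own GATK3 jar and reference FASTA.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "ref.fa"


def sample_stem(gvcf: str) -> str:
    """Drop the directory and the .g.vcf / .g.vcf.gz suffix from a gVCF path."""
    name = Path(gvcf).name
    for suffix in (".g.vcf.gz", ".g.vcf"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def select_variants_cmd(gvcf: str, chrom: str, mem_gb: int = 8) -> list[str]:
    """Build a GATK3 SelectVariants command that keeps only one contig of one gVCF."""
    out = f"{sample_stem(gvcf)}.{chrom}.g.vcf"  # hypothetical naming scheme
    return [
        "java", f"-Xmx{mem_gb}g", "-jar", GATK_JAR,
        "-T", "SelectVariants",
        "-R", REFERENCE,
        "-V", gvcf,
        "-L", chrom,  # restrict output to this contig
        "-o", out,
    ]


def split_commands(gvcfs: list[str], chroms: list[str]) -> list[list[str]]:
    """One SelectVariants command per (sample gVCF, chromosome) pair."""
    return [select_variants_cmd(g, c) for g in gvcfs for c in chroms]


if __name__ == "__main__":
    # Contig names must match the reference dictionary (e.g. "chr1" vs "1").
    chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
    for cmd in split_commands(["sample1.g.vcf"], chroms):
        print(shlex.join(cmd))
```

Printing the commands instead of running them makes it easy to hand each line to a cluster scheduler as a separate job.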


Answers

  • sabq, Uppsala, Member
    edited April 2016

    Thanks, Geraldine, for the prompt reply.
    I realized that SelectVariants generates chromosome-wise gVCFs from each sample. What if I want to make chromosome-wise gVCFs from all 130 samples, e.g. a single 130chr1.g.VCF containing chr1 for every sample? I guess this would relieve the memory load that makes my jobs crash when running GenotypeGVCFs on 130 gVCF files (20 GB each). Here is what the command looks like:

    Program Args: java -Xmx32g -Djava.io.tmpdir=pwd/tmp -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fa -L $chr -nt 6 --variant 130gVCF.list --dbsnp ref.vcf.gz -o $chr.vcf

    I tried removing all the .idx files and letting GenotypeGVCFs regenerate them, but indexing each sample takes too long. Each .idx file is about 50-70 MB for a 20 GB gVCF. Is this normal?

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @sabq
    Hi,

    So you are saying GenotypeGVCFs takes too long or crashes when you feed it 130 per-chromosome GVCFs? Have you tried combining the per-chromosome GVCFs using CombineGVCFs? You can try combining 10-20 at a time and then feeding those combined GVCFs to GenotypeGVCFs.

    -Sheila
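
The batching suggested in the reply above (CombineGVCFs on 10-20 gVCFs at a time, then GenotypeGVCFs on the combined files) can be planned per chromosome as sketched below. This assumes GATK3 command-line syntax (-T, --variant, -L, -o); the jar name, reference path, intermediate file names, and batch size of 20 are illustrative assumptions, and flags from the original command such as -Xmx, -nt, and --dbsnp are left out for brevity and can be appended.

```python
import shlex

# Assumed names: point these at your own GATK3 jar and reference FASTA.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "ref.fa"


def batches(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def gatk_cmd(tool: str, inputs: list[str], chrom: str, out: str) -> list[str]:
    """Build a GATK3 command taking several --variant inputs on one contig."""
    cmd = ["java", "-jar", GATK_JAR, "-T", tool, "-R", REFERENCE, "-L", chrom]
    for path in inputs:
        cmd += ["--variant", path]
    return cmd + ["-o", out]


def plan(gvcfs: list[str], chrom: str, batch_size: int = 20):
    """Return (CombineGVCFs commands, final GenotypeGVCFs command) for one contig."""
    combine_cmds, combined = [], []
    for i, batch in enumerate(batches(gvcfs, batch_size), start=1):
        out = f"batch{i}.{chrom}.g.vcf"  # hypothetical intermediate name
        combine_cmds.append(gatk_cmd("CombineGVCFs", batch, chrom, out))
        combined.append(out)
    genotype = gatk_cmd("GenotypeGVCFs", combined, chrom, f"{chrom}.vcf")
    return combine_cmds, genotype


if __name__ == "__main__":
    samples = [f"sample{i}.chr1.g.vcf" for i in range(1, 131)]
    combine_cmds, genotype = plan(samples, "chr1")
    for cmd in combine_cmds + [genotype]:
        print(shlex.join(cmd))
```

With 130 inputs and a batch size of 20 this yields 7 CombineGVCFs jobs (six of 20 samples, one of 10), so GenotypeGVCFs opens 7 combined files per chromosome instead of 130.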
