How to split gVCF per chromosome

sabqsabq UppsalaMember

I have 130 large per-sample gVCF files, and to avoid memory problems I want to run GenotypeGVCFs per chromosome. What is the easiest way to split a gVCF per chromosome? I don't want to go back and rerun HaplotypeCaller with the -L flag.

Best Answer


  • sabqsabq UppsalaMember
    edited April 2016

    Thanks, Geraldine, for the prompt reply.
    I realized that SelectVariants generates chromosome-wise gVCFs from each sample. What if I want to make chromosome-wise gVCFs from all 130 samples, e.g. 130chr1.g.VCF? I guess this would relieve the memory load that makes my jobs crash when running GenotypeGVCFs on 130 gVCF files (each about 20 GB). Here is what the command looks like:

    Program Args: java -Xmx32g -Djava.io.tmpdir=pwd/tmp -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fa -L $chr -nt 6 --variant 130gVCF.list --dbsnp ref.vcf.gz -o $chr.vcf

    I tried removing all the idx files and letting GenotypeGVCFs regenerate them, but it takes too long to produce the idx for each sample. Each idx file is about 50-70 MB for a 20 GB gVCF. Is this normal?
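
A minimal sketch of the per-sample, per-chromosome split with SelectVariants mentioned in this post, assuming GATK 3.x and indexed input gVCFs. The function only prints the commands, so they can be reviewed before being piped to `bash` or a job scheduler. The function name `split_cmds`, the `-Xmx4g` heap size, and the `<sample>.<chr>.g.vcf` output naming are illustrative assumptions, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Sketch: print one GATK3 SelectVariants command per (sample, chromosome) pair.
# Nothing is executed here; the commands are only echoed.
# Assumptions (hypothetical): GenomeAnalysisTK.jar is in the working directory,
# inputs end in .g.vcf, and -Xmx4g is enough memory for a single-sample split.

split_cmds() {
    local ref="$1" gvcf_list="$2" chroms="$3"
    local gvcf chr base
    # Read one gVCF path per line; also handle a last line without a newline.
    while IFS= read -r gvcf || [ -n "$gvcf" ]; do
        [ -n "$gvcf" ] || continue              # skip blank lines
        base="$(basename "$gvcf" .g.vcf)"       # sample name without extension
        for chr in $chroms; do
            echo "java -Xmx4g -jar GenomeAnalysisTK.jar -T SelectVariants -R $ref -V $gvcf -L $chr -o $base.$chr.g.vcf"
        done
    done < "$gvcf_list"
}

# Example (hypothetical file names):
#   split_cmds ref.fa 130gVCF.list "chr1 chr2 chr3"
```

Printing rather than running the commands keeps the sketch safe to try and makes it easy to submit each line as its own cluster job.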

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭


    So you are saying GenotypeGVCFs takes too long or crashes when you feed it 130 per-chromosome GVCFs? Have you tried combining the per-chromosome GVCFs using CombineGVCFs? You can try combining 10-20 at a time, then feeding those combined GVCFs to GenotypeGVCFs.
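
The batching suggested above could be sketched roughly as follows, assuming GATK 3.x. The function groups gVCF paths from a list file into batches and prints one CombineGVCFs command per batch for a single chromosome. The function name `combine_cmds`, the default batch size of 20, the `-Xmx16g` heap size, and the `batch<N>.<chr>.g.vcf` output naming are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: group gVCF paths into batches and print one GATK3 CombineGVCFs
# command per batch for a single chromosome. Nothing is executed here.
# Assumptions (hypothetical): GenomeAnalysisTK.jar is in the working directory
# and -Xmx16g is a reasonable heap size for one batch.

combine_cmds() {
    local ref="$1" gvcf_list="$2" chr="$3" batch_size="${4:-20}"
    local gvcf args="" n=0 batch=1
    while IFS= read -r gvcf || [ -n "$gvcf" ]; do
        [ -n "$gvcf" ] || continue              # skip blank lines
        args="$args --variant $gvcf"
        n=$((n + 1))
        if [ "$n" -eq "$batch_size" ]; then     # batch is full: emit a command
            echo "java -Xmx16g -jar GenomeAnalysisTK.jar -T CombineGVCFs -R $ref -L $chr$args -o batch$batch.$chr.g.vcf"
            args=""; n=0; batch=$((batch + 1))
        fi
    done < "$gvcf_list"
    # Flush the final, possibly smaller, batch.
    if [ "$n" -gt 0 ]; then
        echo "java -Xmx16g -jar GenomeAnalysisTK.jar -T CombineGVCFs -R $ref -L $chr$args -o batch$batch.$chr.g.vcf"
    fi
}

# Example (hypothetical file names):
#   combine_cmds ref.fa 130gVCF.list chr1 20
```

With 130 inputs and a batch size of 20, this prints seven commands (six full batches and one of ten). The resulting batch gVCFs can then be passed to GenotypeGVCFs via `--variant`, in place of the 130 individual files.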

