
How to split gVCF per chromosome

sabq, Uppsala, Member

I have 130 large per-sample gVCF files, and to avoid memory problems I want to run GenotypeGVCFs per chromosome. What is an easy way to split a gVCF per chromosome? I would rather not go back and rerun HaplotypeCaller with the -L flag.
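For readers with the same problem, here is a minimal sketch of the per-sample split that this thread later attributes to SelectVariants, using the GATK 3 `-T`/`-R`/`-V`/`-L`/`-o` style seen in the poster's own command. It only builds and prints the command lines and does not run GATK. The sample file names, the contig names, and the output naming scheme are assumptions made for illustration.

```python
import os
import shlex


def select_variants_cmd(gvcf, chrom, ref="ref.fa", jar="GenomeAnalysisTK.jar"):
    """Build one GATK 3 SelectVariants command restricted to a single chromosome."""
    # Hypothetical output naming: prefix the input file name with the chromosome.
    out = f"{chrom}.{os.path.basename(gvcf)}"
    args = ["java", "-jar", jar, "-T", "SelectVariants",
            "-R", ref, "-V", gvcf, "-L", chrom, "-o", out]
    # Quote every argument so the printed line is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical inputs; replace with the real sample files and contig names.
samples = ["sample1.g.vcf", "sample2.g.vcf"]
chroms = ["chr1", "chr2"]

# One command per (sample, chromosome) pair.
commands = [select_variants_cmd(g, c) for g in samples for c in chroms]
for cmd in commands:
    print(cmd)
```

The printed lines can be submitted as independent jobs, one per sample and chromosome.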

Best Answer


  • sabq, Uppsala, Member
    edited April 2016

    Thanks, Geraldine, for the prompt reply.
    I realized that SelectVariants generates chromosome-wise gVCFs from each sample. What if I want to make chromosome-wise gVCFs from all 130 samples, e.g. 130chr1.g.vcf? I guess this would relieve the memory load that makes my jobs crash when running GenotypeGVCFs on 130 gVCF files (each 20 GB). Here is what the command looks like:

    Program Args: java -Xmx32g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fa -L $chr -nt 6 --variant 130gVCF.list --dbsnp ref.vcf.gz -o $chr.vcf

    I tried removing all the idx files and letting GenotypeGVCFs regenerate them, but it takes too long to produce an idx for each sample. Each idx file is about 50-70 MB for a 20 GB gVCF. Is this normal?

  • Sheila, Broad Institute, Member, Broadie, Moderator admin


    So you are saying GenotypeGVCFs takes too long or crashes when you feed it 130 per-chromosome GVCFs? Have you tried combining the per-chromosome GVCFs using CombineGVCFs? You can try combining 10-20 at a time, then feeding those combined GVCFs to GenotypeGVCFs.
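The batching suggested above could be planned roughly as follows. This is a sketch under stated assumptions, not a tested pipeline: it builds GATK 3 command lines in the same style as the command posted earlier in the thread, groups the per-chromosome gVCFs into batches of 20, and finishes with one GenotypeGVCFs call on the combined files. The input names (`chr1.sampleN.g.vcf`), the batch output names, and the batch size of 20 are hypothetical. The `--dbsnp` and `-nt` options from the original command are left out for brevity.

```python
import shlex

# Java invocation taken from the command posted in the thread.
GATK = ["java", "-Xmx32g", "-jar", "GenomeAnalysisTK.jar"]


def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def gatk_cmd(tool, chrom, inputs, out, ref="ref.fa"):
    """Build one GATK 3 command with a --variant argument per input file."""
    args = GATK + ["-T", tool, "-R", ref, "-L", chrom]
    for path in inputs:
        args += ["--variant", path]
    args += ["-o", out]
    return " ".join(shlex.quote(a) for a in args)


def plan_chromosome(chrom, gvcfs, batch_size=20):
    """Return CombineGVCFs commands per batch, then one GenotypeGVCFs command."""
    commands, combined = [], []
    for i, batch in enumerate(chunks(gvcfs, batch_size), start=1):
        out = f"{chrom}.batch{i}.g.vcf"  # hypothetical naming scheme
        commands.append(gatk_cmd("CombineGVCFs", chrom, batch, out))
        combined.append(out)
    # Genotype the handful of combined gVCFs instead of all 130 inputs.
    commands.append(gatk_cmd("GenotypeGVCFs", chrom, combined, f"{chrom}.vcf"))
    return commands


# Hypothetical per-chromosome inputs for 130 samples.
gvcfs = [f"chr1.sample{n}.g.vcf" for n in range(1, 131)]
plan = plan_chromosome("chr1", gvcfs)
for cmd in plan:
    print(cmd)
```

With 130 inputs and batches of 20, GenotypeGVCFs opens seven combined files per chromosome rather than 130. The CombineGVCFs jobs within a chromosome are independent of each other, so they can run in parallel before the final genotyping step.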

