I have 130 large per-sample gVCF files and, to avoid memory problems, I want to run GenotypeGVCFs per chromosome. What is the easiest way to split a gVCF per chromosome? I don't want to go back and rerun HaplotypeCaller with the -L flag.
You should be able to run SelectVariants with -L to generate each chromosome's worth of gVCF.
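As a rough sketch of that suggestion, the loop below prints one SelectVariants command per chromosome for a single sample, using the GATK 3 style invocation seen elsewhere in this thread. The names `ref.fa`, `sample1.g.vcf`, the chromosome labels, and the helper `select_cmd` are placeholders for illustration, not files from the original post. It is a dry run: nothing is executed until the output is piped to `sh`.

```shell
#!/bin/sh
# Dry run: print one SelectVariants command per chromosome for one sample.
# All file names and chromosome labels below are placeholders (assumptions).
REF=ref.fa
SAMPLE_GVCF=sample1.g.vcf
CHROMS="chr1 chr2 chr3"

select_cmd() {
    # $1 = input gVCF, $2 = chromosome name as it appears in the reference
    base=$(basename "$1" .g.vcf)
    echo "java -jar GenomeAnalysisTK.jar -T SelectVariants -R $REF -V $1 -L $2 -o $base.$2.g.vcf"
}

for chr in $CHROMS; do
    select_cmd "$SAMPLE_GVCF" "$chr"
done
```

Printing the commands first makes it easy to check the names before submitting them, for example one job per line on a cluster.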
Thanks Geraldine for the prompt reply.
I realized that SelectVariants generates chromosome-wise gVCFs from each sample. What if I want to make chromosome-wise gVCFs from all 130 samples, e.g. 130chr1.g.vcf? I guess this would relieve the memory load that makes my jobs crash when running GenotypeGVCFs on 130 gVCF files (20 GB each). Here is what the command looks like:
Program Args: java -Xmx32g -Djava.io.tmpdir=pwd/tmp -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fa -L $chr -nt 6 --variant 130gVCF.list --dbsnp ref.vcf.gz -o $chr.vcf
I tried removing all the idx files and letting GenotypeGVCFs regenerate them, but it takes too long to produce the idx for each sample. Each idx file is about 50-70 MB for a 20 GB gVCF. Is this normal?
So you are saying GenotypeGVCFs takes too long or crashes when you feed it 130 per-chromosome GVCFs? Have you tried combining the per-chromosome GVCFs using CombineGVCFs? You can try combining 10-20 at a time, then feeding those combined GVCFs to GenotypeGVCFs.
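The batching idea could look roughly like the sketch below. It splits a list of gVCF paths into groups with `split` and prints one GATK 3 style CombineGVCFs command per group. The helper `combine_cmds`, the `batch.*` and `combined.*` file names, and the use of `-L` to restrict each merge to one chromosome are assumptions for illustration. The commands are only printed, not run.

```shell
#!/bin/sh
# Dry run: split a list of gVCF paths into batches and print one
# CombineGVCFs command per batch. File names are placeholders (assumptions).
REF=ref.fa

combine_cmds() {
    # $1 = text file with one gVCF path per line, $2 = batch size, $3 = chromosome
    list=$1; size=$2; chr=$3
    # split writes batch.<chr>.aa, batch.<chr>.ab, ... with up to $size lines each
    split -l "$size" "$list" "batch.$chr."
    for part in batch."$chr".??; do
        [ -e "$part" ] || continue
        # GATK 3 reads an argument ending in .list as a file of input paths
        mv "$part" "$part.list"
        echo "java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R $REF -L $chr -V $part.list -o combined.$chr.${part##*.}.g.vcf"
    done
}

# Example: 130 samples in batches of 20 gives 7 combined gVCFs for chr1
if [ -f 130gVCF.list ]; then
    combine_cmds 130gVCF.list 20 chr1
fi
```

The resulting `combined.chr1.*.g.vcf` files would then be passed to GenotypeGVCFs with `--variant` in place of the 130 individual gVCFs.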