Running GATK on small pieces of a large BAM file

Hi Team,

I need to know something and hopefully is simple to implement. I have to run GATK HaplotypeCaller on a large BAM file, thus I have to run this in batches of 4 hours top. I've done some test to determine the appropriate size for the subsampled BAM. So let's say I need to run 10 jobs. Each one of those will output a gVCG file, which all belong to the same individual in this experiment.
I have read that there are several methods to parse and merge VCF files into a single one like CatVariants, CombineGVCFs and CombineVariants. The question is that I'm inclined to use CombineGVCFs since is the output I have, but I also have the understanding that this is for merging different individuals which is not what I have to do. So which approach should I use?
Thanks for the inputs!!!


Best Answer


Sign In or Register to comment.