
Running GATK on small pieces of a large BAM file

Hi Team,

I need to know something that is hopefully simple to implement. I have to run GATK HaplotypeCaller on a large BAM file, so I have to run it in batches of at most 4 hours each. I've done some tests to determine the appropriate size for each subsampled BAM. So let's say I need to run 10 jobs. Each of those will output a gVCF file, and all of them belong to the same individual in this experiment.
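Rather than subsetting the reads, the batches can be defined as genomic intervals passed to HaplotypeCaller with `-L`, so each job sees all reads for its region. The sketch below shows one way to cut a reference into about N contiguous chunks of similar size. It assumes the contig names and lengths are already available in reference order (for example from the reference `.fai` index). The function name and input format are hypothetical, not part of GATK.

```python
def split_into_intervals(contig_lengths, n_chunks):
    """Split contigs into at most n_chunks contiguous pieces of similar size.

    contig_lengths: list of (contig_name, length) tuples in reference order
                    (hypothetical input, e.g. parsed from a .fai index).
    Returns a list of chunks; each chunk is a list of "contig:start-end"
    strings (1-based, inclusive), one job per chunk.
    """
    total = sum(length for _, length in contig_lengths)
    target = -(-total // n_chunks)  # ceiling division: bases per chunk

    chunks, current, filled = [], [], 0
    for name, length in contig_lengths:
        start = 1
        while start <= length:
            # Take as much of this contig as still fits in the current chunk.
            take = min(length - start + 1, target - filled)
            end = start + take - 1
            current.append(f"{name}:{start}-{end}")
            filled += take
            start = end + 1
            if filled == target:  # chunk is full: close it, start a new one
                chunks.append(current)
                current, filled = [], 0
    if current:  # last, possibly smaller, chunk
        chunks.append(current)
    return chunks


print(split_into_intervals([("chr1", 100), ("chr2", 50)], 3))
```

Each inner list would then become the `-L` arguments of one HaplotypeCaller job run in gVCF mode. Cutting inside a contig at arbitrary positions may affect calls near the cut points, so splitting only at contig ends or at assembly gaps is likely the safer choice in practice.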
I have read that there are several tools to combine VCF files into a single one, such as CatVariants, CombineGVCFs and CombineVariants. I'm inclined to use CombineGVCFs since gVCFs are the output I have, but my understanding is that it is meant for merging different individuals, which is not what I need to do. So which approach should I use?
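The distinction being asked about can be illustrated in a few lines. When every piece holds the same sample over disjoint regions, combining them amounts to concatenating the records in genomic order while keeping a single header (the situation CatVariants is described for in GATK3). CombineGVCFs, by contrast, targets files from different samples. This is a minimal sketch, assuming uncompressed, position-sorted files with identical headers that are passed in genomic order. It does no validation or re-sorting and is not a substitute for the GATK tools. The function names are hypothetical.

```python
def header_or_records(lines, is_first):
    """Yield header lines ('#...') only for the first file; always yield records."""
    for line in lines:
        if is_first or not line.startswith("#"):
            yield line


def concat_vcfs(paths, out_path):
    """Concatenate same-sample VCF/gVCF pieces covering disjoint regions.

    paths must already be in genomic order; the output keeps the first
    file's header followed by every file's records in that order.
    """
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as f:
                out.writelines(header_or_records(f, i == 0))
```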
Thanks for any input!



