CatVariants, MergeVcfs or GatherVcfs

dbecker · Munich · Member ✭✭✭


I have 35 whole-exome samples (with more to come soon). I called variants on each sample per chromosome, which produced 35 * 25 = 875 g.vcfs. In a second step I used CombineGVCFs to merge the samples stepwise for each chromosome (=> 25 g.vcfs). My first idea for the next step was to run CombineGVCFs again to create one final g.vcf and then do the genotyping. Since this takes longer than expected, I looked around and found CatVariants, which should be much faster (https://gatkforums.broadinstitute.org/gatk/discussion/3455/catvariants-or-combinevariants). With this tool I could genotype each chromosome separately and merge the resulting vcfs in less time. But CatVariants no longer seems to exist in GATK4, which has MergeVcfs and GatherVcfs instead.

Which tool is the best choice to merge vcfs that contain different chromosomes?

Is it a good idea to run the genotyping in parallel and merge the vcfs, instead of merging the g.vcfs and genotyping the complete result file?

Are there other/better ways to do this? Should I use CombineGVCFs even though it is not that fast? And please don't say GenomicsDB. It is the slowest of them all on our local server.
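For reference, the per-chromosome route described above could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: the file names (`combined.chr*.g.vcf.gz`, `genotyped.chr*.vcf.gz`, `cohort.vcf`), the reference name `ref.fasta` and the contig naming (1..22, X, Y, MT) are all assumptions made for the example. The script only builds the command lines and does not run GATK. As far as I understand, GatherVcfs expects its inputs in the same order as the contigs in the reference, which is why the list is kept ordered.

```python
from typing import List

# Assumed contig names: 22 autosomes plus X, Y and MT (25 in total, as in the post).
# The real names and their order must match the reference's sequence dictionary.
CONTIGS: List[str] = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def genotype_cmd(contig: str, reference: str = "ref.fasta") -> List[str]:
    """Build a GenotypeGVCFs call for one per-chromosome combined g.vcf.

    File names are hypothetical placeholders.
    """
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"combined.chr{contig}.g.vcf.gz",
        "-O", f"genotyped.chr{contig}.vcf.gz",
    ]


def gather_cmd(contigs: List[str], output: str = "cohort.vcf") -> List[str]:
    """Build one GatherVcfs call that concatenates the per-chromosome vcfs.

    Inputs are added in the order given, so `contigs` should already be in
    reference order.
    """
    cmd = ["gatk", "GatherVcfs"]
    for contig in contigs:
        cmd += ["-I", f"genotyped.chr{contig}.vcf.gz"]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    # The 25 genotyping jobs are independent and could run in parallel.
    for contig in CONTIGS:
        print(" ".join(genotype_cmd(contig)))
    # The gather step runs once, after all genotyping jobs have finished.
    print(" ".join(gather_cmd(CONTIGS)))
```

The printed lines could be handed to whatever job scheduler is available on the local server; the gather step depends on all 25 genotyping jobs having completed.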


Best Answer


  • dbecker · Munich · Member ✭✭✭

    I found this:

    Is the only difference between MergeVcfs and GatherVcfs that GatherVcfs is faster and MergeVcfs creates an index?

  • dbecker · Munich · Member ✭✭✭

    Thank you! I'll use GatherVcfs for now. As soon as our pipeline runs without any problems, I'll try GenomicsDB again.

