Combining separately joint-called VCFs
I have read through the guides and man pages I could find here, but am a bit confused. I have two joint-called VCFs produced with the same GATK 3.7 pipeline, one with 3000 samples and one with 1000 samples. Can I combine those VCFs, or is it wiser to re-joint-call all 4000 samples together?
This page mentions (as an aside) joint calling in batches of 200 samples and then combining the results. However, it does not say how that combining would be done: the three combining methods it describes are for cases different from this one.
It seems like this tool is technically capable of merging VCFs, as are other non-GATK tools. However, I believe that merging VCFs is generally hard, with many edge cases, missing data, and so on. That is, after all, the reason for the GVCF workflow. I think the output of merging with that tool would be markedly different from a single joint-called VCF.
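To make the missing-data concern concrete, here is a toy Python sketch of what I assume a naive merge does. The `naive_merge` helper, the sample names, and the genotypes are all made up for illustration; this is not how any GATK tool is implemented, and a real merge also has to reconcile alleles and annotations.

```python
# Toy illustration only: hypothetical data, not a real VCF parser or GATK tool.
def naive_merge(batch_a, samples_a, batch_b, samples_b):
    """Take the union of sites from two call sets.

    A sample whose batch has no record at a site gets a no-call ("./."),
    because a plain VCF does not say whether that batch was hom-ref
    or simply had no data there.
    """
    merged = {}
    for site in sorted(set(batch_a) | set(batch_b)):
        row = {}
        for sample in samples_a:
            row[sample] = batch_a.get(site, {}).get(sample, "./.")
        for sample in samples_b:
            row[sample] = batch_b.get(site, {}).get(sample, "./.")
        merged[site] = row
    return merged


samples_a = ["A1", "A2"]
samples_b = ["B1"]
# Site (chr1, 200) was only called as variant in batch A.
batch_a = {
    ("chr1", 100): {"A1": "0/1", "A2": "0/0"},
    ("chr1", 200): {"A1": "1/1", "A2": "0/1"},
}
batch_b = {
    ("chr1", 100): {"B1": "0/1"},
}

merged = naive_merge(batch_a, samples_a, batch_b, samples_b)
for site, row in merged.items():
    print(site, row)
```

In this sketch, B1 ends up as a no-call at (chr1, 200), whereas joint calling all samples together from GVCFs could, as I understand it, use the reference-confidence information to emit an actual genotype there.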
In this question you recommended not attempting to merge VCFs, but this seems to conflict with the first link above.
This page does not mention the batching at all, I think because GenomicsDB and GATK4 are expected to scale better with more samples.
Hope you can clear up my confusion.