Combining separately joint called vcfs
I have read through the guides and man pages I could find here, but I am still a bit confused. I have two joint-called VCFs produced with the same GATK 3.7 pipeline, one with 3000 samples and one with 1000 samples. Can I combine those VCFs, or is it wiser to re-run joint calling on all 4000 samples together?
This page mentions (as an aside) joint calling in batches of 200 samples and then combining the results. However, it does not say how that combining would be done; the three combining methods it describes are for cases different from this one.
It seems that this tool, like some non-GATK tools, is technically capable of merging VCFs. However, I believe that merging VCFs is generally hard, with many edge cases, missing data and so on. That is, after all, the reason for the GVCF workflow. I think the output of merging with that tool would be markedly different from a single joint-called VCF.
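To make the missing-data concern concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not how any GATK tool works: `naive_merge`, the site keys and the sample names are all hypothetical. It shows that a site called in only one batch has no record in the other, so a plain merge cannot tell "homozygous reference" from "no call" for the other batch's samples.

```python
# Toy illustration only (hypothetical data and function names).
# Sites are keyed by (chrom, pos, ref, alt) and map to per-sample genotypes.
# Real VCF merging must also handle multiallelic sites, differing allele
# representations, and site-level annotations.

MISSING = "./."


def naive_merge(calls_a, samples_a, calls_b, samples_b):
    """Union the sites of two call sets; fill absent genotypes with MISSING."""
    merged = {}
    for site in sorted(set(calls_a) | set(calls_b)):
        row = {}
        for sample in samples_a:
            row[sample] = calls_a.get(site, {}).get(sample, MISSING)
        for sample in samples_b:
            row[sample] = calls_b.get(site, {}).get(sample, MISSING)
        merged[site] = row
    return merged


# Batch A only has a record at position 100, batch B only at position 200.
batch_a = {("chr1", 100, "A", "G"): {"s1": "0/1", "s2": "0/0"}}
batch_b = {("chr1", 200, "C", "T"): {"s3": "1/1"}}

merged = naive_merge(batch_a, ["s1", "s2"], batch_b, ["s3"])
for site, genotypes in merged.items():
    print(site, genotypes)
```

In the merged result, `s3` is missing at position 100 and `s1`/`s2` are missing at position 200, even though they may well have been confidently homozygous reference there. Joint calling from GVCFs keeps that reference information, which is why I expect the two outputs to differ.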
In this question, you recommended against attempting to merge VCFs, but that seems to conflict with the first link above.
This page does not mention batching at all. I assume that is because GenomicsDB and GATK4 are expected to scale better with more samples.
I hope you can clear up my confusion.