HaplotypeCaller Multisample Variant Calling
I've been using HaplotypeCaller in a new whole-genome variant calling pipeline I'm working on, and I have a question about how many samples to call together. In my tests, running HaplotypeCaller on more samples simultaneously has improved accuracy every time I increased the count (I've tried 1, 4, and 8 samples at a time). Before I try 16 samples, could you tell me whether there's a point of diminishing returns for adding samples to HaplotypeCaller?

The time per sample seems to increase with every sample I add, so I don't want to keep adding samples if it won't improve the call set. If it does improve the results, I'll live with the longer run times.

I should note that I'm building this pipeline for an experiment with up to 50 individuals, including family groups of 3-4 people. If running HaplotypeCaller on all 50 simultaneously would give the best call set, that's what I'll do.

Thanks! (By the way, I love the improvements you made with 2.5!)
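For concreteness, a multi-sample run of the kind described above could be scripted roughly as follows. This is only a sketch, assuming the GATK 2.x command-line style (`-T HaplotypeCaller` with one `-I` per BAM file). The function names, file names, and jar path are hypothetical and are not taken from the original post.

```python
from typing import List, Sequence


def build_haplotypecaller_cmd(
    reference: str,
    bams: Sequence[str],
    out_vcf: str,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to the GATK jar
) -> List[str]:
    """Build an argument list that calls all given BAMs in one HaplotypeCaller run."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = ["java", "-jar", gatk_jar, "-T", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        # One -I per sample: all samples listed here are called simultaneously.
        cmd += ["-I", bam]
    cmd += ["-o", out_vcf]
    return cmd


def batches(items: Sequence[str], size: int) -> List[List[str]]:
    """Split samples into consecutive groups of at most `size` (e.g. 1, 4, 8, 16)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical file names for a 50-sample experiment.
    samples = [f"sample{i:02d}.bam" for i in range(1, 51)]
    for n, group in enumerate(batches(samples, 8), start=1):
        cmd = build_haplotypecaller_cmd("reference.fasta", group, f"batch{n}.vcf")
        print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch a run. Note that `batches` splits samples in list order, so to keep each family group in the same run, the input list would need to be ordered so that families are not split across batch boundaries.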