Multi-sample calling on distant samples
I'm working with a non-model species, specifically samples from the genus Citrus. Our goal is to identify candidate SNVs that could account for most of the phenotypic differences between citrus varieties and species.
Because a reference assembly is not available for every species, we have implemented our genotyping pipeline by mapping all citrus species against the same reference genome (the clementine genome). We are aware that this approach can introduce a bias that grows with the distance between the sample species and the reference species, but citrus species are relatively close to each other, and it is quite easy to find many conserved regions between them.
My question is about multi-sample calling. We are confident in using multi-sample calling when we compare samples within a species, but we are not sure whether the same methodology is appropriate when we compare distant samples that do not share a considerable proportion of variants.
What would you recommend?
We see two alternatives:
1 - Perform multi-sample calling, on the assumption that the variants will still be detected despite the genomic heterogeneity.
2 - Perform independent calling for each sample and combine the results afterwards (using the CombineVariants tool).
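To make our concern with the second alternative concrete, here is a minimal Python sketch of what combining independent call sets amounts to. It is a toy illustration only: the function `combine_calls`, the sample names, and the genotypes are all hypothetical, and it does not reproduce what CombineVariants actually does internally. It only shows that a site called in one sample but absent from another ends up with a missing genotype (`./.`) for the second sample, so we cannot tell whether that sample matches the reference or simply had no usable data there.

```python
# Toy illustration with made-up data (not real citrus calls).
# Each per-sample call set maps (chrom, pos) -> genotype string.

def combine_calls(call_sets):
    """Merge independent per-sample call sets into one site-by-sample table.

    Sites that a sample did not report are filled with "./." (missing),
    because an absent record does not say whether the sample is
    homozygous reference or just uncovered at that position.
    """
    samples = sorted(call_sets)
    # Union of all sites reported by any sample.
    all_sites = sorted(set().union(*(calls.keys() for calls in call_sets.values())))
    return {
        site: {sample: call_sets[sample].get(site, "./.") for sample in samples}
        for site in all_sites
    }

# Hypothetical per-sample results from independent calling.
clementine = {("chr1", 100): "0/1"}
pummelo = {("chr1", 100): "1/1", ("chr1", 250): "0/1"}

merged = combine_calls({"clementine": clementine, "pummelo": pummelo})
for site, genotypes in merged.items():
    print(site, genotypes)
```

In this toy case, position 250 is reported only for the second sample, so the first sample gets `./.` there. With distant species we would expect many such sites.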
Thanks in advance