Multi-sample calling on distant samples
I'm working with a non-model species, specifically with samples from the genus Citrus. We are searching for candidate SNVs that could be responsible for most of the phenotypic differences between citrus varieties and species.
Because a reference assembly is not available for every species, we have implemented our genotyping pipeline by mapping all citrus species against the same reference genome (the clementine genome). We are aware that this approach can introduce biases proportional to the distance between each sample's species and the reference, but citrus species are relatively closely related, and it is quite easy to find many conserved regions between them.
My question is about multi-sample calling. We are confident performing multi-sample calling when we compare samples within a species, but we are not sure whether to follow the same methodology when we compare distant samples that do not share a considerable proportion of variants.
What would you recommend?
We see two alternatives:
1. Perform multi-sample calling, on the assumption that the variants will still be detected despite the genomic heterogeneity.
2. Perform independent calling for each sample and combine the results afterwards (using the CombineVariants tool).
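To make the second alternative concrete, here is a minimal Python sketch of what combining independent call sets amounts to. It is a toy illustration only, not how CombineVariants is implemented: the sample names, positions, and genotypes are made up, the `combine` helper is hypothetical, and it assumes that a sample with no record at a site is reported as a no-call (`./.`).

```python
# Toy per-sample call sets: (contig, position) -> genotype.
# Each set lists only the sites where that sample was called variant,
# as an independent per-sample calling run would. All values are made up.
calls = {
    "clementine_A": {("scaffold_1", 100): "0/1", ("scaffold_1", 250): "1/1"},
    "pummelo_B": {("scaffold_1", 250): "0/1", ("scaffold_1", 900): "1/1"},
}


def combine(per_sample):
    """Take the union of sites; samples lacking a record get './.' (assumption)."""
    sites = sorted(set().union(*per_sample.values()))
    samples = sorted(per_sample)
    return {
        site: {s: per_sample[s].get(site, "./.") for s in samples}
        for site in sites
    }


merged = combine(calls)
for site, genotypes in merged.items():
    print(site, genotypes)
```

In this sketch, a site that is variant in only one sample ends up with `./.` for the other, so the merged table alone cannot tell "homozygous reference" apart from "no coverage". This is the part of option 2 we are least sure how to handle for distant samples.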
Thanks in advance