Appropriate members for generating a "known-sites" list
I have 46 complete genomes and a good reference genome. Two of the individuals are "outgroups" (two different species). The rest are the same species as the reference genome. One of the outgroups hybridizes with the ingroup (we are studying this admixture). I have gVCF files for all individuals generated by HaplotypeCaller. When selecting and filtering variants to generate a "known-sites" list, should I exclude the outgroups? That seems like the right thing to do, but I could not think of a reason why adding the two outgroups would be a problem. Perhaps they will have unique SNPs and compromise the "known-sites" list?
Also, when creating a database of gVCFs (GenomicsDBImport), should I include all individuals and then exclude the outgroups at the GenotypeGVCFs step? I could not find an obvious option in GenotypeGVCFs to exclude individuals, except perhaps --annotations-to-exclude, which looks like it is meant for annotations rather than samples.
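To make the second question concrete, here is a rough sketch of the alternative I am considering: import and genotype everyone together, then drop the outgroups afterwards with SelectVariants. I am assuming that --exclude-sample-name, --exclude-non-variants and --remove-unused-alternates are the right SelectVariants arguments for this. The file names and the sample names outgroup_A and outgroup_B are placeholders, not my real data. The script only assembles and prints the command; it does not run GATK.

```python
from typing import List


def select_variants_cmd(
    in_vcf: str,
    out_vcf: str,
    reference: str,
    exclude_samples: List[str],
) -> List[str]:
    """Build a `gatk SelectVariants` command that drops the given samples."""
    cmd = [
        "gatk", "SelectVariants",
        "-R", reference,
        "-V", in_vcf,
        "-O", out_vcf,
    ]
    # One --exclude-sample-name argument per sample to remove.
    for sample in exclude_samples:
        cmd += ["--exclude-sample-name", sample]
    # Drop sites that are no longer variant once the outgroups are gone,
    # and trim ALT alleles that only the removed samples carried.
    cmd += ["--exclude-non-variants", "--remove-unused-alternates"]
    return cmd


# Placeholder inputs (hypothetical names).
cmd = select_variants_cmd(
    in_vcf="cohort.vcf.gz",
    out_vcf="ingroup_only.vcf.gz",
    reference="reference.fasta",
    exclude_samples=["outgroup_A", "outgroup_B"],
)
print(" ".join(cmd))
```

If this is a reasonable approach, SNPs unique to the outgroups should disappear from the subset because those sites become non-variant in the remaining samples, but I would appreciate confirmation that this is the intended way to do it.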