How is a haplotype called by HaplotypeCaller across the genome with RADseq data?
I have a question about how HaplotypeCaller calls haplotypes across the genome with reduced representation sequencing data. I have ddRADseq data from a diploid organism, and I used HaplotypeCaller to produce the raw VCF file. Some heterozygous SNP sites were phased, but I also found unphased heterozygous sites in the VCF file. I guess this is because there was not enough read information available to phase those sites.
How does the program handle reduced representation sequencing data, where reads cover only scattered loci, when calling haplotypes across the whole genome?
Also, should I exclude unphased heterozygous sites from my downstream analysis? If so, how can I do that?
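To make the second question concrete, here is a minimal sketch of the kind of filter I have in mind. It is not a GATK feature, just plain Python over the VCF text, and it assumes that phased genotypes are written with `|` and unphased ones with `/` in the GT field. The function names `is_unphased_het` and `drop_unphased_hets` are made up for illustration, and dropping a site when any sample is an unphased heterozygote is my own assumption about what "exclude" would mean.

```python
def is_unphased_het(sample_field, gt_index=0):
    """Return True if the sample's GT is heterozygous and unphased (e.g. 0/1)."""
    gt = sample_field.split(":")[gt_index]
    if "/" not in gt:
        # Phased ("|") or haploid genotypes are not unphased hets.
        return False
    alleles = gt.split("/")
    if "." in alleles:
        # Missing calls such as ./. are left alone.
        return False
    return len(set(alleles)) > 1


def drop_unphased_hets(vcf_lines):
    """Yield header lines and records where no sample is an unphased het."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            yield line
            continue
        gt_index = fmt.index("GT")
        # Assumption: drop the whole site if ANY sample is an unphased het.
        if any(is_unphased_het(s, gt_index) for s in fields[9:]):
            continue
        yield line
```

It could be applied by opening the uncompressed VCF, passing the file object to `drop_unphased_hets`, and writing the yielded lines to a new file. Whether removing these sites is actually advisable is the part I am unsure about.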
Hope my questions make sense. Thanks!