Are the local realigner and the Unified Genotyper repeating the same step?
I am confused after reading the paper and the tutorial on the GATK variant caller. First, the DePristo et al. 2011 paper says the local realigner generates a list of candidate haplotypes using a) dbSNP information, b) the presence of at least one indel in a read, and c) a cluster of mismatches at a site. It then finds the best alternative haplotype based on read-haplotype likelihoods and uses a log odds ratio to decide between the reference haplotype alone and the reference/alternative haplotype pair.
Second, the tutorial on the GATK variant caller describes an indel genotype likelihood calculation, where the Unified Genotyper (UG) estimates the likelihoods as follows: 1) generate haplotypes from the indels seen in the reads, 2) compute genotype likelihoods for all haplotype pairs, and 3) for each haplotype, compute the read-haplotype likelihood using an HMM.
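To make the UG description concrete, here is a minimal sketch of how I understand step 2, assuming the read-haplotype likelihoods from step 3 are already available. The function name and the toy numbers are my own and hypothetical, not GATK code, and I am assuming the usual diploid model in which each read comes from either haplotype of the pair with probability 0.5.

```python
import math
from itertools import combinations_with_replacement


def genotype_log_likelihoods(read_hap_liks):
    """Return log10 likelihoods for every unordered haplotype pair.

    read_hap_liks[r][h] is P(read r | haplotype h). In UG these values
    would come from the HMM; here they are simply given as input.
    """
    n_haps = len(read_hap_liks[0])
    result = {}
    for i, j in combinations_with_replacement(range(n_haps), 2):
        log_lik = 0.0
        for liks in read_hap_liks:
            # Assumed diploid model: the read is drawn from haplotype i
            # or haplotype j with equal probability.
            log_lik += math.log10(0.5 * liks[i] + 0.5 * liks[j])
        result[(i, j)] = log_lik
    return result


# Toy data (made up): haplotype 0 = reference, haplotype 1 = indel.
# Two reads support the reference and two support the indel.
reads = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
]
gl = genotype_log_likelihoods(reads)
best = max(gl, key=gl.get)  # pair with the highest likelihood
```

With these toy numbers the heterozygous pair `(0, 1)` gets the highest likelihood, since half of the reads support each haplotype.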
My question is: do both steps, local realignment and UG, generate haplotypes and then compute read-haplotype likelihoods? To me the two steps look redundant, because if the best haplotype pair has already been chosen during local realignment, why do haplotypes need to be generated again for genotype likelihood estimation? In theory, UG should be able to call indels and assign genotypes just by testing all possible genotypes against the two haplotypes chosen by the local realigner. Please help me sort this out. I would really appreciate any help.