
Are the local realigner and UnifiedGenotyper repeating the same step?

Hi Geraldine,
I am confused after reading the paper and the tutorial on the GATK variant caller. First, the DePristo et al. 2011 paper says the local realigner generates a list of candidate haplotypes using a) dbSNP information, b) the presence of at least one indel in a read, and c) a cluster of mismatches at a site. It then finds the best alternative haplotype based on read-haplotype likelihoods and uses a log odds ratio to declare the ref and alt haplotype pair. Second, the tutorial on the GATK variant caller describes the indel genotype likelihood calculation, where UG estimates this by 1) generating haplotypes from indels in the reads, 2) computing genotype likelihoods for all haplotype pairs, and 3) computing, for each haplotype, the read-haplotype likelihood using an HMM.
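
To make the shared step concrete, here is a rough sketch of computing read-haplotype likelihoods and combining them into genotype likelihoods for every haplotype pair. It is a toy illustration, not GATK's implementation: a simple ungapped mismatch model stands in for the HMM, each read is assumed equally likely to come from either haplotype of a diploid genotype, and the function names, the error rate `err`, and the example sequences are all made up for this sketch.

```python
import math
from itertools import combinations_with_replacement


def read_hap_loglik(read, hap, err=0.01):
    """Toy read-haplotype log-likelihood (assumption: ungapped mismatch model,
    standing in for the HMM). Takes the best ungapped placement of the read."""
    best = -math.inf
    for offset in range(len(hap) - len(read) + 1):
        ll = 0.0
        for i, base in enumerate(read):
            if hap[offset + i] == base:
                ll += math.log(1 - err)   # base agrees with the haplotype
            else:
                ll += math.log(err / 3)   # base explained as a sequencing error
        best = max(best, ll)
    return best


def genotype_logliks(reads, haplotypes):
    """Log-likelihood of the reads under every unordered haplotype pair,
    assuming each read comes from either haplotype with probability 0.5."""
    per_read = [[read_hap_loglik(r, h) for h in haplotypes] for r in reads]
    result = {}
    for a, b in combinations_with_replacement(range(len(haplotypes)), 2):
        total = 0.0
        for row in per_read:
            total += math.log(0.5 * math.exp(row[a]) + 0.5 * math.exp(row[b]))
        result[(a, b)] = total
    return result


# Hypothetical example: the alt haplotype carries a 2-bp deletion ("TT").
ref = "ACGTACGTTTGACCA"
alt = "ACGTACGTGACCA"
reads = ["CGTTTGAC"] * 3 + ["CGTGACCA"] * 3  # three reads from each haplotype

gl = genotype_logliks(reads, [ref, alt])
best_genotype = max(gl, key=gl.get)
print(best_genotype, gl)
```

With three reads supporting each haplotype, the heterozygous pair `(0, 1)` should score highest. The point relevant to the question is that `genotype_logliks` needs a candidate haplotype list as input, wherever that list comes from.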

My question is: do both steps, local realignment and UG, generate haplotypes and then compute read-haplotype likelihoods? To me these two steps seem redundant: if the best haplotype pair is generated by local realignment, why is there a need to generate haplotypes again for genotype likelihood estimation? In theory, UG should be able to call indels and assign them genotypes just by testing all possible genotypes against the two haplotypes called by the local realigner. Please help me sort this out. I highly appreciate your help.


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    You are right that from the point of view of generating candidate haplotypes, there is some redundancy. However, the haplotypes generated at the indel realignment step aren't recorded in any way, so UG has to repeat that work. You could theoretically save the haplotypes generated at the indel realignment step and give them to UG so it wouldn't have to redo the computation at those sites. But this would be awkward and wouldn't really save much computation time, so we consider it just not worth the hassle.

    In any case, in the newer caller (HaplotypeCaller) the haplotypes are generated in a much better way (using an assembly graph) which produces superior results for variant calling.

  • Thank you very much Geraldine. I highly appreciate it. :)
