UnifiedGenotyper - How are regions to examine for Variant Calling (SNPs, Indels) detected?
I just had a look at the code of the UnifiedGenotyper and how its variant calling algorithm is implemented (very well documented, by the way). But now I wonder how GATK reads the SAM file and determines where the differences from the reference (SNPs, indels) are, which are then examined during variant calling. I can only see that the UnifiedGenotyper receives a set of alleles, not where those alleles are actually determined. I have also found that GATK uses Samtools to parse SAM files, but I have not found the point where the actual reads are parsed and processed (e.g., by using the CIGAR string). Do you perhaps perform a local realignment before the actual variant calling, and derive the alleles from that?
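To make clear what I mean by "processing reads using the CIGAR string", here is a rough sketch of the kind of step I expected to find. It is a hypothetical illustration, not GATK code: the class `CigarWalk`, the method `findDifferences`, and the event string format are invented for this example, and it assumes the reference bases covering the alignment are already available as a string with 0-based positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not GATK code): walk a CIGAR to list candidate SNPs and indels. */
public class CigarWalk {
    private static final Pattern VALID = Pattern.compile("(\\d+[MIDNSHP=X])+");
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /**
     * Returns events such as "SNP@2:G>T", "INS@4:G" (inserted before ref position 4)
     * and "DEL@7:TA"; all positions are 0-based reference coordinates.
     */
    public static List<String> findDifferences(String ref, int alignStart, String readBases, String cigar) {
        if (!VALID.matcher(cigar).matches()) {
            throw new IllegalArgumentException("Unsupported CIGAR: " + cigar);
        }
        List<String> events = new ArrayList<>();
        int refPos = alignStart; // current position on the reference
        int readPos = 0;         // current position in the read
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X':
                    // Aligned bases: compare read and reference base by base.
                    for (int i = 0; i < len; i++) {
                        char r = ref.charAt(refPos + i);
                        char b = readBases.charAt(readPos + i);
                        if (r != b) events.add("SNP@" + (refPos + i) + ":" + r + ">" + b);
                    }
                    refPos += len;
                    readPos += len;
                    break;
                case 'I': // bases present in the read but not in the reference
                    events.add("INS@" + refPos + ":" + readBases.substring(readPos, readPos + len));
                    readPos += len;
                    break;
                case 'D': // reference bases missing from the read
                    events.add("DEL@" + refPos + ":" + ref.substring(refPos, refPos + len));
                    refPos += len;
                    break;
                case 'N': // skipped reference region (e.g. an intron), not a variant
                    refPos += len;
                    break;
                case 'S': // soft-clipped bases are stored in the read but not aligned
                    readPos += len;
                    break;
                default: // 'H' and 'P' consume neither the read nor the reference
                    break;
            }
        }
        return events;
    }
}
```

For example, `findDifferences("ACGTACGTAC", 0, "ACTTGACGC", "4M1I3M2D1M")` returns `[SNP@2:G>T, INS@4:G, DEL@7:TA]`. Collecting such events across all reads at a locus is roughly what I imagined happens before the genotyping step, but I could not find it in the code.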
I would appreciate it if you could point me to where this happens in the code.