Question about Proximal gap filter in MuTect
Hi GATK Team,
I have a quick question about the Proximal gap filter in MuTect. According to the Cibulskis et al. paper, MuTect rejects a candidate site if, within an 11 bp window, there are 3 or more reads with insertions or 3 or more reads with deletions. I understand that misalignment around insertions and deletions can lead to false positives (hence the need to run the GATK IndelRealigner tool on the reads before passing them to MuTect), but I am not clear on why base mismatches near insertions and deletions are necessarily artifacts. I'm probably missing an important point; if someone could explain the rationale behind the Proximal gap filter, I'd appreciate it very much.
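To make sure I am reading the rule correctly, here is a minimal Python sketch of how I understand it. This is not MuTect's actual code: the function name, the read representation (dicts holding the reference positions of each read's insertions and deletions), and the assumption that the 11 bp window is centered on the candidate site are all mine, for illustration only.

```python
def fails_proximal_gap(candidate_pos, reads, window=11, min_reads=3):
    """Return True if the candidate site would be rejected under my reading
    of the Proximal gap rule.

    Hypothetical input format: each read is a dict with two lists,
    "insertions" and "deletions", giving the reference positions of the
    gaps in that read's alignment.
    """
    half = window // 2  # 5 bases on either side for an 11 bp window
    lo, hi = candidate_pos - half, candidate_pos + half

    def has_gap_in_window(positions):
        return any(lo <= p <= hi for p in positions)

    # Insertions and deletions are counted separately, per read.
    reads_with_ins = sum(1 for r in reads if has_gap_in_window(r["insertions"]))
    reads_with_del = sum(1 for r in reads if has_gap_in_window(r["deletions"]))

    return reads_with_ins >= min_reads or reads_with_del >= min_reads


# Three reads carry an insertion within 5 bp of position 100.
example_reads = [
    {"insertions": [98], "deletions": []},
    {"insertions": [101], "deletions": []},
    {"insertions": [104], "deletions": []},
    {"insertions": [], "deletions": []},
]
rejected = fails_proximal_gap(100, example_reads)
```

In this sketch, two reads with insertions plus two reads with deletions would not trigger the filter, because each gap type has to reach the threshold of 3 on its own.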
Thanks a lot!