This comment came up after some discussion with a colleague of mine:
"If you split the human genome into 100 pieces, we have to create overlapping regions so that GATK won't miss variants, but this creates a complicated situation where you may have to merge variants at the same locus."
Is it true that I would have to pad the intervals and then explicitly resolve variants called at the same locus? If I pass non-overlapping intervals with -L target_intervals, does GATK get a "running start" (say, 50 bp upstream, to pick up variant context) before emitting variants, as samtools mpileup/bcftools claims to do? Or does GATK start exactly at the beginning of each specified interval, and so possibly fail to call variants within a short region at the start of it?
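To make the scenario concrete, here is a rough Python sketch of the pad-and-resolve approach as I understand it. It does not call GATK at all. The names Interval, pad, and keep_core_calls, the 50 bp padding, and the example calls are all made up for illustration. The idea is that each shard is called on a padded interval, but only calls whose position falls inside the shard's original (unpadded) interval are kept, so adjacent shards can never report the same locus twice.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# A call is (contig, position, ref allele, alt allele); hypothetical layout.
Call = Tuple[str, int, str, str]


def pad(interval: Interval, padding: int, contig_length: int) -> Interval:
    """Extend an interval by `padding` bases on each side, clamped to the contig."""
    return Interval(
        interval.contig,
        max(1, interval.start - padding),
        min(contig_length, interval.end + padding),
    )


def keep_core_calls(calls: List[Call], core: Interval) -> List[Call]:
    """Keep only the calls whose position lies inside the unpadded interval."""
    return [
        c for c in calls
        if c[0] == core.contig and core.start <= c[1] <= core.end
    ]


# Two adjacent, non-overlapping shards on a hypothetical 2000 bp contig.
core_a = Interval("chr1", 1, 1000)
core_b = Interval("chr1", 1001, 2000)
padded_a = pad(core_a, 50, 2000)
padded_b = pad(core_b, 50, 2000)

# Made-up calls from each padded shard; both shards see positions 990 and 1010.
calls_a = [("chr1", 990, "A", "G"), ("chr1", 1010, "C", "T")]
calls_b = [("chr1", 990, "A", "G"), ("chr1", 1010, "C", "T"), ("chr1", 1500, "G", "A")]

# Trimming each shard back to its core removes the duplicates before merging.
merged = sorted(keep_core_calls(calls_a, core_a) + keep_core_calls(calls_b, core_b))
```

If trimming back to the core interval like this is valid, no per-locus merging would be needed. Whether the padding is necessary in the first place is exactly what I am asking.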
Your clarification would be much appreciated.