
UG/HC interval list running start

GATK Team:

This comment came up after some discussion with a colleague of mine:

"If you split the human genome into 100 pieces, we have to create overlapping regions so that GATK won't miss variants, but this creates a complicated situation where you may have to merge variants at the same locus."

Is it true that I would have to pad the intervals and then explicitly resolve variants that get called at the same locus in two overlapping pieces? If I use -L target_intervals with non-overlapping intervals, does GATK take a "running start" (say, 50bp upstream, to pick up variant context) before it starts emitting variants, as samtools mpileup/bcftools claims to? Or does GATK jump in directly at the start of the specified interval, and possibly miss variants within a short stretch at the beginning of it?
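
To make the scheme my colleague describes concrete, here is a minimal Python sketch of the padded-interval approach. It does not call GATK, and it says nothing about whether GATK actually needs the padding. The helper names (`split_contig`, `pad`, `keep_core_calls`), the 50bp padding, and the tuple format for calls are all hypothetical choices for illustration. Instead of merging duplicates, it resolves them by keeping only the calls whose position falls inside each piece's unpadded core, so a call made twice in an overlap survives exactly once.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]   # (contig, start, end), 1-based inclusive
Call = Tuple[str, int, str, str]  # (contig, pos, ref, alt), hypothetical format


def split_contig(contig: str, length: int, n_pieces: int) -> List[Interval]:
    """Split one contig into contiguous, non-overlapping core intervals."""
    if not 1 <= n_pieces <= length:
        raise ValueError("n_pieces must be between 1 and the contig length")
    bounds = [1 + (length * i) // n_pieces for i in range(n_pieces + 1)]
    return [(contig, bounds[i], bounds[i + 1] - 1) for i in range(n_pieces)]


def pad(interval: Interval, padding: int, contig_length: int) -> Interval:
    """Extend an interval on both sides, clamped to the contig."""
    contig, start, end = interval
    return (contig, max(1, start - padding), min(contig_length, end + padding))


def keep_core_calls(calls: List[Call], core: Interval) -> List[Call]:
    """Keep only calls whose position lies inside the unpadded core."""
    contig, start, end = core
    return [c for c in calls if c[0] == contig and start <= c[1] <= end]


# Toy example: a 1000bp contig in 4 pieces, padded by an assumed 50bp.
cores = split_contig("chr1", 1000, 4)
padded = [pad(iv, 50, 1000) for iv in cores]

# Pretend both neighbouring pieces called the same variant at position 251,
# which lies in the padding of piece 0 and in the core of piece 1.
calls_per_piece = [
    [("chr1", 100, "A", "G"), ("chr1", 251, "C", "T")],  # piece 0 (padded)
    [("chr1", 251, "C", "T"), ("chr1", 400, "G", "A")],  # piece 1 (padded)
]

merged: List[Call] = []
for core, calls in zip(cores, calls_per_piece):
    merged.extend(keep_core_calls(calls, core))

print(merged)
```

One simplification to note: each call is assigned to a piece by its start position alone, so an event that spans a boundary (a long deletion, for instance) is attributed to whichever core contains its first base.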

Your clarification would be much appreciated.
