
UG/HC interval list running start

bredeson Posts: 22 Member

GATK Team:

This comment came up after some discussion with a colleague of mine:

"If you split the human genome into 100 pieces, we have to create overlapping regions so that GATK won't miss variants, but this creates a complicated situation where you may have to merge variants at the same locus."

Is it true that I would have to pad the intervals and explicitly resolve variants that get called at the same locus? If I use -L target_intervals to specify non-overlapping intervals, does GATK get a "running start" (say, 50 bp upstream, to gather variant context) before emitting variants, as samtools mpileup/bcftools claim to do? Or does GATK jump in directly at the start of each specified interval, so that it may fail to call variants within some short stretch at the beginning of that interval?
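To make the scenario concrete, the pad-and-merge workflow my colleague describes could be sketched roughly as below in Python. This is only an illustration under assumptions: the function names, the 50 bp padding, and the tuple layout for a call are hypothetical and are not part of GATK, and the merge step simply keeps the first record seen at each site rather than reconciling genotypes or qualities.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layouts, chosen only for this sketch.
Interval = Tuple[str, int, int]            # (contig, start, end), 1-based inclusive
Call = Tuple[str, int, str, str, float]    # (contig, pos, ref, alt, qual)


def pad_intervals(intervals: Iterable[Interval], pad: int,
                  contig_lengths: Dict[str, int]) -> List[Interval]:
    """Extend each interval by `pad` bases on both sides, clipped to the contig."""
    padded = []
    for contig, start, end in intervals:
        new_start = max(1, start - pad)
        new_end = min(contig_lengths[contig], end + pad)
        padded.append((contig, new_start, new_end))
    return padded


def merge_calls(calls_per_chunk: Iterable[Iterable[Call]]) -> List[Call]:
    """Collapse calls made in overlapping chunks.

    Calls are keyed on (contig, pos, ref, alt); the first record seen for a
    key is kept. Output is sorted by contig name (as a string) and position.
    """
    seen: Dict[Tuple[str, int, str, str], Call] = {}
    for calls in calls_per_chunk:
        for rec in calls:
            seen.setdefault(rec[:4], rec)
    return sorted(seen.values(), key=lambda r: (r[0], r[1]))
```

With two adjacent intervals on a 2000 bp contig and a pad of 50, the padded chunks overlap over positions 951 to 1050, so a variant at position 1010 can be reported by both chunks and has to be collapsed afterwards. That extra merge step is what I would like to avoid if GATK already handles context at interval boundaries.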

Your clarification would be much appreciated.

