Using IntervalListTools to create scatter intervals for HaplotypeCaller
I was looking at the GATK4 $5 WDL file and saw that it uses IntervalListTools to create the interval list for scattering over HaplotypeCaller. The call to IntervalListTools sets the parameter BREAK_BANDS_AT_MULTIPLES_OF to a non-zero value (100,000 in the hg38 JSON file). The ScatterIntervalsByNs call that generates the interval list used as input to this step is very careful to split only at Ns, but this call may then split in the middle of actual sequence. Won't that potentially introduce edge effects? If we split every 100,000 bases, won't we have trouble calling an indel that spans one of these break points?
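To make the concern concrete, here is a minimal Python sketch of what I understand band-breaking to do. It is not Picard's implementation: the helper names `break_at_multiples` and `spans_break` are hypothetical, and I am assuming 1-based inclusive coordinates with each piece ending on a multiple of the band size, which may not match the tool's exact convention.

```python
def break_at_multiples(start, end, band):
    """Split the 1-based inclusive interval [start, end] so that no piece
    crosses a multiple of `band`. Assumed convention, not Picard's code."""
    if band <= 0:
        return [(start, end)]  # no band breaking requested
    pieces = []
    cur = start
    while cur <= end:
        # Last position of the band that contains `cur`.
        band_end = ((cur - 1) // band + 1) * band
        piece_end = min(end, band_end)
        pieces.append((cur, piece_end))
        cur = piece_end + 1
    return pieces


def spans_break(pieces, var_start, var_end):
    """Return True if the variant [var_start, var_end] does not fit
    entirely inside any single piece."""
    return not any(s <= var_start and var_end <= e for s, e in pieces)


# One N-free interval broken every 100,000 bases.
pieces = break_at_multiples(50_001, 250_000, 100_000)
print(pieces)
# A hypothetical 20 bp deletion straddling position 100,000.
print(spans_break(pieces, 99_990, 100_010))
```

With these made-up coordinates, the deletion's first half falls in one piece and its second half in the next, which is exactly the case I am asking about.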