Collected FAQs about interval lists

1. What file formats do you support for interval lists?

We support three types of interval lists, as mentioned here. Interval lists should preferably be formatted as Picard-style interval lists, with an explicit sequence dictionary, as this prevents accidental misuse (e.g. applying hg18 intervals to an hg19 file). Note that this format is 1-based, not 0-based: the first position in the genome is position 1.
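
The 1-based convention matters most when intervals start out as BED records, which are 0-based and half-open. As a minimal sketch (not an official converter), the Python function below turns one BED line into the body line of a Picard-style interval list, whose columns are contig, start, end, strand and name. The function name `bed_to_interval_line` and its default strand and name values are assumptions made for illustration. The sketch does not write the sequence dictionary header, which has to come from the dictionary of the reference you are using.

```python
def bed_to_interval_line(bed_line: str, strand: str = "+", default_name: str = ".") -> str:
    """Convert one BED record (0-based, half-open) to a 1-based, closed interval line.

    Hypothetical helper for illustration only: it handles the first four BED
    columns (contig, start, end, optional name) and ignores the rest.
    """
    fields = bed_line.rstrip("\n").split("\t")
    contig, start0, end0 = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else default_name
    # The BED start is 0-based, so add 1 to get the 1-based start.
    # The BED end is exclusive, which is numerically equal to the 1-based inclusive end.
    return "\t".join([contig, str(start0 + 1), str(end0), strand, name])


if __name__ == "__main__":
    # The first 100 bases of chr1 are "chr1 0 100" in BED and "chr1 1 100" when 1-based.
    print(bed_to_interval_line("chr1\t0\t100\ttarget_1"))
```

Only the start coordinate shifts. The end stays the same, because an exclusive 0-based end and an inclusive 1-based end name the same base.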

2. I have two (or more) sequencing experiments with different target intervals. How can I combine them?

One relatively easy way to combine your intervals is to use the online tool Galaxy: upload your intervals with the Get Data -> Upload command, then use the Operate on Genomic Intervals command to compute their intersection or union, depending on your needs.
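
If you would rather do this locally, the two operations are small enough to sketch in Python. The code below is an illustrative sketch, not what Galaxy runs internally. It assumes intervals are `(contig, start, end)` tuples with 1-based inclusive coordinates, it treats directly adjacent intervals as mergeable, and it sorts contigs by name rather than by sequence dictionary order. The function names `union` and `intersection` are our own.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (contig, start, end) with 1-based, inclusive coordinates (an assumption of this sketch).
Interval = Tuple[str, int, int]


def union(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals, contig by contig."""
    by_contig = defaultdict(list)
    for contig, start, end in intervals:
        by_contig[contig].append((start, end))

    merged: List[Interval] = []
    for contig in sorted(by_contig):  # lexicographic order, not dictionary order
        current = None
        for start, end in sorted(by_contig[contig]):
            if current is None:
                current = [start, end]
            elif start <= current[1] + 1:  # overlaps or touches the open interval
                current[1] = max(current[1], end)
            else:
                merged.append((contig, current[0], current[1]))
                current = [start, end]
        if current is not None:
            merged.append((contig, current[0], current[1]))
    return merged


def intersection(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Return the regions covered by both interval sets."""
    a_merged, b_merged = union(a), union(b)
    result: List[Interval] = []
    i = j = 0
    # Sweep both sorted, non-overlapping lists in parallel.
    while i < len(a_merged) and j < len(b_merged):
        contig_a, start_a, end_a = a_merged[i]
        contig_b, start_b, end_b = b_merged[j]
        if contig_a != contig_b:
            # Advance whichever list is behind in contig order.
            if contig_a < contig_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start <= end:
            result.append((contig_a, start, end))
        # Drop the interval that ends first; the other may still overlap later ones.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    experiment_1 = [("chr1", 1, 100), ("chr1", 90, 150), ("chr2", 5, 10)]
    experiment_2 = [("chr1", 120, 200), ("chr2", 11, 20)]
    print(union(experiment_1 + experiment_2))       # [('chr1', 1, 200), ('chr2', 5, 20)]
    print(intersection(experiment_1, experiment_2))  # [('chr1', 120, 150)]
```

Use the union when you want to call everywhere either experiment has targets, and the intersection when you only want regions covered by both.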

    If I were only interested in calling variants in a set of neutral regions, would there be any negative implications to intersecting my BAM with a BED file of these regions PRIOR to running GATK, i.e. doing this rather than using the genomic intervals that GATK offers? For me this is preferable for various storage reasons, but perhaps this has some unknown side effect with GATK.

    No problem at all, you can use whatever intervals you want. This may influence the expected Ti/Tv ratio, so keep that in mind when you analyze your callset, but it shouldn't have any effect on the quality of results.

