Collected FAQs about interval lists

Geraldine_VdAuwera Posts: 8,273 Administrator, GATK Dev admin
edited January 2013 in FAQs

1. What file formats do you support for interval lists?

We support three types of interval lists, as mentioned here. Interval lists should preferably be formatted as Picard-style interval lists, with an explicit sequence dictionary, as this prevents accidental misuse (e.g. hg18 intervals on an hg19 file). Note that this format is 1-based, not 0-based (the first position in the genome is position 1).
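
To make the 1-based point concrete, here is a minimal Python sketch that converts one BED record (0-based start, end-exclusive) into the body line of a Picard-style interval list (1-based, end-inclusive; columns are sequence, start, end, strand, name). The function name, the default strand "+" and the "." placeholder for a missing name are assumptions made for illustration. The sketch does not write the sequence dictionary header (@HD/@SQ lines) that a real Picard-style file needs.

```python
def bed_to_interval_list_line(bed_line, default_strand="+"):
    """Convert one tab-separated BED record to a 1-based, closed interval line.

    Hypothetical helper for illustration only; it does not emit the
    sequence dictionary header of a Picard-style interval list.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom = fields[0]
    start0 = int(fields[1])  # BED start: 0-based, inclusive
    end0 = int(fields[2])    # BED end: 0-based, exclusive

    # Optional BED columns: name is column 4, strand is column 6.
    name = fields[3] if len(fields) > 3 and fields[3] else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else default_strand

    start1 = start0 + 1  # shift the start to 1-based
    end1 = end0          # an exclusive 0-based end equals an inclusive 1-based end
    return "\t".join([chrom, str(start1), str(end1), strand, name])


if __name__ == "__main__":
    # The first 100 bases of chr1: BED 0..100 becomes 1..100.
    print(bed_to_interval_list_line("chr1\t0\t100\ttarget_1"))
```

Only the start coordinate changes in the conversion, which is why mixing up the two conventions typically produces off-by-one errors at the left edge of each interval.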

2. I have two (or more) sequencing experiments with different target intervals. How can I combine them?

One relatively easy way to combine your intervals is to use the online tool Galaxy: use the Get Data -> Upload command to upload your intervals, then the Operate on Genomic Intervals command to compute the intersection or union of your intervals (depending on your needs).
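
If you would rather see what the union and intersection operations do, the following Python sketch computes both for small interval sets. It is not what Galaxy runs internally; it is a simplified illustration that assumes intervals are (contig, start, end) tuples with 1-based, inclusive coordinates, and all function names are made up for this example. Contigs are sorted as plain strings, which may not match the order of your reference sequence dictionary.

```python
def merge_intervals(intervals):
    """Union: sort, then merge overlapping or directly adjacent intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        # With inclusive coordinates, 1-10 and 11-20 are adjacent and get merged.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_intervals(a, b):
    """Intersection: regions covered by both interval sets."""
    a, b = merge_intervals(a), merge_intervals(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in (string) contig order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(start_a, start_b), min(end_a, end_b)
        if lo <= hi:
            out.append((chrom_a, lo, hi))
        # Drop the interval that ends first; the other may still overlap more.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    exp1 = [("chr1", 1, 100), ("chr1", 50, 150), ("chr2", 10, 20)]
    exp2 = [("chr1", 120, 200), ("chr2", 15, 30)]
    print(merge_intervals(exp1 + exp2))      # union of both experiments
    print(intersect_intervals(exp1, exp2))   # regions targeted by both
```

Use the union if you want to call variants anywhere that either experiment targeted, and the intersection if you only want regions covered by both.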

Geraldine Van der Auwera, PhD

Comments

  • prepagam Posts: 23 Member

    If I were only interested in calling variants in a set of neutral regions, would there be any negative implications to intersecting my BAM with a BED file of these regions PRIOR to running GATK, i.e. doing this rather than using the genomic intervals that GATK offers? For me this is preferable for various storage reasons, but perhaps it has some unknown side effect with GATK.

  • Geraldine_VdAuwera Posts: 8,273 Administrator, GATK Dev admin

    No problem at all, you can use whatever intervals you want. This may influence the expected Ti/Tv ratio, so keep that in mind when you analyze your callset, but it shouldn't have any effect on the quality of results.

  • eflannery San Diego Posts: 9 Member

    Hi Geraldine, it seems there is a minimum size an interval in the interval list must have to appear in the output of the Diagnose Targets walker. Do you know what this minimum is? Is it a default value or is it calculated each time? Is there a way to change it?

    Thanks!

    Erika

  • Geraldine_VdAuwera Posts: 8,273 Administrator, GATK Dev admin

    Hi @eflannery,

    I just looked at the code and didn't find any hardcoded limits. The only limitation that I'm aware of is that intervals must be non-null (i.e. not zero-length). Why do you think there's a limit?

  • eflannery San Diego Posts: 9 Member

    When I run Diagnose Targets, some intervals that are present in the interval_list file are missing from the output file. All of the excluded intervals are very small (<500 bp), so I assumed that size was the reason they were not included. Shouldn't every interval in the interval_list be included in the output of Diagnose Targets?

    Thanks!

    Erika
