Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

Collected FAQs about interval lists

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,276Administrator, GSA Member admin
edited January 2013 in FAQs

1. What file formats do you support for interval lists?

We support three types of interval lists, as mentioned here. Interval lists should preferentially be formatted as Picard-style interval lists, with an explicit sequence dictionary, as this prevents accidental misuse (e.g. hg18 intervals on an hg19 file). Note that this file is 1-based, not 0-based (first position in the genome is position 1).

2. I have two (or more) sequencing experiments with different target intervals. How can I combine them?

One relatively easy way to combine your intervals is to use the online tool Galaxy, using the Get Data -> Upload command to upload your intervals, and the Operate on Genomic Intervals command to compute the intersection or union of your intervals (depending on your needs).

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Comments

  • prepagamprepagam Posts: 9Member

    If I was only interested in calling variants in a set of neutral regions, I wonder if there are any negative implications to intersecting my bam with a bed file of these regions PRIOR to gatk. i.e. doing this rather than using the genomics intervals that GATK offers. For me this is preferable for various storage reasons, but perhaps this has some unknown side effect with GaTK.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,276Administrator, GSA Member admin

    No problem at all, you can use whatever intervals you want. This may influence the expected Ti/Tv ratio, so keep that in mind when you analyze your callset, but it shouldn't have any effect on the quality of results.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.