What are the differences between using a default and a custom interval list (-L) for exome sequencing?

By default interval list (-L) I mean the generic exome regions, and by custom interval list I mean the one designed and supplied by the kit manufacturer, such as Agilent or Illumina. How will the output differ between the custom and the default list? Will it be the same? Is the custom interval list used only to speed up processing?

Answers

  • SkyWarrior (Turkey), Member ✭✭✭

    Which BED file you use is up to you. I usually do not rely on the company's recommendations alone. I combine the BED file from the company with exon regions downloaded from the UCSC Table Browser, using Bedtools to merge and combine the regions for better coverage. Lifting BED files over to hg38 is especially messy. Go with the UCSC Ensembl gene or refGene table to define your exonic regions.
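As a rough illustration of the merge step described above, here is a minimal Python sketch of combining two BED-style interval sets into their union. It is not how Bedtools is implemented; the function name `merge_intervals` and the example coordinates are hypothetical, and intervals are assumed to be `(chrom, start, end)` tuples with a 0-based start and an exclusive end, as in BED.

```python
from collections import defaultdict


def merge_intervals(*interval_sets):
    """Return the union of several BED-style interval sets.

    Overlapping or directly adjacent intervals on the same chromosome
    are collapsed into a single interval.
    """
    by_chrom = defaultdict(list)
    for intervals in interval_sets:
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps or touches the current interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical example data: manufacturer targets plus exon regions.
kit_targets = [("chr1", 100, 200), ("chr1", 500, 600)]
ucsc_exons = [("chr1", 150, 250), ("chr2", 10, 50)]
combined = merge_intervals(kit_targets, ucsc_exons)
```

The union keeps every base covered by either list, which is why this approach widens coverage compared with the manufacturer's file alone.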

  • snijesh (hosur), Member
    edited October 2017

    Thank you @SkyWarrior
    If I choose the default exonic interval list (from UCSC), I will not lose any variant information in the exonic regions. Am I right?

  • snijesh (hosur), Member

    @Sheila @SkyWarrior
    I asked the same question on ResearchGate and got the following answer. I found it very interesting. What is your opinion on this?
    "There are a couple of answers on the linked thread but speaking from experience, I would always use the interval file that was used by the company to create the library.
    When you perform your alignments, if you include regions that were not used in the design of the library, i.e. if you use someone else's interval file that includes regions that your capture wasn't designed to capture but did so anyway, then you are including an unknown in your downstream work. You might have captured that region by accident, and this won't be reproducible across your samples. So if you end up looking for copy number variable regions, this region may show up, as it didn't capture efficiently across your samples.
    This is just one example of how using regions not defined in your particular library might affect your work. In a practical sense you have to consider these factors."

    Written by Sophie Sneddon

  • SkyWarrior (Turkey), Member ✭✭✭

    Ultimately it comes down to what you are looking for in practice. In the past I have seen BED files provided by manufacturers that did not properly represent all captured regions. I have also seen manufacturers unable to provide properly lifted-over regions for exome capture kits on hg38. The majority of kits are still produced for hg19, and lifting that BED file over to hg38 also loses regions. I would rather capture more false positives than lose true positives because of CNVs not covered by the BED files.

    This is totally my belief BTW.

  • Sheila (Broad Institute), Member, Broadie admin

    @snijesh
    Hi,

    As I said above, we recommend using the interval list from the company. Of course, it is up to you to decide which is best for your own purposes. It seems @SkyWarrior has reasons for using a mix of interval files, but we support (as Best Practice) using the company-given interval list.

    -Sheila

  • And what about when you have samples that were prepared with different capture kits?
    And when running HaplotypeCaller on my own samples plus 1000 Genomes samples (to reach a minimum of 30 samples), should I restrict myself to my capture kit's BED file?

  • shlee (Cambridge), Member, Broadie ✭✭✭✭✭

    Hey again @Rosmaninho. Sheila's moved on to green pastures. Do you still have questions on GATK/Picard tools? Since the original question in this thread has been answered, let me comment only briefly on your question about samples prepared with different capture kits. In the past (for a GATK4.alpha CNV tutorial that compares CNV PoNs) I used an intersection of intervals to create an intervals list that is conservative with respect to coverage counts. I think how you process your intervals will depend on your application.
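The intersection idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the tutorial: the names `group_by_chrom` and `intersect_intervals` and the coordinates are hypothetical, intervals are `(chrom, start, end)` tuples in BED convention, and each kit's list is assumed to contain no overlapping intervals of its own.

```python
from collections import defaultdict


def group_by_chrom(intervals):
    """Group (chrom, start, end) tuples into sorted per-chromosome lists."""
    grouped = defaultdict(list)
    for chrom, start, end in intervals:
        grouped[chrom].append((start, end))
    return {chrom: sorted(spans) for chrom, spans in grouped.items()}


def intersect_intervals(kit_a, kit_b):
    """Return only the regions covered by both interval sets."""
    a_by_chrom = group_by_chrom(kit_a)
    b_by_chrom = group_by_chrom(kit_b)
    shared = []
    for chrom in sorted(set(a_by_chrom) & set(b_by_chrom)):
        a_spans, b_spans = a_by_chrom[chrom], b_by_chrom[chrom]
        i = j = 0
        while i < len(a_spans) and j < len(b_spans):
            start = max(a_spans[i][0], b_spans[j][0])
            end = min(a_spans[i][1], b_spans[j][1])
            if start < end:
                shared.append((chrom, start, end))
            # Advance whichever interval ends first.
            if a_spans[i][1] < b_spans[j][1]:
                i += 1
            else:
                j += 1
    return shared


# Hypothetical target lists for two different capture kits.
kit_a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
kit_b = [("chr1", 150, 350), ("chr3", 0, 10)]
shared_targets = intersect_intervals(kit_a, kit_b)
```

Only bases targeted by every kit survive, so coverage comparisons across samples are not skewed by regions that one kit captured and another did not.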

  • Thank you shlee! (By the way, do you have a first name I can call you by, or is shlee OK?)

    I did exactly that. I tested two lists, a merge and an intersect (merge using bedops and intersect using bedtools), and the more conservative list did indeed yield a much lower number of FPs, so that's what I used in my analysis.

    I found a topic where Geraldine replied with the same advice that you have now given me, which confirms that I made the right choice. :)

    The best results I could get came from an intersect list, an interval-padding of 50 bp, and max-gaussians 6.
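To make the effect of interval padding concrete, here is a small Python sketch that extends each interval by a fixed number of bases on both sides and collapses any intervals that overlap as a result. It only illustrates the idea and is not GATK's own implementation; `pad_intervals` and the coordinates are hypothetical, and clamping to chromosome lengths is left out because contig lengths are not available here.

```python
def pad_intervals(intervals, padding=50):
    """Pad each (chrom, start, end) interval on both sides, then merge overlaps.

    Starts are clamped at 0. Ends are NOT clamped to the chromosome length,
    which a real tool would need to do.
    """
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    result = []
    for chrom, start, end in padded:
        if result and result[-1][0] == chrom and start <= result[-1][2]:
            # Padding made this interval reach the previous one: merge them.
            prev_chrom, prev_start, prev_end = result[-1]
            result[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            result.append((chrom, start, end))
    return result


# Hypothetical capture targets.
targets = [("chr1", 20, 100), ("chr1", 180, 260), ("chr1", 1000, 1100)]
padded_targets = pad_intervals(targets, padding=50)
```

With 50 bp of padding, the first two targets grow into each other and become one interval, while the distant third target simply gains 50 bp on each side.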

  • shlee (Cambridge), Member, Broadie ✭✭✭✭✭

    @Rosmaninho, my first name is 'Soo Hee' (pronounced 'so hee'). You can call me that or shlee, up to you. 😁

    Thanks for sharing what gives you the best results. Hopefully, it will be helpful to other members of the community.
