Joint genotyping different capture kits

Hello, I have ~35 exomes which I want to analyze following your best practices. Because 3 different capture kits were used to generate these data, I am wondering which is the right thing to do:

Option 1. Run the entire pipeline using an interval list containing the union of the three kits' intervals.

Option 2. Run samples in three different batches (one for each capture kit used) and generate gVCF files. After this, do the joint genotyping and VQSR on all samples together.

Thanks a lot for your help,

Comments

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Option #1 is fine; just keep in mind when evaluating results that regions outside of the intersection are expected to contain no-calls in some samples.

  • KlausNZ (Member ✭✭)

    Hi Geraldine,
    Re option 1, no-calls are fine, but I wonder about the situation where an interval is targeted by one capture kit but also contains mapped reads in samples whose kits do not target it (on-target for one kit, off-target for the other two kits in simonsanchezj's example).
    What will happen when these intervals are jointly genotyped, and the tool encounters a situation of 'normal' (on-target) and extremely low (off-target) coverage at the same locus?
    We're employing option 2 to avoid this risk (and because it fits best with our incremental workflow), but haven't found time to investigate systematically. Can you share your experience or thoughts?
    Thanks!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    I think you can expect sites with low off-target coverage to yield low confidence genotypes in those samples, but it shouldn't affect the genotypes of the samples that have good coverage. You would of course want to make sure the evaluation of results takes into account what was targeted in what samples.
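One way to keep track of "what was targeted in what samples" during evaluation is a small lookup like the sketch below. It is only an illustration under stated assumptions: the kit names, the sample-to-kit mapping, and the helper names `is_targeted` and `samples_on_target` are hypothetical, and the targets are assumed to be already parsed from BED files into sorted, non-overlapping, 0-based half-open intervals.

```python
from bisect import bisect_right

def is_targeted(targets, chrom, pos):
    """Return True if 0-based position `pos` falls inside a target interval.

    `targets` maps chromosome -> sorted list of non-overlapping
    (start, end) half-open intervals, as in BED. For a VCF record,
    pass POS - 1, because VCF positions are 1-based.
    """
    intervals = targets.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def samples_on_target(sample_to_kit, kit_targets, chrom, pos):
    """List the samples whose capture kit targets the given site."""
    return sorted(
        sample
        for sample, kit in sample_to_kit.items()
        if is_targeted(kit_targets[kit], chrom, pos)
    )

# Hypothetical example data: two kits, three samples.
kit_targets = {
    "kitA": {"chr1": [(100, 200)]},
    "kitB": {"chr1": [(150, 250)]},
}
sample_to_kit = {"s1": "kitA", "s2": "kitB", "s3": "kitB"}

only_a = samples_on_target(sample_to_kit, kit_targets, "chr1", 120)
shared = samples_on_target(sample_to_kit, kit_targets, "chr1", 160)
only_b = samples_on_target(sample_to_kit, kit_targets, "chr1", 220)
nobody = samples_on_target(sample_to_kit, kit_targets, "chr1", 300)
```

With a lookup like this, a low-confidence genotype or no-call at a site can be checked against whether that sample's kit targeted the site at all before it is counted as a problem.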

  • KlausNZ (Member ✭✭)

    Thanks Geraldine, good to hear that the others shouldn't be affected!

  • simonsanchezj, Germany (Member)

    Thanks Geraldine for your help. I will proceed as recommended. One last question: do you recommend using the "intersect" or the "union" of the different capture kits? In my humble opinion, intersecting the BED files makes more sense, so that I focus only on those variants potentially called in all samples. However, I see that many groups are doing it with the union. Again, thanks a lot for your help with this and the other questions I raised.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @simonsanchezj‌ I would say it depends on your cohort sizes. If you have small batches per kit, it makes most sense to use the intersection, because there's not much you can do with onesie-twosie sites in a handful of samples. But if you have large cohorts (hundreds or even thousands of samples per kit) then you can still get a lot of utility out of the unique regions per batch.
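For readers who want to see concretely what the "union" and "intersection" of several capture kits mean, here is a minimal pure-Python sketch. It is not a GATK tool or the method used in this thread: the function names and example coordinates are made up, and intervals are assumed to be BED-style `(chrom, start, end)` tuples with 0-based, half-open coordinates. In practice a dedicated interval tool would normally do this job.

```python
def merge(intervals):
    """Sort and merge overlapping or abutting (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def union(*kits):
    """Regions covered by at least one kit."""
    return merge([interval for kit in kits for interval in kit])

def intersect_two(a, b):
    """Regions covered by both interval lists (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a[i], b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in sort order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(start_a, start_b), min(end_a, end_b)
        if lo < hi:
            out.append((chrom_a, lo, hi))
        # Drop the interval that ends first; the other may overlap more.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return out

def intersection(*kits):
    """Regions covered by every kit."""
    result = kits[0]
    for kit in kits[1:]:
        result = intersect_two(result, kit)
    return merge(result)

# Hypothetical targets for three capture kits.
kit_a = [("chr1", 100, 200), ("chr1", 300, 400)]
kit_b = [("chr1", 150, 250), ("chr2", 10, 50)]
kit_c = [("chr1", 180, 320), ("chr2", 20, 30)]

all_targets = union(kit_a, kit_b, kit_c)
shared_targets = intersection(kit_a, kit_b, kit_c)
```

In this toy example the union spans `chr1:100-400` and `chr2:10-50`, while the intersection shrinks to `chr1:180-200`. That is the trade-off described above: with small batches the shared region is where every sample can contribute, and the kit-specific regions only pay off when each kit has many samples.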
