Hi,
I have 15 affected samples. 2 are whole exome and 13 are whole genome. They have already been realigned on a single-sample level and had BQSR performed. I am contemplating running UnifiedGenotyper on all 15 samples together because we would like to compare the calls across the samples (especially in the coding regions). I am aware that there would be a large number of variant calls in the whole genome samples that would have little to no coverage in the exome samples. I haven't been able to find any posts that say you should or shouldn't run whole genome and exome samples through UnifiedGenotyper together. Are there any reasons why this should be discouraged?
Also, assuming I do perform multi-sample calling across all 15 samples, would it be ok to run that multi-sample VCF file through VQSR?
Thanks! Jared
I don't know about the calling, but I would probably not run VQSR on both sets together. I would expect the characteristics of variants at the edges of capture regions, and especially in the "splash" around the probes, to be different from what you get in WGS. In turn, I would expect those differences to confuse the learning process in VQSR.
However, I haven't tried it - it may be that it works just fine. I'm actually wrestling with a similar problem right now, with a large exome project that used two different capture kits. So far, I'm treating them separately and merging the filtered variants at the end.
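For the sample set in the original question (13 whole genome and 2 exome samples), treating the sets separately could be sketched roughly as follows. This only builds the command lines and does not run them. It assumes the GATK 2.x/3.x command-line style (`-T UnifiedGenotyper` with `-R`, `-I` and `-o`); the BAM names, the reference name and the output names are hypothetical placeholders, and `build_calling_commands` is an illustrative helper, not part of GATK.

```python
from collections import defaultdict


def build_calling_commands(samples, reference="reference.fasta"):
    """Return one UnifiedGenotyper command (argument list) per assay type.

    samples: iterable of (bam_path, assay) pairs, e.g. ("s1.bam", "wgs").
    All file names here are hypothetical placeholders.
    """
    # Group the BAM files by assay so each group is called on its own.
    by_assay = defaultdict(list)
    for bam_path, assay in samples:
        by_assay[assay].append(bam_path)

    commands = {}
    for assay, bams in sorted(by_assay.items()):
        # Assumed GATK 2.x/3.x invocation style; adjust to your installation.
        cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
               "-T", "UnifiedGenotyper",
               "-R", reference]
        for bam in bams:
            cmd += ["-I", bam]  # one -I per sample gives multi-sample calling
        cmd += ["-o", f"{assay}.raw.vcf"]
        commands[assay] = cmd
    return commands


# 13 whole genome and 2 exome samples, as in the question.
samples = [(f"wgs_{i:02d}.bam", "wgs") for i in range(1, 14)]
samples += [(f"exome_{i:02d}.bam", "exome") for i in range(1, 3)]

commands = build_calling_commands(samples)
for assay, cmd in commands.items():
    print(assay, " ".join(cmd))
```

Each group then gets its own filtering pass, and the filtered callsets are merged only at the end.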
Answers
Hi Jared,
That's a rather different approach than what we have experience with. Classically we would call the whole genomes and the exomes separately, then compare callsets with the variant evaluation tools.
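As a rough illustration of what comparing two separately called callsets means at the site level, here is a minimal Python sketch. It is not the GATK variant evaluation tooling, just a plain set comparison on (chromosome, position, ref, alt) keys; the function names and the toy records are made up for the example, and a real comparison would also need to be restricted to regions covered by both assays.

```python
def load_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each alternate allele is its own key.
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def compare_callsets(a, b):
    """Count sites shared by both callsets and sites private to each."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Toy records (hypothetical), standing in for the WGS and exome callsets.
wgs_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t200\t.\tC\tT",
    "2\t50\t.\tG\tA,C",
]
exome_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "2\t50\t.\tG\tA",
    "3\t10\t.\tT\tC",
]

summary = compare_callsets(load_sites(wgs_lines), load_sites(exome_lines))
print(summary)
```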
As we've never tried the approach you suggest, I can't really comment one way or the other, except to say there is no major obstacle to doing it that I can think of. If you try this, please do let us know how it turns out, so we can share the merits or drawbacks with the user community. Thanks!
Geraldine Van der Auwera, PhD
Thanks for your answer. I ended up separating the exome and whole genome samples before performing multi-sample calling so I wouldn't risk confusing VQSR afterwards. I also had another project where I was working with exome samples from 2 different versions of a capture kit. I separated those as well for the same reason.