# Calling variants on whole exome and whole genome samples together?

edited January 2013

Hi,

I have 15 affected samples: 2 are whole exome and 13 are whole genome. Each has already been realigned and had BQSR performed at the single-sample level. I am contemplating running UnifiedGenotyper on all 15 samples together because we would like to compare the calls across the samples (especially in the coding regions). I am aware that many of the variant calls in the whole-genome samples would fall at sites with little to no coverage in the exome samples. I haven't been able to find any posts that say you should or shouldn't run whole-genome and exome samples through UnifiedGenotyper together. Are there any reasons why this should be discouraged?

Also, assuming I do perform multi-sample calling across all 15 samples, would it be OK to run the resulting multi-sample VCF file through VQSR?

Thanks!
Jared


Hi Jared,

That's a rather different approach from the one we have experience with. Classically we would call the whole genomes and the exomes separately, then compare the callsets with the variant evaluation tools.

As we've never tried the approach you suggest, I can't really comment one way or the other, except to say there is no major obstacle to doing it that I can think of. If you try this, please do let us know how it turns out, so we can share the merits or drawbacks with the user community. Thanks!
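To make the "call separately, then compare" idea concrete, here is a minimal Python sketch of a site-level comparison between two callsets. It is only an illustration and not a substitute for the GATK variant evaluation tools: the function names `site_keys` and `compare_callsets` are hypothetical, the input is assumed to be plain-text VCF lines, and sites are matched naively on chromosome, position, REF and each ALT allele, with no normalization of indels.

```python
def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from plain-text VCF lines.

    Header lines (starting with '#') and blank lines are skipped.
    Multi-allelic records contribute one key per ALT allele.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_callsets(wgs_lines, wes_lines):
    """Split sites into shared, WGS-only and WES-only sets (hypothetical helper)."""
    wgs = site_keys(wgs_lines)
    wes = site_keys(wes_lines)
    return {
        "shared": wgs & wes,
        "wgs_only": wgs - wes,
        "wes_only": wes - wgs,
    }
```

For a comparison focused on coding regions, both callsets would first need to be restricted to the same target intervals; otherwise most of the WGS-only sites simply reflect regions the exome capture never covered.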

Geraldine Van der Auwera, PhD


Thanks for your answer. I ended up separating the exome and whole genome samples before performing multi-sample calling so I wouldn't risk confusing VQSR afterwards. I also had another project where I was working with exome samples from 2 different versions of a capture kit. I separated those as well for the same reason.
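For anyone wanting to script the same split, here is a rough Python sketch that groups BAM files by data type and builds one multi-sample UnifiedGenotyper command per group. The file names, the group labels and the helper names `group_bams_by_type` and `unified_genotyper_cmd` are made up for illustration; the command uses only the basic GATK 3 arguments (`-T`, `-R`, `-I`, `-o`), and a real run would likely need more options (intervals, dbSNP, and so on). The same grouping would apply to exomes from different capture kit versions by using the kit version as the label.

```python
from collections import defaultdict


def group_bams_by_type(samples):
    """Group BAM paths by a data-type label, e.g. 'wgs' or 'wes'.

    `samples` is an iterable of (bam_path, label) pairs; the labels are
    whatever the caller chooses (hypothetical, not a GATK concept).
    """
    groups = defaultdict(list)
    for bam, label in samples:
        groups[label].append(bam)
    return dict(groups)


def unified_genotyper_cmd(bams, reference, out_vcf,
                          gatk_jar="GenomeAnalysisTK.jar"):
    """Build an argument list for one multi-sample UnifiedGenotyper run."""
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]  # one -I per input BAM
    cmd += ["-o", out_vcf]
    return cmd


# Hypothetical sample sheet: two exomes and two genomes.
samples = [
    ("exome1.bam", "wes"),
    ("genome1.bam", "wgs"),
    ("exome2.bam", "wes"),
    ("genome2.bam", "wgs"),
]
commands = {
    label: unified_genotyper_cmd(bams, "ref.fasta", label + ".vcf")
    for label, bams in group_bams_by_type(samples).items()
}
```

Each resulting list can be passed to `subprocess.run`, giving one callset per data type that can then go through VQSR on its own.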