# UnifiedGenotyper doesn't generate 1 vcf per sample when bams from multiple subjects are input

We are running tests to get UG to produce one VCF per sample when the input BAMs come from multiple subjects. Our situation is complicated slightly by the fact that each sample has 3 BAMs. When we input all 6 BAMs into UG, hoping to output 2 VCFs (1 per sample), we instead get a single VCF. We found some relevant advice in an earlier post, but still haven't solved the issue.

Details:

1. We are inputting 6 BAMs for our test, 3 per sample for 2 samples.
2. The BAMs were generated using Bioscope from targeted capture reads sequenced on a SOLiD 4.
3. As recommended in the post mentioned above, we checked the @RG lines in the BAM headers using samtools. The lines for the 6 BAMs are as follows:

sample 1:

@RG ID:20130610202026358 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:148 DT:2013-06-10T16:20:26-0400 SM:S1

@RG ID:20130611214013844 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:148 DT:2013-06-11T17:40:13-0400 SM:S1

@RG ID:20130613002511879 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:147 DT:2013-06-12T20:25:11-0400 SM:S1

sample 2:

@RG ID:20130611021848236 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:151 DT:2013-06-10T22:18:48-0400 SM:S1

@RG ID:20130612014345277 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:151 DT:2013-06-11T21:43:45-0400 SM:S1

@RG ID:20130613085411753 PL:SOLiD PU:bioscope-pairing LB:75x35RR PI:150 DT:2013-06-13T04:54:11-0400 SM:S1

Based on the earlier post, I would have expected each of these BAMs to generate a separate VCF, since the IDs are all different (which would not have been desirable either, as we are hoping to generate 2 VCFs in this test). So it is not clear whether, or how, we should use the Picard tool AddOrReplaceReadGroups to modify the @RG headers.

Does that make sense? Any advice?
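For readers checking their headers the same way: GATK identifies samples by the SM tag of each read group, not by the ID tag. The sketch below groups read group IDs by SM value so it is easy to see which read groups would be treated as one sample. It assumes the header lines have already been extracted (for example with `samtools view -H`) and are tab-separated as in a real SAM header; the function name and the example header lines are hypothetical.

```python
from collections import defaultdict


def group_read_groups_by_sample(header_lines):
    """Map each SM (sample) value to the list of read group IDs that carry it."""
    samples = defaultdict(list)
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG and other header records
        # @RG fields are tab-separated TAG:VALUE pairs. Split on the first colon
        # only, because values such as DT timestamps contain colons themselves.
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        samples[tags.get("SM", "<no SM tag>")].append(tags.get("ID"))
    return dict(samples)


# Hypothetical header lines, for illustration only.
example_header = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:rg1\tPL:SOLiD\tLB:libA\tSM:sampleA",
    "@RG\tID:rg2\tPL:SOLiD\tLB:libA\tSM:sampleA",
    "@RG\tID:rg3\tPL:SOLiD\tLB:libB\tSM:sampleB",
]

for sample, read_group_ids in group_read_groups_by_sample(example_header).items():
    print(sample, read_group_ids)
```

Read groups that share an SM value are treated as the same sample regardless of their ID values, so running this over the @RG lines of every input BAM shows how many samples the caller will see.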

Hi there,

I think we have a small misunderstanding here. The way you're doing it is correct in the sense that your read groups are fine, and the UG is treating your samples separately. However, the UG always writes the results for samples that were called together into a single VCF file. This doesn't mean that the sample data got lumped together. If you look at your VCF, you should see that there are per-sample metrics reported, with separate values for each sample.
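To make the per-sample layout concrete, here is a minimal sketch that pulls the per-sample values out of one VCF record. In a multi-sample VCF, every column after FORMAT on the `#CHROM` header line is a sample, and each record carries one colon-separated value block per sample. The function name, sample names, and the record itself are made up for illustration.

```python
def per_sample_genotypes(header_line, record_line):
    """Return {sample name: {FORMAT key: value}} for one VCF record."""
    columns = header_line.lstrip("#").rstrip("\n").split("\t")
    values = record_line.rstrip("\n").split("\t")
    format_index = columns.index("FORMAT")
    format_keys = values[format_index].split(":")
    # Every column after FORMAT belongs to exactly one sample.
    sample_names = columns[format_index + 1:]
    sample_values = values[format_index + 1:]
    return {
        sample: dict(zip(format_keys, value.split(":")))
        for sample, value in zip(sample_names, sample_values)
    }


# Made-up header and record with two samples, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
record = "chr1\t12345\t.\tA\tG\t50.0\t.\tDP=40\tGT:DP:GQ\t0/1:22:99\t0/0:18:45"

for sample, fields in per_sample_genotypes(header, record).items():
    print(sample, fields)
```

If only one sample column appears after FORMAT, the caller saw only one sample name in the input.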

If you want the results reported separately because the samples are unrelated, then you need to call the samples separately in distinct runs of the UG. If the samples are related (in the sense that they're part of a study cohort), then I would recommend calling them together, and if you really want to you can separate out the calls using SelectVariants to produce per-sample VCFs.
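As a rough sketch of the SelectVariants route, the snippet below builds one GATK 3 style command line per sample. It only constructs the argument lists and does not run anything. The jar, reference, and VCF file names, the sample names, and the output naming scheme are all placeholders to replace with your own.

```python
def select_variants_commands(gatk_jar, reference, multisample_vcf, sample_names):
    """Build one GATK 3 SelectVariants command (as an argument list) per sample."""
    commands = []
    for sample in sample_names:
        commands.append([
            "java", "-jar", gatk_jar,
            "-T", "SelectVariants",
            "-R", reference,
            "-V", multisample_vcf,
            "-sn", sample,          # keep only this sample's genotype column
            "-o", f"{sample}.vcf",  # hypothetical output naming scheme
        ])
    return commands


# Placeholder file and sample names, for illustration only.
example_commands = select_variants_commands(
    "GenomeAnalysisTK.jar", "reference.fasta", "cohort.vcf", ["sampleA", "sampleB"]
)

for command in example_commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the paths are real. Sites where the selected sample is non-variant are kept by default, so check the SelectVariants documentation if you want those excluded.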

Many thanks for the response, and apologies for my misunderstanding -- I am obviously a newbie. To briefly follow up with another newbie question: here is the first line of (non-metadata) output from the VCF (the next line moves to a different locus):

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1