BQSR - Readgroups

Hi Team,

I have a pooled dataset of 95 individuals sequenced on one lane.
The data are split into 95 files, each with its own unique read group, like this:

@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1
[...]

I ran AddOrReplaceReadGroups on these files, which gave them all the same read group, like this:

@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:LB_1 SM:MIX PL:ILLUMINA      LB:LB   PU:LB_1
[...]
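To make the difference between the two header sets explicit, here is a minimal Python sketch that collects the distinct values of each @RG tag. It only uses the two example lines shown for each set; the function names `parse_rg_line` and `distinct_values` are hypothetical helpers, and splitting on arbitrary whitespace is an assumption based on how the headers are pasted here (real SAM headers are tab-delimited).

```python
def parse_rg_line(line):
    """Parse one @RG header line into a dict of tag -> value.

    Assumption: fields are separated by whitespace and no value contains a space.
    """
    fields = line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG header line: %r" % line)
    # Split each field on the first colon only, so values may contain colons.
    return dict(field.split(":", 1) for field in fields[1:])


def distinct_values(header_lines, tag):
    """Return the set of distinct values of one tag across all @RG lines."""
    return {
        parse_rg_line(line)[tag]
        for line in header_lines
        if line.startswith("@RG")
    }


# The two example lines from each header set, copied from the post.
original = [
    "@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1",
    "@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1",
]
modified = [
    "@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1",
    "@RG     ID:LB_1 SM:MIX PL:ILLUMINA      LB:LB   PU:LB_1",
]

for tag in ("ID", "SM", "PU"):
    print(tag, sorted(distinct_values(original, tag)),
          sorted(distinct_values(modified, tag)))
```

For these example lines, ID and SM differ between files only in the original set, while PU is LB_1 in both sets.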

Then I ran BQSR twice:
1. On all original files together, passing --input_file once per file.
2. On all files with the modified read groups.

In the log I get:
INFO 21:01:44,195 SAMDataSource$SAMReaders - Init 50 BAMs in last 0.32 s, 50 of 95 in 0.32 s / 0.01 m (154.11 tasks/s). 45 remaining with est. completion in 0.29 s / 0.00 m

Surprisingly, after running the second recalibration pass and generating the report as described in the Best Practices,
I get the EXACT same results (PDF) for both runs! The only difference is the timestamp on the first page ;)

On the page 'Overall error rates by event type', the read group is listed as LB_1 for both runs.

Did I miss a change that made BQSR sensitive to the PU tag rather than to the RG ID?

Best,
Alexander
