BQSR - Readgroups

Hi Team,

I have a pooled dataset with 95 individuals on one lane.
The data are in 95 files, each with its own unique read group, like this:

@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1
[...]

I ran AddOrReplaceReadGroups on these files, so that they all had the same read group, like this:

@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:LB_1 SM:MIX PL:ILLUMINA      LB:LB   PU:LB_1
[...]

Then I ran BQSR twice:
1. On all original files together, passing --input_file once per file.
2. On all files with the modified RG.
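
As a rough sketch of run 1, the repeated --input_file arguments could be assembled as below. This assumes a GATK 3-style BaseRecalibrator invocation; the jar name, reference, known-sites file, output table, and BAM names are placeholders and not the actual files used here, and `build_bqsr_command` is a hypothetical helper.

```python
from typing import List


def build_bqsr_command(bams: List[str], reference: str,
                       known_sites: str, out_table: str) -> List[str]:
    """Build a BaseRecalibrator argv with one --input_file per BAM.

    All paths are placeholders supplied by the caller.
    """
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "BaseRecalibrator", "-R", reference]
    for bam in bams:
        # One --input_file argument per BAM, as in run 1.
        cmd += ["--input_file", bam]
    cmd += ["-knownSites", known_sites, "-o", out_table]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = build_bqsr_command(
        ["sample_01.bam", "sample_02.bam"],
        "reference.fasta", "known_sites.vcf", "recal.table")
    print(" ".join(example))
```

Run 2 would use the same call with the list of re-headered BAMs instead.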

In the log I get:
INFO 21:01:44,195 SAMDataSource$SAMReaders - Init 50 BAMs in last 0.32 s, 50 of 95 in 0.32 s / 0.01 m (154.11 tasks/s). 45 remaining with est. completion in 0.29 s / 0.00 m

Surprisingly, after running the second recalibration pass and generating the report as described in the Best Practices,
I get the EXACT same results (PDF) for both runs! The only difference is the timestamp on the first page ;)

The page 'Overall error rates by event type' lists the read group as LB_1 for both runs.

Did I miss that BQSR is no longer sensitive to the RG ID, but to the PU instead?
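
To make the hypothesis concrete: if the recalibration grouped reads by PU whenever it is set and only fell back to the RG ID otherwise, both sets of headers would collapse into a single group named LB_1, which would match the report. The sketch below only illustrates that idea. The grouping rule is an assumption, not something confirmed in this thread, and `parse_rg` and `recal_key` are hypothetical helpers.

```python
def parse_rg(line: str) -> dict:
    """Parse one @RG header line into a tag -> value dict.

    Splits on any whitespace, so it handles the tab-separated
    form as well as the space-padded lines pasted above.
    """
    fields = line.split()[1:]  # drop the leading "@RG"
    return dict(field.split(":", 1) for field in fields)


def recal_key(rg: dict) -> str:
    """ASSUMPTION: group by PU when present, otherwise by ID."""
    return rg.get("PU") or rg["ID"]


original = [
    "@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1",
    "@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1",
]
modified = [
    "@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1",
]

keys_original = {recal_key(parse_rg(line)) for line in original}
keys_modified = {recal_key(parse_rg(line)) for line in modified}

# Under the assumed rule, both sets reduce to the same single key.
print(keys_original, keys_modified)  # {'LB_1'} {'LB_1'}
```

Under this assumed rule, giving each file a distinct PU would be what separates the groups, while the distinct IDs alone would not.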

Best,
Alexander
