
BQSR - Readgroups

Hi Team,

I have a pooled dataset with 95 individuals on one lane.
The data are split into 95 files, each with its own unique read group, like this:

@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1
[...]

I ran AddOrReplaceReadGroups on these files, so that they all had the same read group, like this:

@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1
@RG     ID:LB_1 SM:MIX PL:ILLUMINA      LB:LB   PU:LB_1
[...]
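For reference, here is a minimal Python sketch that compares which tags actually differ between the two sets of headers shown above. The helper names (parse_rg, distinct) are made up for this illustration and are not part of GATK or Picard. It also assumes simple whitespace splitting, which works for the lines pasted here but would break on real headers whose tab-separated values contain spaces.

```python
def parse_rg(line):
    """Split one @RG header line into a {tag: value} dict.

    Assumption: fields are separated by whitespace and no value
    contains a space (true for the header lines quoted in this post).
    """
    fields = line.split()[1:]  # drop the leading "@RG"
    return dict(field.split(":", 1) for field in fields)


def distinct(lines, tag):
    """Return the set of distinct values of one tag across header lines."""
    return {parse_rg(line)[tag] for line in lines}


# The two original header lines quoted in the post.
original = [
    "@RG     ID:TGCCATG SM:TGCCATG PL:ILLUMINA     LB:LB   PU:LB_1",
    "@RG     ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA      LB:LB   PU:LB_1",
]

# The same files after AddOrReplaceReadGroups.
modified = [
    "@RG     ID:LB_1 SM:MIX PL:ILLUMINA     LB:LB   PU:LB_1",
    "@RG     ID:LB_1 SM:MIX PL:ILLUMINA      LB:LB   PU:LB_1",
]

for name, lines in (("original", original), ("modified", modified)):
    print(name, "IDs:", sorted(distinct(lines, "ID")),
          "PUs:", sorted(distinct(lines, "PU")))
```

In the original files the ID values differ per file while PU is LB_1 everywhere. In the modified files both ID and PU are LB_1. So any grouping keyed on PU would see a single group in both cases, whereas a grouping keyed on ID would not.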

Then I ran BQSR twice:
1. On all original files together, passing --input_file once per file.
2. On all files with the modified read groups.

In the log I get:
INFO 21:01:44,195 SAMDataSource$SAMReaders - Init 50 BAMs in last 0.32 s, 50 of 95 in 0.32 s / 0.01 m (154.11 tasks/s). 45 remaining with est. completion in 0.29 s / 0.00 m

Surprisingly, after running the second recalibration pass and generating the report as described in the Best Practices,
I get the EXACT same results (PDF) for both runs! The only thing that differs is the timestamp on the first page ;)

On the page 'Overall error rates by event type' it states the ReadGroup LB_1 for both runs.

Did I miss that BQSR is no longer sensitive to the read group ID, but to the PU tag instead?

Best,
Alexander
