BQSR - Readgroups
I have a pooled dataset with 95 individuals on one lane.
The data are split into 95 files, each with its own unique read group, like this:
@RG ID:TGCCATG SM:TGCCATG PL:ILLUMINA LB:LB PU:LB_1
@RG ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA LB:LB PU:LB_1
[...]
I then ran AddOrReplaceReadGroups on these files, so that they all carry the same read group:
@RG ID:LB_1 SM:MIX PL:ILLUMINA LB:LB PU:LB_1
@RG ID:LB_1 SM:MIX PL:ILLUMINA LB:LB PU:LB_1
[...]
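To see at a glance which tags actually differ between the two sets of headers, here is a minimal Python sketch that parses the @RG lines quoted above and counts the distinct values per tag. The helper names (parse_rg_line, distinct_values) are made up for illustration, and the parser assumes no tag value contains whitespace. Which tag a given tool keys on is not stated in this post, so the sketch simply reports all of them.

```python
from collections import defaultdict


def parse_rg_line(line):
    """Split one @RG header line into a {tag: value} dict.

    Assumes fields are whitespace-separated and values contain no spaces.
    """
    fields = line.split()[1:]  # drop the leading "@RG"
    return dict(field.split(":", 1) for field in fields)


def distinct_values(rg_lines):
    """Collect the set of distinct values seen for each tag."""
    seen = defaultdict(set)
    for line in rg_lines:
        for tag, value in parse_rg_line(line).items():
            seen[tag].add(value)
    return dict(seen)


# The two example headers from each set, as quoted in the post.
original = [
    "@RG ID:TGCCATG SM:TGCCATG PL:ILLUMINA LB:LB PU:LB_1",
    "@RG ID:ACCTGAT SM:ACCTGAT PL:ILLUMINA LB:LB PU:LB_1",
]
modified = [
    "@RG ID:LB_1 SM:MIX PL:ILLUMINA LB:LB PU:LB_1",
    "@RG ID:LB_1 SM:MIX PL:ILLUMINA LB:LB PU:LB_1",
]

for name, lines in (("original", original), ("modified", modified)):
    counts = {tag: len(vals) for tag, vals in sorted(distinct_values(lines).items())}
    print(name, counts)
```

For the example lines, only ID and SM differ in the original set, while PU is LB_1 in both sets. If the recalibration report happened to group reads by PU rather than ID (an assumption worth checking against the GATK documentation), that would be consistent with seeing LB_1 in both runs.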
Then I ran BQSR in two ways:
1. On all original files together, by specifying the input option multiple times.
2. On all files with the modified RG.
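For reference, passing many BAMs to one recalibration run might be sketched as below. The post does not give the actual command or the GATK version, so this assumes a GATK 3.x style invocation (which the SAMDataSource log line suggests), and every file name here is a hypothetical placeholder. The function only builds the argument list; it does not run anything.

```python
def build_bqsr_command(bam_paths, reference, known_sites, out_table):
    """Build a GATK 3.x style BaseRecalibrator argument list.

    Assumption: one "-I" per input BAM. All paths are placeholders.
    """
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "BaseRecalibrator",
           "-R", reference]
    for bam in bam_paths:
        cmd += ["-I", bam]  # repeat the input option once per file
    cmd += ["-knownSites", known_sites, "-o", out_table]
    return cmd


# Hypothetical file names standing in for the 95 per-individual BAMs.
bams = [f"sample_{i:02d}.bam" for i in range(1, 96)]
cmd = build_bqsr_command(bams, "reference.fasta", "known_sites.vcf", "recal.table")
print(cmd.count("-I"))
```

The same function would serve both runs; only the list of BAM paths changes between the original and the re-headered files.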
In the log I get:
INFO 21:01:44,195 SAMDataSource$SAMReaders - Init 50 BAMs in last 0.32 s, 50 of 95 in 0.32 s / 0.01 m (154.11 tasks/s). 45 remaining with est. completion in 0.29 s / 0.00 m
Surprisingly, after running the second recalibration and the report generation as described in the Best Practices,
I get the EXACT same results (PDF) for both runs! The only difference is the timestamp on the first page.
On the page 'Overall error rates by event type', the read group is given as LB_1 for both runs.
Did I miss that BQSR is not
RG sensitive anymore, but