Read group names in a BAM file with multiple lanes

Hi,

I just downloaded a few BAM files from the GDC data portal. Some of them, typically the exome data, have many read group lines in their headers. When I look up this information with

samtools view -H C484.TCGA-06-5411-10A-01D-1696-08.3_gdc_realn.bam | grep '^@RG'

I get output like this:

@RG ID:C01PR.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.2 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.2.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.3 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.3.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.4 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.4.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.5 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.5.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.6 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.6.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.7 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.7.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.8 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.8.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.2 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.2.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.3 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.3.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.4 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.4.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
...
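
To make the structure of this header easier to see, here is a minimal Python sketch that groups the read group IDs by their SM (sample) tag. The function names `parse_rg_line` and `summarize_read_groups` are hypothetical helpers written for this post, not part of samtools or GATK, and the sketch assumes the fields are separated by tabs (as in a real SAM header) or, failing that, by spaces (as in the pasted output above). It only summarizes the header; it does not say anything about what MuTect2 requires.

```python
from collections import defaultdict


def parse_rg_line(line):
    """Turn one @RG header line into a {tag: value} dict (hypothetical helper)."""
    # Real SAM headers are tab-separated; fall back to any whitespace
    # for text pasted from a terminal or a web page (assumption).
    line = line.rstrip("\n")
    fields = line.split("\t") if "\t" in line else line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG line: " + line)
    # Split on the first colon only, because values such as DT contain colons.
    return dict(field.split(":", 1) for field in fields[1:])


def summarize_read_groups(header_lines):
    """Map each SM value to the list of read group IDs that carry it."""
    ids_by_sample = defaultdict(list)
    for line in header_lines:
        if line.startswith("@RG"):
            rg = parse_rg_line(line)
            ids_by_sample[rg.get("SM")].append(rg["ID"])
    return dict(ids_by_sample)


# Two of the header lines shown above, copied as pasted.
header = [
    "@RG ID:C01PR.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08",
    "@RG ID:C01RD.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08",
]

summary = summarize_read_groups(header)
for sample, ids in summary.items():
    print(sample, len(ids), ids)
```

On the real file, the lines printed by the `samtools view -H ... | grep '^@RG'` command above could be passed in instead of the two sample lines, for example by reading them from standard input.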

My question is: is it okay to leave the read groups as they are when using GATK MuTect2 for variant calling? In other words, is it a problem that a single sample has many different read group IDs? Or should I use the AddOrReplaceReadGroups tool to set a single ID?

Thank you very much! Regards,

Alexandre Coudray
