Read group names in a BAM file with multiple lanes

Hi,

I just downloaded a few BAM files from the GDC data portal. Some of them, typically the exome data, have many lines of read group information. When I look at that information with

samtools view -H C484.TCGA-06-5411-10A-01D-1696-08.3_gdc_realn.bam | grep '^@RG'

I end up with this kind of output:

@RG ID:C01PR.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.2 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.2.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.3 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.3.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.4 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.4.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.5 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.5.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.6 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.6.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.7 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.7.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01PR.8 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01PRACXX110628.8.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.1 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.1.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.2 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.2.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.3 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.3.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
@RG ID:C01RD.4 CN:BI DT:2011-06-28T00:00:00+0000 LB:Catch-70115 PL:illumina PU:C01RDACXX110628.4.TGCTGCTG SM:TCGA-06-5411-10A-01D-1696-08
...

My question is: is it okay to leave the read groups as they are when using GATK MuTect2 for variant calling? In other words, is it a problem that a single sample has many different read group IDs? Or should I use the AddOrReplaceReadGroups tool to set a single ID?
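To make the question concrete, here is a minimal sketch of how I would summarize the header: it counts the @RG lines and the distinct SM (sample) and LB (library) values they carry. `summarize_rg` is a hypothetical helper name of my own, not part of samtools or GATK, and the sketch assumes that tag values contain no whitespace, which holds for the header shown above but may not hold for every BAM file.

```shell
# Hypothetical helper: summarize @RG header lines read from stdin.
# Assumption: fields are whitespace-separated and no tag value contains
# whitespace (a DS description field with spaces could confuse this).
summarize_rg() {
  awk '
    /^@RG/ {
      n++
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SM:/) sm[substr($i, 4)] = 1   # collect distinct sample names
        if ($i ~ /^LB:/) lb[substr($i, 4)] = 1   # collect distinct library names
      }
    }
    END {
      ns = 0; for (k in sm) ns++
      nl = 0; for (k in lb) nl++
      printf "read_groups=%d samples=%d libraries=%d\n", n, ns, nl
    }
  '
}

# Usage (assumes samtools is on PATH):
#   samtools view -H C484.TCGA-06-5411-10A-01D-1696-08.3_gdc_realn.bam | summarize_rg
```

For the header above I would expect many read groups but a single sample and a single library, since every line shares the same SM and LB values and only ID and PU differ.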

Thank you very much! Regards,

Alexandre Coudray
