How to edit MULTIPLE read groups in one bam file

Hi everyone,

I recently received a WGS BAM from Broad for one sample, with about 8 read groups. BQSR rejected it, saying that the sequencer name (the platform, PL) in the read group is not recognized.

Anyway, I need to edit the sequencer name so that BQSR can run. AddOrReplaceReadGroups in Picard will discard the 8 RGs and write a single RG in their place, so that will not work. How do you edit one or two of the RGs, or replace all 8 RGs, in the BAM?

I am sure this is a common issue.

Thanks

Answers

  • Here is the error message and the current RG info:

    ERROR MESSAGE: The platform (PL) associated with read group GATKSAMReadGroupRecord @RG:SRC-7-tos1__S0231.E4.L1__half is not a recognized platform. Allowable options are ILLUMINA,SLX,SOLEXA,SOLID,454,LS454,COMPLETE,PACBIO,IONTORRENT,CAPILLARY,HELICOS,UNKNOWN

    @RG ID:SR-baku__CpgS0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-evorI__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-evorI_L2__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-evorII__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-oriI__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-oriI_L2__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-oriII__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SR-oriII_L2__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-0-390k_v6.4__S0231_E2_L1__unk PL:PL LB:LB SM:SM
    @RG ID:SRC-1-390k_v6.4__S0231_E2_L2__unk PL:PL LB:LB SM:SM
    @RG ID:SRC-2-390k_v6.4__S0231_E3_L1__unk PL:PL LB:LB SM:SM
    @RG ID:SRC-3-390k_v6.4__S0231_E4_L1__unk PL:PL LB:LB SM:SM
    @RG ID:SRC-4-tos1__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-5-tos1__S0231.E2.L2__half PL:PL LB:LB SM:SM
    @RG ID:SRC-6-tos1__S0231.E3.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-7-tos1__S0231.E4.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-8-tos2__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-9-tos4__S0231.E2.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-10-tos2__S0231.E2.L2__half PL:PL LB:LB SM:SM
    @RG ID:SRC-11-tos4__S0231.E2.L2__half PL:PL LB:LB SM:SM
    @RG ID:SRC-12-tos2__S0231.E3.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-13-tos4__S0231.E3.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-14-tos2__S0231.E4.L1__half PL:PL LB:LB SM:SM
    @RG ID:SRC-15-tos4__S0231.E4.L1__half PL:PL LB:LB SM:SM
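
A quick way to see which read groups would trip this check is to scan the header's @RG lines for PL values outside the list in the error message. The sketch below is only an illustration under a few assumptions: the real header is tab-separated (as the SAM format specifies; the listing above shows the tabs as spaces), and the validator compares platform names case-insensitively. `bad_platform_rgs` is a hypothetical helper name, not a samtools or GATK command.

```shell
# Print "ID<TAB>PL" for every @RG line whose PL value is not one of the
# platforms named in the error message. Reads a SAM header on stdin.
# bad_platform_rgs is a hypothetical helper, not part of samtools or GATK.
bad_platform_rgs() {
    awk 'BEGIN {
            FS = "\t"
            n = split("ILLUMINA SLX SOLEXA SOLID 454 LS454 COMPLETE PACBIO IONTORRENT CAPILLARY HELICOS UNKNOWN", names, " ")
            for (i = 1; i <= n; i++) ok[names[i]] = 1
        }
        /^@RG/ {
            id = ""; pl = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = substr($i, 4)
                if ($i ~ /^PL:/) pl = substr($i, 4)
            }
            # Case-insensitive comparison is an assumption about the validator.
            if (!(toupper(pl) in ok)) print id "\t" pl
        }'
}

# Usage, assuming samtools is on PATH:
#   samtools view -H mergedBam.bam | bad_platform_rgs
```

An empty result would suggest that every read group already carries a platform from the allowed list.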

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Oy, when you say you received this from Broad, was that through Broad Genomics/ the Genomics Platform, or someone else? Our production bams shouldn't ever come out with readgroups like that.

    I don't recall off the top of my head whether AddOrReplaceReadGroups can apply RG transformations to individual RGs within a file that has several. If it's possible, it would be noted in the tool doc. If it's not, then the solution is to split your BAM by RG into per-RG BAMs and apply the transformation there. There's a Picard tool for that; again, I don't recall the exact name and can't look it up now, but it might be as simple as SplitSamFile.

  • In reply to @Geraldine_VdAuwera:

    Thanks Geraldine, I figured it out. And sorry, the BAM was not from Broad but from Harvard...

    Yes, AddOrReplaceReadGroups would not work, because it would replace the 20 RGs with one.

    The BAM can be split by RG into 20 individual BAMs, each one edited, and then everything merged back together. I initially did that. If anyone wants to know, the split step is:

    samtools split mergedBam.bam

    However, I was able to get past the problem by just re-headering my merged BAM file using sed and samtools. Here is what I used:

    samtools view -H mergedBam.bam | sed -e 's/PL:PL/PL:Illumina/' | sed -e 's/SM:SM/SM:EHG/' | samtools reheader - mergedBam.bam > mergedBam.reheadered.bam

    Here, I replaced PL:PL with PL:Illumina, and also changed the sample name from SM:SM to something more informative, SM:EHG.

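    I was skeptical at first that simply re-headering would solve the problem, thinking that the read group strings throughout the BAM would need to change. But each read carries only its read group ID (the RG tag); PL and SM are stored only in the header, so as long as the IDs stay the same, editing the header is enough. So far it has worked, and I generated before and after plots with BQSR.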
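
One caveat with the plain sed approach is that `s/PL:PL/.../` would also match the same text on non-@RG header lines (for example, inside a @PG command line). A slightly more defensive variant, sketched here with awk, rewrites the PL and SM fields only on @RG lines. It assumes a tab-separated header as the SAM format specifies; `fix_rg_header` is a hypothetical helper name, and the file names in the usage comments are placeholders.

```shell
# Rewrite the PL and SM fields on @RG header lines only; every other line
# passes through unchanged. Reads a SAM header on stdin.
# fix_rg_header is a hypothetical helper, not part of samtools.
#   $1 = new platform value, $2 = new sample name
fix_rg_header() {
    awk -v pl="$1" -v sm="$2" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^PL:/) $i = "PL:" pl
                if ($i ~ /^SM:/) $i = "SM:" sm
            }
        }
        { print }'
}

# Usage, assuming samtools is on PATH:
#   samtools view -H mergedBam.bam | fix_rg_header ILLUMINA EHG > new_header.sam
#   samtools reheader new_header.sam mergedBam.bam > mergedBam.reheadered.bam
```

The read group IDs are left untouched on purpose, since the reads refer to their read group by ID.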

  • Here are the @RG lines from the header of my re-headered BAM:

    @RG ID:SR-evorI__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-evorI_L2__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-evorII__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-oriI__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-oriI_L2__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-oriII__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-oriII_L2__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SR-tala__aL0230__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-0-390k_v6.4__S0230__unk PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-1-390k_v6.4__S0670__unk PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-2-tos1__S0230.aL2__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-3-cor3__S0670.iL.aL1__half PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-4-cor4__S0670.iL.aL1__half PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-5-cor5__S0670.iL.aL1__half PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-6-cor__S0670.iL.aL1__half PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-7-cor2__S0670.iL.aL1__half PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-8-tos2__S0230.aL2__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-9-tos4__S0230.aL2__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-10-tos5__S0230.aL2__plus PL:Illumina LB:LB SM:EHG
    @RG ID:SRC-11-mot1__S0393.aL2__plus PL:Illumina LB:LB SM:EHG

  • mzabidi (Member)

    Thanks @dilawerkh4 for this tip, this is exactly what I happened to be looking for!
