
MergeBAM RG ID and SM tags

fazulur (hyderabad, Member)

Dear GATK team,

I have read the discussion on merging lane-wise BAMs, but I am still a little confused.

Does it make a difference if all lanes have the same RG ID and SM tags, given that MarkDuplicates was said to take the library (flowcell) into account?

What if lanes of the same sample were sequenced on more than one flowcell? How does MarkDuplicates handle such cases, and how does this affect variant calling?

Please help me resolve these cases.

Thanks in advance,
Fazulur Rehaman

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @fazulur

    Yes, MarkDuplicates takes read group information into account when removing sequencing artifacts. If you have read group information, you should definitely use it. Take a look at these docs:
    https://software.broadinstitute.org/gatk/documentation/article?id=11015
    https://gatkforums.broadinstitute.org/gatk/discussion/6747

  • fazulur (hyderabad, Member)

    Hi @BhanuGandham,

    Thanks a lot for the useful links on MarkDuplicates. They were very helpful.

    I followed the procedure below to merge the BAMs from multiple sequencing lanes:

    1. MarkDuplicates on each individual BAM
    2. Merge the individual BAMs
    3. AddOrReplaceReadGroups on the merged BAM
    4. MarkDuplicates on the merged BAM
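
For concreteness, the four steps listed above can be written out as Picard command lines. The Python sketch below only builds and prints the commands; it does not run them. It mirrors the procedure as described in this post and is not a recommended workflow. The `picard` helper, the `picard.jar` location, all output file names, and the `RGPL`/`RGPU` values are assumptions made for illustration (AddOrReplaceReadGroups requires platform and platform-unit values, which the post does not give).

```python
def picard(tool, *args):
    """Build (not run) a Picard command line as a list of arguments.

    Hypothetical helper: assumes picard.jar is in the working directory.
    """
    return ["java", "-jar", "picard.jar", tool, *args]


lanes = ["sample1_L001", "sample1_L002"]
commands = []

# Step 1: MarkDuplicates on each per-lane BAM.
for lane in lanes:
    commands.append(picard(
        "MarkDuplicates",
        f"I={lane}.bam",
        f"O={lane}.md.bam",          # assumed output name
        f"M={lane}.metrics.txt",     # assumed metrics file name
    ))

# Step 2: merge the per-lane BAMs into one file.
commands.append(picard(
    "MergeSamFiles",
    *[f"I={lane}.md.bam" for lane in lanes],
    "O=sample1.merged.bam",
))

# Step 3: replace the read groups in the merged BAM with a single one.
# RGPL and RGPU are placeholders; the post does not state them.
commands.append(picard(
    "AddOrReplaceReadGroups",
    "I=sample1.merged.bam",
    "O=sample1.rg.bam",
    "RGID=sample1",
    "RGLB=flowcellID",
    "RGPL=illumina",
    "RGPU=unit1",
    "RGSM=sample1",
))

# Step 4: MarkDuplicates on the merged BAM.
commands.append(picard(
    "MarkDuplicates",
    "I=sample1.rg.bam",
    "O=sample1.final.bam",
    "M=sample1.final.metrics.txt",
))

for cmd in commands:
    print(" ".join(cmd))
```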

    I am a little confused about the 3rd and 4th steps.
    Here is an example

    sample1_L001.bam
    sample1_L002.bam

    For the individual BAMs, the RG lines contain the following:
    ID=sample1_L001 LB=flowcellID SM=sample1
    ID=sample1_L002 LB=flowcellID SM=sample1

    After merging the lane-wise BAMs, I get two RG lines in the BAM header, so I used Picard AddOrReplaceReadGroups to replace them with a single RG line. Please correct me if I am wrong.

    ID=sample1 LB=flowcellID SM=sample1
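
To make the header bookkeeping concrete, here is a minimal Python sketch, assuming the two per-lane read groups are written as tab-separated `@RG` header lines in SAM format. It does not call Picard or GATK, and the function names are hypothetical. It only shows that, in the merged header, the two read groups keep distinct IDs while sharing one sample (SM) and one library (LB) value.

```python
from collections import defaultdict


def parse_rg_line(line):
    """Parse one SAM '@RG' header line into a dict of tag -> value."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError(f"not an @RG line: {line!r}")
    return dict(field.split(":", 1) for field in fields[1:])


def group_ids_by_sample_and_library(header_lines):
    """Map each (SM, LB) pair to the list of read group IDs that carry it."""
    groups = defaultdict(list)
    for line in header_lines:
        if line.startswith("@RG"):
            rg = parse_rg_line(line)
            groups[(rg.get("SM"), rg.get("LB"))].append(rg["ID"])
    return dict(groups)


# The two read groups from the example, as they would appear after merging.
merged_header = [
    "@RG\tID:sample1_L001\tLB:flowcellID\tSM:sample1",
    "@RG\tID:sample1_L002\tLB:flowcellID\tSM:sample1",
]

groups = group_ids_by_sample_and_library(merged_header)
print(groups)
# {('sample1', 'flowcellID'): ['sample1_L001', 'sample1_L002']}
```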

    If I use the sample name for both the SM and ID fields, does it affect MarkDuplicates on the merged BAM?

    Please suggest how I can resolve these issues.

    Thanks & Regards
    Fazulur Rehaman
