2 read libraries: same genome, different DNA samples, different machine runs. When to merge?

redzengenoist (Member)
edited May 2013 in Ask the GATK team

Hello Geraldine,

I have some genomes where each person has given two DNA samples. Each sample was purified separately and run on a separate lane, often on a separate (Illumina) machine. Basically, two genome data sets to be fused into one.

I'm following the GATK Best Practices, and I'm considering at what point I should merge the BAMs. It seems clear enough that they should be merged after Picard AddOrReplaceReadGroups and before variant calling, but I'd like input on whether to merge before or after GATK BaseRecalibrator and BQSR. As for deduplication and indel realignment, I expect it makes little difference whether you merge before or after.

What do you think is the best time to merge the BAMs? Do you have Best Practices recommendations for merging distinct read libraries from the same genome? I'm guessing this isn't the first time you've encountered this issue, though I haven't been able to find a similar topic.

Thanks, Geraldine.

Answers

  • Carneiro (Charlestown, MA), Member, admin

    MarkDuplicates will do a better job if the BAMs are merged.
    Indel realignment will do a better job if they are merged.
    BaseRecalibrator (which is what we call BQSR, by the way) will recalibrate them independently because they came from different runs, so there is no benefit in merging; in fact, it will use more memory to run on a merged BAM.

    None of these are "big deals". You can merge beforehand, eat the memory cost, and get the best possible results; or you can merge afterwards, and the results probably won't be very different (if each run is well covered on its own). If the coverage is low, then definitely merge them up front.
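To make the point about BaseRecalibrator treating runs independently concrete, here is a toy sketch, not GATK's actual model: real BQSR uses read group as one covariate alongside reported quality, machine cycle and sequence context. It assumes each observation is a (read group ID, is-mismatch) pair at a site known not to be variant, and the pseudocounts and run names are illustrative assumptions. It shows that a clean run and a noisy run get separate error estimates only while their read group IDs differ.

```python
import math
from collections import defaultdict


def empirical_quality_by_read_group(observations):
    """Toy per-read-group error estimate (NOT GATK's BaseRecalibrator model).

    observations: iterable of (read_group_id, is_mismatch) pairs, one per
    base observed at a site assumed to be non-variant.
    Returns {read_group_id: empirical Phred quality, rounded to 0.1}.
    """
    counts = defaultdict(lambda: [0, 0])  # read group -> [mismatches, total bases]
    for read_group, is_mismatch in observations:
        counts[read_group][0] += int(is_mismatch)
        counts[read_group][1] += 1

    qualities = {}
    for read_group, (mismatches, total) in counts.items():
        # Pseudocounts (+1/+2) are an assumption of this sketch; they keep the
        # error rate above zero so log10 never fails on a mismatch-free group.
        error_rate = (mismatches + 1) / (total + 2)
        qualities[read_group] = round(-10 * math.log10(error_rate), 1)
    return qualities


# Hypothetical data: run1 is a cleaner run than run2.
run1 = [("run1", i < 9) for i in range(1000)]
run2 = [("run2", i < 99) for i in range(1000)]

print(empirical_quality_by_read_group(run1 + run2))
# -> {'run1': 20.0, 'run2': 10.0}

# Same bases, but both runs carry one shared read group ID:
pooled = [("same", is_mismatch) for _, is_mismatch in run1 + run2]
print(empirical_quality_by_read_group(pooled))
```

With one shared ID, the second call reports a single in-between quality for `same`, so the difference between the two runs is lost.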

    Make sure your @RG tags distinguish the two runs; otherwise BQSR will think they're all the same.
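The read group advice can be sketched as the SAM `@RG` header lines you would want in the end. This is a minimal sketch, assuming one person (`SM`) with one read group per run: `ID` and `PU` differ per run, and `LB` differs because each DNA sample was prepared separately. These fields correspond to Picard AddOrReplaceReadGroups' `RGID`, `RGSM`, `RGLB`, `RGPL` and `RGPU` options. The function name, dict keys and example values are hypothetical.

```python
def make_read_group_lines(sample, runs, platform="ILLUMINA"):
    """Build SAM @RG header lines: one sample, one read group per run.

    runs: list of dicts with hypothetical keys 'id', 'library', 'unit'.
    """
    seen_ids = set()
    lines = []
    for run in runs:
        if run["id"] in seen_ids:
            # Duplicate IDs would blur the per-run distinction BQSR relies on.
            raise ValueError(f"duplicate read group ID: {run['id']}")
        seen_ids.add(run["id"])
        fields = [
            "@RG",
            f"ID:{run['id']}",        # unique per run/lane
            f"SM:{sample}",           # same person -> same sample name
            f"LB:{run['library']}",   # separate DNA prep -> separate library
            f"PL:{platform}",
            f"PU:{run['unit']}",      # flowcell/lane identifier
        ]
        lines.append("\t".join(fields))
    return lines


runs = [
    {"id": "person1.run1", "library": "person1.libA", "unit": "FLOWCELL1.1"},
    {"id": "person1.run2", "library": "person1.libB", "unit": "FLOWCELL2.3"},
]
for line in make_read_group_lines("person1", runs):
    print(line)
```

Keeping `SM` identical is what lets downstream tools treat both runs as one genome, while the distinct `ID`s keep them separate for recalibration.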

  • redzengenoist (Member)

    That makes sense, Carneiro.

    So I guess the only thing to do before merging is Picard's AddOrReplaceReadGroups.jar, right? In fact, I shouldn't even sort, right?

    Here's what I'm thinking now:

    1) BWA align
    2) samtools view -bS to convert SAM to BAM
    3) AddOrReplaceReadGroups on each run, then merge the BAMs
    4) samtools sort
    5) mark duplicates, realign indels, recalibrate, etc.

    What do you think, Carneiro?
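Whichever order is chosen, merging only interleaves records. It does not rewrite their read groups, so the per-run distinction survives into the merged file. Here is a toy illustration using plain Python tuples as stand-ins for BAM records; it is not how Picard or samtools are implemented. It assumes each run has already been coordinate-sorted, and shows that a streaming k-way merge (`heapq.merge`) then produces a coordinate-sorted result directly, with every read keeping its own read group. Read names and IDs are hypothetical.

```python
import heapq


def merge_coordinate_sorted(*runs):
    """Toy k-way merge of per-run read lists (not Picard/samtools code).

    Each read is a (ref_index, position, read_name, read_group) tuple and
    each run is already sorted by (ref_index, position).
    """
    # heapq.merge streams the inputs, so the output is sorted without
    # re-sorting everything; the read group field travels with each read.
    return list(heapq.merge(*runs, key=lambda read: (read[0], read[1])))


run1 = [(0, 100, "r1a", "person1.run1"), (0, 300, "r1b", "person1.run1")]
run2 = [(0, 200, "r2a", "person1.run2"), (1, 50, "r2b", "person1.run2")]

for read in merge_coordinate_sorted(run1, run2):
    print(read)
```

Merging unsorted BAMs and sorting afterwards yields the same coordinate order. The streaming merge simply avoids a full re-sort when the inputs are already sorted.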

  • redzengenoist (Member)

    Excellent. Thank you, Carneiro.
