Effect of running BQSR per sample rather than per lane

jmhjmh EdinburghMember

What is the effect of running BQSR per sample across lanes rather than per sample per lane, i.e. when the @RG ID is assigned per sample and not per sample per lane? Does it introduce additional errors, or does BQSR just run sub-optimally? If so, how?
Many Thanks
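To make the situation in the question concrete, here is a minimal Python sketch that reports which read group IDs pool reads from more than one lane. It matters because BQSR uses the read group as one of its covariates, so lanes that share an @RG ID are recalibrated together. The sketch assumes Illumina-style read names laid out as instrument:run:flowcell:lane:tile:x:y, and the function names are made up for illustration; they are not part of GATK or Picard.

```python
from collections import defaultdict


def lane_of(read_name):
    """Return 'flowcell.lane' parsed from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_name.lstrip("@").split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {read_name!r}")
    return f"{fields[2]}.{fields[3]}"


def lanes_per_read_group(reads):
    """Map each RG ID to the set of flowcell.lane units seen among its reads.

    `reads` is an iterable of (read_name, rg_id) pairs.
    """
    lanes = defaultdict(set)
    for read_name, rg_id in reads:
        lanes[rg_id].add(lane_of(read_name))
    return dict(lanes)


def pooled_read_groups(reads):
    """Return the sorted RG IDs whose reads come from more than one lane."""
    per_group = lanes_per_read_group(reads)
    return sorted(rg for rg, units in per_group.items() if len(units) > 1)
```

Feeding it (read name, RG ID) pairs pulled from a BAM would list the read groups where a per-sample ID hides several lanes.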

Best Answers


  • jmhjmh EdinburghMember

    Thanks for this. I have a follow-up question: are the variants generated to bootstrap BQSR still valid if MarkDuplicates was run on the combined sample? As I understand it, MarkDuplicates and indel realignment don't use the RG ID; they work per library and across all the data. So suppose I ran GATK in the following manner:

    1. Align reads.
    2. Combine the BAMs and mark duplicates with Picard.
    3. Perform indel realignment.
    4. Run HaplotypeCaller once to produce the bootstrap variants.

    Then I could run BQSR with these variants on reconstituted BAMs whose read RG IDs correctly reflect sample per lane, since the RG ID wasn't used in the steps above. This would mean I don't have to run HaplotypeCaller twice.
    Thanks for the help.
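As an illustration of what "RG ID reflecting sample per lane" could look like, here is a minimal Python sketch that builds a per-lane @RG header line from a read name. It assumes Illumina-style read names laid out as instrument:run:flowcell:lane:tile:x:y. The naming scheme (ID as sample.flowcell.lane, PU as flowcell.lane) and the function name are assumptions made for this example, not something prescribed in the thread. The tags themselves (ID, SM, LB, PL, PU) are standard SAM @RG fields.

```python
def read_group_line(read_name, sample, library, platform="ILLUMINA"):
    """Build a tab-separated @RG header line with a per-lane read group ID.

    Assumes read names laid out as instrument:run:flowcell:lane:tile:x:y.
    The ID/PU naming scheme is illustrative only.
    """
    fields = read_name.lstrip("@").split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {read_name!r}")
    flowcell, lane = fields[2], fields[3]
    unit = f"{flowcell}.{lane}"
    tags = [
        ("ID", f"{sample}.{unit}"),  # one read group per sample per lane
        ("SM", sample),              # sample name shared across lanes
        ("LB", library),             # library name
        ("PL", platform),            # sequencing platform
        ("PU", unit),                # platform unit: flowcell and lane
    ]
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in tags)
```

Reads from different lanes of the same sample then get distinct ID values but keep the same SM and LB.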
