I have sequenced each of my samples with a single library and would like to recalibrate my BAMs. I have a merged BAM (with an @RG tag for each sample) and the original single-sample BAMs.

I ran BaseRecalibrator on each sample BAM because running it on the merged BAM was slow. My question: is running BaseRecalibrator independently on each sample BAM equivalent to running it on a merged BAM in which each read group corresponds to a single sample?

  flow

    Hi Geraldine,

    Okay, so each sample is recalibrated independently of other samples.

    I started wondering about this because in our data (non-human) BQSR has dramatic effects on some samples but minor effects on others. This appears to be associated with how closely related a sample is to the reference: BQSR leads to mild reductions in qualities in samples closely related to the reference, but has significant effects on more divergent samples.

    This makes some sense, because variant sites not in dbSNP will be found more often in divergent samples, and mismatches at those sites get counted as sequencing errors (leading to over-correction).
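
    To put rough numbers on that reasoning, here is a minimal sketch. It assumes the empirical quality is just the Phred-scaled ratio of mismatches to observed bases (the real BaseRecalibrator model is more involved), and the function names and counts are made up for illustration, not taken from our data.

```python
import math


def phred(error_rate):
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(error_rate)


def empirical_quality(true_errors, unmasked_variant_mismatches, total_bases):
    """Simplified empirical quality: every mismatch outside the known-sites
    set is treated as an error, including real variants that are not masked.
    (Hypothetical helper; a plain ratio, not GATK's actual estimator.)
    """
    mismatches = true_errors + unmasked_variant_mismatches
    return phred(mismatches / total_bases)


# Illustrative numbers only: 1,000,000 bases with 1,000 real sequencing errors.
total = 1_000_000
errors = 1_000

q_masked = empirical_quality(errors, 0, total)          # all variants masked
q_close = empirical_quality(errors, 1_000, total)       # 0.1% unmasked variant bases
q_divergent = empirical_quality(errors, 10_000, total)  # 1% unmasked variant bases

print(f"all variants masked: Q{q_masked:.1f}")
print(f"close to reference:  Q{q_close:.1f}")
print(f"divergent sample:    Q{q_divergent:.1f}")
```

    Under these assumptions the same true error rate comes out at a lower quality the more unmasked variant sites a sample carries, which matches the direction of what we see.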

    I know you have no experience with non-human data, but I would expect a similar phenomenon in human data.
    Have you seen a similar effect? Any thoughts?

