BQSR on 24-multiplexed human exomes: how much data is needed for an accurate BQSR model estimation?

palmeira (Liege) Member ✭✭

Dear Geraldine and GATK experts,

I attended the great Brussels workshop, and I am posting here the BQSR question I had.

I have a human exome experiment with 24 samples multiplexed per lane (Nextera libraries). We first sequenced a single lane (~15x) and then added a second lane (~30x in total). From what I understood from the Best Practices, I probably don't have enough data to run BQSR on each sample separately. How should I handle the BQSR step, then? Should I skip it altogether? Or could I estimate the model parameters on one whole lane (all samples together) and then apply the model to each sample separately?

And as a separate question: if I could turn back the clock, would it have been better to multiplex 12 samples per lane and run the same two lanes of sequencing (24 samples in total)? This would yield the same number of reads while giving me more data for a per-sample BQSR step.

Thanks a lot for your help!

Best Answers


  • palmeira (Liege) Member ✭✭
    edited June 2014

    If I merge all the sample BAMs from one lane for the BQSR step, do I have to change the RG tags so that all samples in a given lane share the same RG? Or is BQSR lane-aware, so that it will fetch the correct lane information by itself?

  • Sheila (Broad Institute) Member, Broadie, Moderator, admin


    No, you do not have to change the RGID. BQSR reads the read group from each read's RG tag, so it will take care of everything.


  • palmeira (Liege) Member ✭✭

    Ok, that's what I thought. Thanks a lot!
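
For anyone who wants to check what read group information a merged BAM actually carries before running BQSR, here is a minimal Python sketch. It is not from the thread and not part of GATK. It only parses the `@RG` lines of a SAM header (for example the text printed by `samtools view -H`) and lists which samples share each platform unit. The header below, including the sample names, read group IDs and `PU` values, is invented for illustration, and the helper names `parse_read_groups` and `samples_by_platform_unit` are hypothetical.

```python
from collections import defaultdict

# Hypothetical SAM header text; all IDs, sample names and platform
# units below are made up for this example.
EXAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:S01.L1\tSM:S01\tPU:FLOWCELL1.1\tPL:ILLUMINA\tLB:S01_nextera",
    "@RG\tID:S01.L2\tSM:S01\tPU:FLOWCELL1.2\tPL:ILLUMINA\tLB:S01_nextera",
    "@RG\tID:S02.L1\tSM:S02\tPU:FLOWCELL1.1\tPL:ILLUMINA\tLB:S02_nextera",
    "@RG\tID:S02.L2\tSM:S02\tPU:FLOWCELL1.2\tPL:ILLUMINA\tLB:S02_nextera",
])


def parse_read_groups(header_text):
    """Return one dict of tag -> value for every @RG line in a SAM header."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are TAG:VALUE pairs separated by tabs.
        fields = line.split("\t")[1:]
        read_groups.append(dict(field.split(":", 1) for field in fields))
    return read_groups


def samples_by_platform_unit(read_groups):
    """Group sample names (SM) by platform unit (PU), falling back to ID."""
    groups = defaultdict(list)
    for rg in read_groups:
        groups[rg.get("PU", rg["ID"])].append(rg["SM"])
    return {unit: sorted(samples) for unit, samples in groups.items()}


if __name__ == "__main__":
    for unit, samples in samples_by_platform_unit(
        parse_read_groups(EXAMPLE_HEADER)
    ).items():
        print(unit, samples)
```

With real data, the header text would come from the merged BAM rather than from a hard-coded string. The grouping key (`PU`, with `ID` as a fallback) is an assumption made for this sketch and should be adapted to how your own read groups were assigned.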

