How to determine how many bases there are per read group

The Base Quality Score Recalibration page says:

"A critical determinant of the quality of the recalibration is the number of observed bases and mismatches in each bin. The system will not work well on a small number of aligned reads. We usually expect well in excess of 100M bases from a next-generation DNA sequencer per read group. 1B bases yields significantly better results."

I am not sure what "per read group" means. For example, I have a sample S1 with two libraries, L1 and L2, which were sequenced in four lanes of an Illumina HiSeq. How do I determine how many bases there are per read group?



  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Read groups are the basic unit of sequencing processed by the sequencing machine (typically whatever was run on the same flowcell). The header of your BAM file lists the read groups (the @RG lines). Check it to see whether your samples and libraries are divided into different read groups.
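
    Once the read groups are known from the header, the number of bases in each one can be tallied directly. The following is a minimal sketch, not an official GATK tool. It assumes the alignments are available as SAM text (for example, the output of `samtools view -h`), that each read carries an RG:Z: tag, and that only primary, mapped reads should be counted. The function name `bases_per_read_group` and the label "(no RG tag)" are made up for illustration.

```python
from collections import Counter


def bases_per_read_group(sam_lines):
    """Sum the sequenced bases of primary, mapped reads for each read group.

    `sam_lines` is any iterable of SAM-format text lines (header included).
    """
    totals = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Register every read group declared in an @RG header line,
            # so groups with no counted reads still show up with 0 bases.
            if line.startswith("@RG"):
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        totals[field[3:]] += 0
            continue
        cols = line.split("\t")
        flag, seq = int(cols[1]), cols[9]
        # Assumption: skip unmapped (0x4), secondary (0x100) and
        # supplementary (0x800) records so each read is counted once.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        # Optional tags start at column 12; RG:Z:<id> names the read group.
        rg = next((c[5:] for c in cols[11:] if c.startswith("RG:Z:")),
                  "(no RG tag)")
        if seq != "*":  # "*" means the sequence was not stored
            totals[rg] += len(seq)
    return dict(totals)
```

    To run it on a real file, one option is to pipe `samtools view -h` on the BAM file into a small script that passes `sys.stdin` to this function and prints the totals. The result can then be compared against the 100M and 1B figures quoted above.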
