How to determine how many bases there are per read group
The Base Quality Score Recalibration page says:
"A critical determinant of the quality of the recalibration is the number of observed bases and mismatches in each bin. The system will not work well on a small number of aligned reads. We usually expect well in excess of 100M bases from a next-generation DNA sequencer per read group. 1B bases yields significantly better results."
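To translate those base counts into reads: the number of reads needed is the base target divided by the read length, rounded up. A minimal sketch, assuming 100 bp reads purely for illustration (the read length is not from the quoted page; each mate of a paired-end fragment counts as its own read here):

```python
def reads_needed(target_bases: int, read_length: int) -> int:
    """Reads of a given length needed to reach target_bases (ceiling division)."""
    return -(-target_bases // read_length)

# Assumption: 100 bp reads, chosen only to illustrate the arithmetic.
for target in (100_000_000, 1_000_000_000):
    print(f"{target:,} bases -> {reads_needed(target, 100):,} reads")
```

So at 100 bp, the 100M-base guideline corresponds to roughly one million reads in a single read group.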
I am not sure what "per read group" means. For example, I have a sample, S1, with two libraries, L1 and L2, which were sequenced across four lanes of an Illumina HiSeq. How do I determine how many bases there are per read group?
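One rough way to get a per-read-group tally is to sum read lengths grouped by the `RG:Z:` tag on each alignment record. The sketch below works on SAM-format text lines (for example, the output of `samtools view in.bam`). It assumes each library/lane combination was given its own read group ID, as is common for Illumina data; the IDs in the example (`S1.L1.lane1` and so on) are hypothetical. It counts the full SEQ of primary mapped reads, so it includes soft-clipped bases and applies none of GATK's own read filters: treat it as an estimate, not what BQSR itself counts.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits: unmapped, secondary, supplementary.
SKIP_FLAGS = 0x4 | 0x100 | 0x800

def bases_per_read_group(sam_lines: Iterable[str]) -> Counter:
    """Sum SEQ lengths of primary mapped reads, keyed by RG tag."""
    totals = Counter()
    for line in sam_lines:
        # Skip header lines (@HD, @RG, ...) and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & SKIP_FLAGS:
            continue
        seq = fields[9]  # column 10 of a SAM record is SEQ
        if seq == "*":   # sequence not stored
            continue
        rg = "unknown"
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("RG:Z:"):
                rg = tag[len("RG:Z:"):]
                break
        totals[rg] += len(seq)
    return totals

# Hypothetical records: two mapped reads in different read groups, one unmapped.
example = [
    "@RG\tID:S1.L1.lane1\tSM:S1\tLB:L1",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:S1.L1.lane1",
    "r2\t0\tchr1\t200\t60\t6M\t*\t0\t0\tACGTAC\tIIIIII\tRG:Z:S1.L2.lane3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tRG:Z:S1.L1.lane1",
]
for rg, n in sorted(bases_per_read_group(example).items()):
    print(rg, n)
```

On a real file, the same function could be fed `sys.stdin` with `samtools view in.bam` piped in, and each total compared against the 100M-base guideline quoted above.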