Read group ID and PU setup
In the GATK forum there are many confusing discussions about the difference between read group ID and PU (platform unit) and how to set this up prior to analysis.
The confusion arises because the GATK FAQ and most of the examples assume that only one sample is run on each lane. In many situations, however, multiple samples are run in the same lane (multiplexed), and this case has led to numerous confusing GATK forum discussions over the past few years.
So to clarify for myself and others who are still unsure, when multiple samples are multiplexed on the same lane, should the reads of each sample in the same lane have:
- The same or different read group IDs?
- The same or different read group PUs?
This mainly affects the BQSR (base quality score recalibration) step, where PU takes precedence over ID.
So the answers to these questions determine whether BQSR uses all reads in a lane regardless of sample (even if multiple samples were sequenced on the same lane) OR only the reads in a lane that belong to the sample being analyzed.
In other words, should BQSR run per sample-lane, or just per lane?
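To make the two options concrete, here is a minimal Python sketch of how the choice of PU would change which reads end up grouped together. It is not GATK's implementation. It only models the rule stated above, that PU is used in place of ID when present. The function names (`recal_key`, `bucket_samples`) and the flowcell, lane, barcode, and sample values are all hypothetical.

```python
from collections import defaultdict


def recal_key(read_group):
    """Return the grouping key for a read group: PU if set, otherwise ID.

    Toy model of "PU takes precedence over ID"; not GATK code.
    """
    return read_group.get("PU") or read_group["ID"]


def bucket_samples(read_groups):
    """Map each grouping key to the sorted list of samples that share it."""
    buckets = defaultdict(set)
    for rg in read_groups:
        buckets[recal_key(rg)].add(rg["SM"])
    return {key: sorted(samples) for key, samples in buckets.items()}


# Option 1 (hypothetical values): both samples in lane 1 share one PU.
per_lane = [
    {"ID": "sampleA.L1", "SM": "sampleA", "PU": "FLOWCELL1.1"},
    {"ID": "sampleB.L1", "SM": "sampleB", "PU": "FLOWCELL1.1"},
]

# Option 2 (hypothetical values): the sample barcode is part of the PU,
# so each sample-lane combination gets its own PU.
per_sample_lane = [
    {"ID": "sampleA.L1", "SM": "sampleA", "PU": "FLOWCELL1.1.ACGT"},
    {"ID": "sampleB.L1", "SM": "sampleB", "PU": "FLOWCELL1.1.TGCA"},
]

print(bucket_samples(per_lane))
print(bucket_samples(per_sample_lane))
```

With the first set of tags, both samples fall under a single key (per lane). With the second, each sample gets its own key (per sample-lane). Which of the two is the intended setup for BQSR is exactly the question being asked here.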