Read group ID and PU setup
The GATK forum has many confusing discussions about the difference between the read group ID and PU (platform unit) fields and how to set them up prior to analysis.
The confusion arises because the GATK FAQ and most of the examples assume that only one sample is run on each lane. However, in many situations, multiple samples are run in the same lane (multiplexed). This has led to numerous confusing forum discussions over the past few years.
So to clarify for myself and others who are still unsure, when multiple samples are multiplexed on the same lane, should the reads of each sample in the same lane have:
- The same or different read group IDs?
- The same or different read group PUs?
This mainly affects the BQSR step, where PU takes precedence over ID.
So the answers to these questions will determine whether BQSR will use all reads in a lane regardless of the sample (even if multiple samples were sequenced on the same lane) OR whether BQSR will use only reads in a lane from the sample being analyzed.
In other words, should BQSR run per sample-lane, or just per lane?
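To make the two options concrete, here is a minimal Python sketch of what each PU scheme would look like for two samples multiplexed on one lane. It does not say which scheme is correct. It only shows how reads would be pooled if a tool groups them by PU, as described above. The flowcell name, barcodes, sample and library names, and the helper functions `rg_line` and `group_by_pu` are all made up for illustration and are not part of GATK.

```python
from collections import defaultdict


def rg_line(flowcell, lane, barcode, sample, library, pu_per_sample):
    """Build a SAM @RG header line for one sample on one lane.

    pu_per_sample=True  -> PU = flowcell.lane.barcode (unique per sample-lane)
    pu_per_sample=False -> PU = flowcell.lane (shared by every sample on the lane)
    """
    pu = f"{flowcell}.{lane}.{barcode}" if pu_per_sample else f"{flowcell}.{lane}"
    # Assumption: ID is kept unique per sample-lane in both schemes,
    # so only the PU field differs between them.
    rg_id = f"{flowcell}.{lane}.{barcode}"
    fields = [("ID", rg_id), ("PU", pu), ("SM", sample), ("LB", library), ("PL", "ILLUMINA")]
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in fields)


def group_by_pu(rg_lines):
    """Map each PU value to the list of samples (SM) whose reads carry it."""
    groups = defaultdict(list)
    for line in rg_lines:
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        groups[tags["PU"]].append(tags["SM"])
    return dict(groups)


# Hypothetical lane 1 of flowcell "HXXXX" carrying two barcoded samples.
samples = [("ACGTACGT", "sampleA", "libA"), ("TGCATGCA", "sampleB", "libB")]

per_sample_lane = [rg_line("HXXXX", 1, bc, sm, lb, True) for bc, sm, lb in samples]
per_lane = [rg_line("HXXXX", 1, bc, sm, lb, False) for bc, sm, lb in samples]

print(group_by_pu(per_sample_lane))
# {'HXXXX.1.ACGTACGT': ['sampleA'], 'HXXXX.1.TGCATGCA': ['sampleB']}
print(group_by_pu(per_lane))
# {'HXXXX.1': ['sampleA', 'sampleB']}
```

With a per-sample-lane PU, each sample's reads form their own group. With a per-lane PU, both samples fall into one group, which is the "just per lane" case asked about above.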