Bait bias

Bait bias (single bait bias or reference bias artifact) is a type of artifact that affects data generated through hybrid selection methods.
These artifacts occur during or after the target selection step, and correlate with substitution rates that are biased or higher for sites having one base on the reference/positive strand relative to sites having the complementary base on that strand. For example, a G>T artifact during the target selection step might result in a higher (G>T)/(C>A) substitution rate at sites with a G on the positive strand (and C on the negative), relative to sites with the flip (C positive)/(G negative). This is known as the "G-Ref" artifact.
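The comparison described above can be sketched numerically. This is a minimal illustration, assuming you already have base counts for the two site classes (reference G on the positive strand versus reference C on the positive strand). The function and parameter names are hypothetical, and the formulas (a floored rate difference, then a Phred-scaled score) loosely follow the way Picard documents its bait bias detail metrics; this is not Picard's actual code.

```python
import math

MIN_RATE = 1e-10  # floor so the log10 below is always defined


def substitution_rate(ref_bases, alt_bases):
    """Fraction of observed bases showing the substitution, floored at MIN_RATE."""
    total = ref_bases + alt_bases
    if total == 0:
        return MIN_RATE
    return max(MIN_RATE, alt_bases / total)


def bait_bias(fwd_ref, fwd_alt, rev_ref, rev_alt):
    """Return (error_rate, qscore) for one substitution, e.g. G>T.

    fwd_* : counts at sites with the ref base on the positive strand
            (G sites, G>T observed).
    rev_* : counts at sites with the complementary base on the positive
            strand (C sites, C>A observed).
    """
    fwd_rate = substitution_rate(fwd_ref, fwd_alt)
    rev_rate = substitution_rate(rev_ref, rev_alt)
    # Bias is the excess of the forward-context rate over the reverse-context rate.
    error_rate = max(MIN_RATE, fwd_rate - rev_rate)
    qscore = -10 * math.log10(error_rate)
    return error_rate, qscore


# Hypothetical counts: 1% G>T at G sites, 0.1% C>A at C sites.
error_rate, qscore = bait_bias(99_000, 1_000, 99_900, 100)
```

With these made-up counts the excess rate is about 0.009, which corresponds to a Q-score of roughly 20.5. When the two contexts have equal rates, the floored error rate gives a Q-score of about 100, meaning no detectable bias. A low Q-score is what flags a sample as affected.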
Comments
I think we are seeing this type of artifact in some of our exome sequencing data, but I'm a little unclear on what the actual cause is. What causes the substitution during the target selection step?
Don't hold my feet to the fire on this, but I believe the errors are introduced to the bait from sample handling. Essentially, guanines in the bait sequence are sensitive to oxidation from extraction agents, heat, etc. This can convert some guanine nucleotides to 8-oxoguanine (8-OxoG, OxoG). These modified guanines can base-pair with T instead of C as would normally be expected, so the error is propagated during PCR. Since the G is sensitive to oxidation, you will likely see a higher frequency of G>A than C>T. Is this helpful?
Circling back around on this because we are seeing it happen again, and I don't feel like I ever got a clear answer on what causes it. I also can't find much in the literature about it. Is the "G-ref" artifact caused by damage to the capture probes/baits? The description says it can happen "during or after the target selection step", and that a "G>T artifact during the target selection step" can cause it. But to my mind that doesn't really explain when and how the artifact is being introduced.
And I should clarify: what we are seeing are definitely G>T artifacts, not G>A/C>T, which are OxoG artifacts. Picard is also flagging these samples as having low Q-scores for bait bias/G>T changes. So I think we are seeing this artifact; I just don't understand its origin.
After reading the pre-adapter bias documentation, it looks like oxidative damage can show up as G>T or C>A changes as well.
So, how do you know if elevated G>T rates are OxoG artifacts, or G-ref artifacts, and what's the difference?