BaitBias Artifact Filter

Hi GATK team,

We have a set of targeted sequencing samples showing an inflated number of G>T false-positive variants. Picard CollectSequencingArtifactMetrics shows that almost all samples in the batch have a very low bait-bias Q-score (under 30) for the G>T base change.
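For anyone who wants to run the same check across a batch, here is a minimal sketch that flags samples whose G>T bait-bias Q-score falls under 30. It assumes the summary metrics file is the usual Picard layout (`#`-prefixed header lines followed by a tab-separated table) and that the table has `SAMPLE_ALIAS`, `REF_BASE`, `ALT_BASE` and `TOTAL_QSCORE` columns; check those names against your own output. The helper names and the sample data are made up for illustration.

```python
import csv

def read_picard_metrics(text):
    """Return the first metrics table in a Picard-style metrics file as dicts."""
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue                      # header/comment lines
        if not line.strip():
            if table_lines:
                break                     # a blank line ends the table
            continue                      # blank line before the table
        table_lines.append(line)
    return list(csv.DictReader(table_lines, delimiter="\t"))

def flag_low_q(rows, ref="G", alt="T", threshold=30.0):
    """List (sample, qscore) pairs below the threshold for one base change."""
    flagged = []
    for row in rows:
        q = float(row["TOTAL_QSCORE"])    # assumed column name
        if row["REF_BASE"] == ref and row["ALT_BASE"] == alt and q < threshold:
            flagged.append((row["SAMPLE_ALIAS"], q))
    return flagged

# Made-up example with only the columns used above.
SAMPLE = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# made-up example, not real output",
    "",
    "\t".join(["SAMPLE_ALIAS", "LIBRARY", "REF_BASE", "ALT_BASE", "TOTAL_QSCORE"]),
    "\t".join(["sampleA", "lib1", "G", "T", "27"]),
    "\t".join(["sampleA", "lib1", "C", "A", "45"]),
    "\t".join(["sampleB", "lib1", "G", "T", "41"]),
    "",
])

print(flag_low_q(read_picard_metrics(SAMPLE)))
```

With the made-up data above, only `sampleA` is reported, since its G>T row is the only one under the threshold.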

I read through the Picard documentation on BaitBiasSummaryMetrics and PreAdapterSummaryMetrics. Both use G>T as the example artifact. My (not very clear) impression is that BaitBias looks for a bias between the reference strand and its complement, while PreAdapterSummaryMetrics looks for a read-orientation bias. Is that right?

Could I get a clearer explanation of how to tell whether elevated G>T rates are OxoG artifacts or G-ref artifacts?
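To make the question concrete, here is a rough sketch of how the two Q-scores appear to be derived. It is an assumption based on one reading of the Picard metric definitions, not code taken from Picard: the pre-adapter rate compares alt bases whose read number and orientation support the artifact ("pro") against those that contradict it ("con"), while the bait-bias rate compares the alt rate in the forward reference context against the reverse-complement context. The function names and counts are illustrative only.

```python
import math

MIN_ERROR = 1e-10  # floor so the Q-score stays finite when there is no excess

def qscore(error_rate):
    """Phred-scale an error rate, flooring it at MIN_ERROR."""
    return -10.0 * math.log10(max(MIN_ERROR, error_rate))

def pre_adapter_error(pro_ref, pro_alt, con_ref, con_alt):
    """Excess of orientation-consistent alt bases over all observed bases."""
    total = pro_ref + pro_alt + con_ref + con_alt
    return (pro_alt - con_alt) / total

def bait_bias_error(fwd_ref, fwd_alt, rev_ref, rev_alt):
    """Alt rate on the reference-strand context minus the complement context."""
    fwd_rate = fwd_alt / (fwd_ref + fwd_alt)
    rev_rate = rev_alt / (rev_ref + rev_alt)
    return fwd_rate - rev_rate

# Made-up counts: both examples have an excess error rate of about 1 in 1000.
print(qscore(pre_adapter_error(49890, 110, 49990, 10)))
print(qscore(bait_bias_error(99800, 200, 99900, 100)))
```

Under these assumptions, the same G>T excess can give a low Q-score in either metric; what differs is whether the reads are grouped by read number and orientation (pre-adapter) or by which strand context carries the G (bait bias).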

Also, GATK4 has an experimental tool, FilterByOrientationBias, to filter out OxoG artifacts. Is there a similar tool that could remove G-ref artifacts given SequencingArtifactMetrics.BaitBiasDetailMetrics?



Best Answer


  • LeeTL1220 · Arlington, MA · Member, Broadie, Dev

    @wenluo Before I try to answer your question, are you using Mutect2?

  • wenluo · Member

    Hi Lee,

    Yes, we are using Mutect2 together with three other callers: MuSE, Strelka, and LoFreq.
    After discussing with my coworkers last week, we came to some understanding of the mechanism behind pre-adapter artifacts. These artifacts, such as oxidized G, most likely occur on only one strand of the double-stranded DNA. After the DNA is fragmented, the single-strand fragments are read from both ends, 5' to 3'. If the OxoG is on the reference strand, the resulting G>T variant will be found only in F1R2 reads; if the OxoG is on the complementary strand, the resulting C>A variant will be found only in F2R1 reads. That is the read-orientation bias. And because the damage affects only one of the two strands, unlike a real mutation, which is present on both strands, these artifacts usually have a very low allele fraction.
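A minimal sketch of that orientation tally, assuming you already have the SAM flags of the reads that support the alt allele at a site. The F1R2/F2R1 labels follow the usual convention that F1R2 means read 1 is the forward-strand read of the pair; the function names and example flags are made up for illustration and are not part of any GATK or Picard tool.

```python
from collections import Counter

# Standard SAM flag bits.
FLAG_REVERSE = 0x10   # read aligned to the reverse strand
FLAG_READ1 = 0x40     # first read in the pair
FLAG_READ2 = 0x80     # second read in the pair

def pair_orientation(flag):
    """Return 'F1R2' or 'F2R1' for a paired read, or None if unpaired."""
    is_reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_READ1:
        return "F2R1" if is_reverse else "F1R2"
    if flag & FLAG_READ2:
        return "F1R2" if is_reverse else "F2R1"
    return None

def alt_orientation_fraction(alt_read_flags):
    """Count alt-supporting reads per orientation and the F1R2 fraction."""
    counts = Counter(pair_orientation(f) for f in alt_read_flags)
    n = counts["F1R2"] + counts["F2R1"]
    fraction = counts["F1R2"] / n if n else None
    return counts, fraction

# Made-up flags for five alt-supporting reads at one site.
counts, f1r2_fraction = alt_orientation_fraction([99, 147, 99, 99, 83])
print(counts["F1R2"], counts["F2R1"], f1r2_fraction)
```

A fraction close to 0 or 1 means nearly all alt reads share one orientation, which is the pattern described above; a real variant would be expected nearer 0.5, though low depth makes the estimate noisy.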
    For the bait-bias artifact, I'm still not clear on how it forms. According to the Picard explanation, it occurs only on the reference strand or only on the complementary strand, which is not reflected in VCF files.
    Both kinds of artifact usually have a fairly low allele fraction. Allele fraction can serve as a filter, but an imperfect one, because real signals can also have low allele fractions.

    I'd like to hear how other people deal with these two types of artifacts, and I'd welcome corrections to my reasoning.


  • wenluo · Member

    Thanks David. That's just what I need. I'll update you once I get to try it.

