PCR Duplicate detection on PCR-Free Libraries

aeonsim Member ✭✭✭
edited October 2013 in Ask the GATK team

We've just started a big project in which we plan to use GATK with PCR-free libraries, and I was curious what your thoughts are on using PCR duplicate detection (MarkDuplicates) with the current Illumina PCR-free libraries. Internally, do you still run this stage of the GATK pipeline with the new libraries?

Looking at your Best Practices documents, I see it's still in there, but I didn't see anything mentioning PCR-free libraries.


Best Answer


  • aeonsim Member ✭✭✭

    We ended up running duplicate detection anyway just to be safe, but looking at the results, detected duplicates were on the order of 0.01%, which seems a fairly large time cost for little advantage.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @aeonsim Thanks for letting us know, that's an interesting observation.

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    @aeonsim Just want to clarify that your metrics file listed 0.0001 as the fraction of duplicates, which gives your 0.01% duplicates. The column is labeled "PERCENT_DUPLICATION", but metrics files technically list the fraction.

    For my 2x150 PCR-free libraries of ~30x depth, I get about 13–13.5% duplicates (listed as 0.130041 to 0.135149 in the metrics file). Your PCR-free metrics compel me to consider what duplication rates are acceptable for PCR-free libraries.

    I would conjecture that by design, sequencing of duplicates is unavoidable. To reach target coverage and depth, some fraction of sequenced reads will, by stochastic means, be duplicate reads. These can arise during DNA shearing, PCR amplification, and in the sequencer.
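
The fraction-versus-percent point above is easy to get wrong when reading a metrics file by eye, so here is a minimal Python sketch of the conversion. It assumes the usual layout of a MarkDuplicates metrics file (comment lines starting with `#`, then a tab-separated header row, then one row per library, ended by a blank line). The function names, the library names `libA` and `libB`, and the trimmed-down three-column example are made up for illustration; a real file has more columns.

```python
import csv
import io


def read_duplication_fractions(metrics_text):
    """Return {library: fraction of duplicates} from metrics file text.

    Assumed layout: '#' comment lines, a tab-separated header row,
    one data row per library, then a blank line or another '#' section.
    """
    table_lines = []
    for line in metrics_text.splitlines():
        if line.startswith("#") or not line.strip():
            if table_lines:
                break  # the metrics table has ended
            continue  # still in the leading comments / blank lines
        table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    # Despite the column name, the stored value is a fraction between 0 and 1.
    return {row["LIBRARY"]: float(row["PERCENT_DUPLICATION"]) for row in reader}


def as_percent(fraction):
    """Convert a fraction (e.g. 0.0001) to a percentage (e.g. 0.01)."""
    return fraction * 100.0


# Hypothetical, heavily trimmed example; the values come from this thread.
example = "\n".join([
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "LIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION",
    "libA\t1000000\t0.130041",
    "libB\t1000000\t0.0001",
    "",
    "## HISTOGRAM\tjava.lang.Double",
])

for library, fraction in read_duplication_fractions(example).items():
    print(f"{library}: fraction={fraction} -> {as_percent(fraction):.2f}% duplicates")
```

Reading the value as a fraction and converting once, in one place, avoids the hundredfold misreading discussed above.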

  • Hi Shlee,

    For comparison, my 2x125 PCR-free libraries of ~70x depth had ~9% duplicates (listed as 0.087...). It's reassuring to know that this level of duplication is typical, but equally surprising. The assumption that PCR-free libraries are not subject to high rates of duplicates appears to be totally misguided.

    Thanks for your post!
