Remove duplicate reads from ChIP-seq data

In general, unmapped reads, multi-mapped reads, and duplicate reads should be removed after alignment to the reference in a ChIP-seq pipeline.

Duplicate reads come from two sources: "library/PCR-generated duplicates (LB)" and "sequencing-platform artifact duplicates (SQ)". SQ duplicates should certainly be removed in ChIP-seq analysis. However, LB duplicates may contain true biological signal. Therefore, if I want to remove duplicate reads, which type should I remove: both SQ and LB, or SQ only? (My ChIP-seq data is single-end, read length = 36 bp.)

If I use "REMOVE_DUPLICATES=true", are both SQ and LB duplicates removed?

Are the parameters below correct if I want to remove SQ duplicates only?
TAGGING_POLICY=OpticalOnly, REMOVE_DUPLICATES=true

Thank you.


Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    In general, I would recommend removing all duplicate reads. However, we do not have any experience with ChIP-seq, so we cannot advise you on this point. If you want to remove SQ dupes only then yes, those are the correct parameters.
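
One caveat worth checking against the MarkDuplicates documentation for your Picard version: as far as I understand it, TAGGING_POLICY only controls how the duplicate type is recorded in the DT tag, and REMOVE_DUPLICATES=true drops every duplicate (LB and SQ) regardless of that setting. The option that appears to target SQ duplicates alone is REMOVE_SEQUENCING_DUPLICATES=true. The sketch below only builds and prints the two command lines as a dry run. The file names, the picard.jar location, and the helper function markdup_cmd are hypothetical placeholders, not anything from the thread.

```shell
#!/bin/sh
# Hypothetical input and install location; adjust to your own setup.
IN="chip.sorted.bam"
PICARD="java -jar picard.jar"

# Hypothetical helper: print a MarkDuplicates command line for a given output
# BAM, followed by whatever extra KEY=value options are passed in.
markdup_cmd() {
    out="$1"; shift
    echo "$PICARD MarkDuplicates I=$IN O=$out M=${out%.bam}.metrics.txt $*"
}

# 1) Remove every duplicate, both library (LB) and sequencing (SQ).
CMD_ALL=$(markdup_cmd chip.nodup.bam REMOVE_DUPLICATES=true)

# 2) Remove only sequencing (SQ) duplicates; LB duplicates stay in the output,
#    flagged as duplicates and labelled in the DT tag (assumes a Picard
#    version that supports both options).
CMD_SQ=$(markdup_cmd chip.nosq.bam REMOVE_SEQUENCING_DUPLICATES=true TAGGING_POLICY=All)

# Dry run: print the commands for inspection instead of executing them.
echo "$CMD_ALL"
echo "$CMD_SQ"
```

After checking the printed commands, either one can be executed with eval, for example eval "$CMD_SQ". Because LB duplicates remain flagged in the second output, downstream tools that skip flagged duplicates would still ignore them unless told otherwise.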
