Remove duplicate reads from ChIP-seq data

blueskys123 Taiwan, Member Posts: 2

In general, reads that are unmapped, multi-mapped, or duplicated should be removed after mapping to the reference in a ChIP-seq pipeline.

Duplicate reads come from two sources: "library/PCR-generated duplicates (LB)" and "sequencing-platform artifact duplicates (SQ)". Of course, SQ duplicates must be removed in ChIP-seq analysis. However, LB duplicates may contain true biological signal. Therefore, if I want to remove duplicate reads, which type should I remove: both SQ and LB, or SQ only? (My ChIP-seq data is single-end, read length = 36 bp.)

If I use "REMOVE_DUPLICATES=true", are both SQ and LB duplicates removed?

Is the parameter below correct if I want to remove SQ duplicates only?

Thank you.



  • Geraldine_VdAuwera Administrator, Dev Posts: 11,184

    In general, I would recommend removing all duplicate reads. However, we do not have any experience with ChIP-seq, so we cannot advise you on this point. If you want to remove SQ duplicates only, then yes, those are the correct parameters.

    Geraldine Van der Auwera, PhD
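
For anyone who wants to see what "remove SQ only" means at the level of individual reads, here is a minimal Python sketch. It is an illustration, not a replacement for Picard's own options. It assumes the alignments are available as SAM text and that duplicates were already marked with duplicate-type tagging switched on (in Picard this is, as far as I know, controlled by TAGGING_POLICY), so that each duplicate read carries a DT:Z:LB or DT:Z:SQ tag. The function name drop_sequencing_duplicates is hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit: read is a PCR or optical duplicate


def drop_sequencing_duplicates(sam_lines):
    """Yield SAM lines, skipping only duplicates tagged DT:Z:SQ.

    Header lines, non-duplicate reads, and LB duplicates pass through.
    Assumes duplicate-type (DT) tags are present on marked duplicates.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: always keep.
            yield line
            continue

        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])    # column 2 is the bitwise FLAG
        tags = fields[11:]       # optional tags start at column 12

        is_duplicate = bool(flag & DUPLICATE_FLAG)
        if is_duplicate and "DT:Z:SQ" in tags:
            # Sequencing-platform artifact duplicate: drop it.
            continue

        yield line
```

Reads flagged as duplicates with DT:Z:LB are kept on purpose here, which mirrors the "SQ only" choice discussed in the question; dropping every read with the 0x400 bit set would correspond to removing both types.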

