Remove duplicate reads from ChIP-seq data
In general, unmapped reads, multi-mapped reads, and duplicate reads should be removed after mapping to the reference in a ChIP-seq pipeline.
Duplicate reads come in two types: "library/PCR-generated duplicates (LB)" and "sequencing-platform artifact duplicates (SQ)". Of course, SQ duplicates must be removed in ChIP-seq analysis. However, LB duplicates may contain true biological signal. So, if I want to remove duplicate reads, which type should I remove: both SQ and LB, or SQ only? (My ChIP-seq data is single-end, read length = 36 bp.)
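To make concrete what I mean by removing "SQ only" versus "both SQ and LB", here is a minimal Python sketch. It assumes the duplicate-marking step has written the duplicate type into a `DT:Z:` optional field (LB or SQ) on each SAM record. The helper names `make_record`, `duplicate_type`, and `drop_duplicates`, and the example records, are hypothetical and only for illustration.

```python
def make_record(name, flag, dt=None):
    """Build a toy single-end 36 bp SAM record (11 mandatory fields + optional DT tag)."""
    fields = [name, str(flag), "chr1", "100", "60", "36M", "*", "0", "0",
              "A" * 36, "I" * 36]
    if dt is not None:
        fields.append("DT:Z:" + dt)  # assumed duplicate-type tag: LB or SQ
    return "\t".join(fields)


def duplicate_type(sam_line):
    """Return the DT tag value ('LB' or 'SQ') of a SAM record, or None if absent."""
    # Optional fields start after the 11 mandatory SAM columns.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("DT:Z:"):
            return field[len("DT:Z:"):]
    return None


def drop_duplicates(sam_lines, types_to_drop):
    """Yield header lines and every record whose duplicate type is not in types_to_drop."""
    for line in sam_lines:
        if line.startswith("@") or duplicate_type(line) not in types_to_drop:
            yield line


records = [
    "@HD\tVN:1.6\tSO:coordinate",
    make_record("r1", 0),             # not a duplicate
    make_record("r2", 1024, "LB"),    # library/PCR-generated duplicate
    make_record("r3", 1024, "SQ"),    # sequencing-platform artifact duplicate
]

sq_only_removed = list(drop_duplicates(records, {"SQ"}))
both_removed = list(drop_duplicates(records, {"SQ", "LB"}))

print([line.split("\t")[0] for line in sq_only_removed])
print([line.split("\t")[0] for line in both_removed])
```

In the first case r2 (LB) is kept together with r1; in the second case only r1 survives. My question is which of these two behaviours is appropriate for ChIP-seq.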
If I use "REMOVE_DUPLICATES=true", are both SQ and LB reads removed?
Is the parameter below correct if I want to remove SQ only?