Meaning of "overlapping" in CollectHsMetrics option CLIP_OVERLAPPING_READS

I had originally read "overlapping" in the CollectHsMetrics option CLIP_OVERLAPPING_READS as referring to reads that overlap a target interval but extend past the end of that interval. Under that reading, the option would clip (i.e., remove from the target coverage) the part of the read that does not overlap the target interval, and setting it to false would include the coverage from the spillover of reads beyond the target boundaries.

After looking at the code, however, it appears to mean something different: find the part where paired mates overlap each other, then return the number of bases that would need to be clipped from one of the mates to remove that overlap. Please confirm that the latter is the correct interpretation. If so, I recommend changing the documentation from "to clip overlapping reads" to "to clip the overlapping part of one mate so that paired mates do not overlap each other".
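To make the second interpretation concrete, here is a minimal sketch of the mate-overlap calculation as I understand it from the htsjdk doc comment quoted below. It is not the htsjdk implementation: the `Read` class and `num_overlapping_bases_to_clip` function are hypothetical names, positions are assumed to be 1-based and inclusive, and alignments are assumed to be ungapped (no CIGAR handling), so the exact count may differ from what the library returns.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for a SAM record; positions are 1-based, inclusive."""
    paired: bool
    start: int
    end: int
    mate_start: int


def num_overlapping_bases_to_clip(read: Read) -> int:
    """Count bases of `read` at or after its mate's start (ungapped alignments assumed)."""
    # Unpaired reads have no mate to overlap.
    if not read.paired:
        return 0
    # Only the left-most mate is clipped; the right-most one reports zero.
    if read.start > read.mate_start:
        return 0
    # Bases from the mate's start to this read's end are covered by both mates.
    return max(0, read.end - read.mate_start + 1)


# Left mate covers 100-199 and its mate starts at 150, so bases 150-199 overlap.
left = Read(paired=True, start=100, end=199, mate_start=150)
right = Read(paired=True, start=150, end=249, mate_start=100)
print(num_overlapping_bases_to_clip(left))   # 50
print(num_overlapping_bases_to_clip(right))  # 0
```

Note that nothing here refers to target intervals: the clipping depends only on where the two mates sit relative to each other, which is why the first interpretation does not fit.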

From the code:
The CLIP_OVERLAPPING_READS boolean argument is passed to picard.analysis.directed.TargetMetricsCollector (an abstract class implemented by HsMetricCollector), which then calls htsjdk.samtools.SAMUtils.getNumOverlappingAlignedBasesToClip, whose doc says:

"Returns the number of bases that need to be clipped due to overlapping pairs. If the record is not paired, or the given record's start position is greater than its mate's start position, zero is automatically returned. NB: This method assumes that the record's mate is not contained within the given record's alignment."

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited April 2018

    @billscaringe
    Hi,

    I need to confirm with the team and get back to you.

    -Sheila

    EDIT: Looking at the Picard documentation, it seems the latter interpretation you described is correct: "For paired reads, soft clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate."
