Meaning of "overlapping" in CollectHsMetrics option CLIP_OVERLAPPING_READS

I had originally interpreted "overlapping" in the CollectHsMetrics option CLIP_OVERLAPPING_READS as referring to reads that overlap a target interval but extend past its end. Under that reading, the option would clip (i.e., exclude from the target coverage) the part of the read that does not overlap the target interval, and setting it to false would include the coverage from the spillover of reads beyond the target boundaries.

After looking at the code, however, it appears that the option instead finds the overlapping part of paired mates and returns the number of bases that would need to be clipped from one of the mates to prevent the overlap. Please confirm that the latter is the correct interpretation. If so, I recommend changing the documentation from "to clip overlapping reads" to "to clip the overlapping part of one mate so that paired mates do not overlap each other".

From the code:
The CLIP_OVERLAPPING_READS boolean argument is passed to picard.analysis.directed.TargetMetricsCollector (the abstract class implemented by HsMetricCollector), which then calls htsjdk.samtools.SAMUtils.getNumOverlappingAlignedBasesToClip. The documentation for that method says:

"Returns the number of bases that need to be clipped due to overlapping pairs. If the record is not paired, or the given record's start position is greater than its mate's start position, zero is automatically returned. NB: This method assumes that the record's mate is not contained within the given record's alignment."
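To make the second interpretation concrete, here is a minimal sketch of the behavior that the quoted doc describes. It is not the htsjdk code: the `Read` class and the `overlapping_bases_to_clip` function are hypothetical names, and the sketch assumes gapless alignments (no insertions, deletions, or existing clipping), 1-based inclusive coordinates, and that both mates map to the same reference sequence.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified stand-in for an aligned read record."""
    paired: bool
    start: int       # 1-based inclusive alignment start
    end: int         # 1-based inclusive alignment end
    mate_start: int  # 1-based alignment start of the mate


def overlapping_bases_to_clip(read: Read) -> int:
    """Count bases of `read` that lie at or beyond the mate's start.

    Mirrors the quoted doc: returns zero if the read is unpaired or if
    it starts after its mate, so only the left-most mate is clipped.
    Assumes a gapless alignment, so reference span equals read bases.
    """
    if not read.paired or read.start > read.mate_start:
        return 0
    # Bases from mate_start through read.end overlap the mate.
    return max(0, read.end - read.mate_start + 1)


# Left-most mate spans 100-199; its mate starts at 150.
left = Read(paired=True, start=100, end=199, mate_start=150)
# Right-most mate spans 150-249; its mate starts at 100.
right = Read(paired=True, start=150, end=249, mate_start=100)

print(overlapping_bases_to_clip(left))   # 50
print(overlapping_bases_to_clip(right))  # 0
```

In this sketch the 50 overlapping bases are counted once, against the left-most mate only, which matches the "clip the overlapping part of one mate" reading and has nothing to do with target interval boundaries.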



  • Sheila (Broad Institute; Member, Broadie, admin)
    edited April 2018


    I need to confirm with the team and get back to you.


    EDIT: Looking at the Picard documentation, it seems the latter statement you made is correct. "For paired reads, soft clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate."
