Meaning of "overlapping" in CollectHsMetrics option CLIP_OVERLAPPING_READS
I originally interpreted "overlapping" in the CollectHsMetrics option CLIP_OVERLAPPING_READS as referring to reads that overlap a target interval but extend past its end. Under that reading, the option would clip (i.e., remove from the target coverage) the part of the read that falls outside the target interval, and setting it to false would include the coverage from reads that spill over the target boundaries.
After looking at the code, however, it appears that the option actually finds the overlapping part of paired mates and returns the number of bases that would need to be clipped from one of the mates to remove the overlap. Please confirm that the latter interpretation is correct. If so, I recommend changing the documentation from "to clip overlapping reads" to "to clip the overlapping part of one mate so that paired mates do not overlap each other".
From the code:
The CLIP_OVERLAPPING_READS boolean argument is passed to picard.analysis.directed.TargetMetricsCollector (the abstract class implemented by HsMetricCollector), which then calls htsjdk.samtools.SAMUtils.getNumOverlappingAlignedBasesToClip, whose documentation says:
"Returns the number of bases that need to be clipped due to overlapping pairs. If the record is not paired, or the given record's start position is greater than its mate's start position, zero is automatically returned. NB: This method assumes that the record's mate is not contained within the given record's alignment."
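To make the second interpretation concrete, here is a minimal sketch of the rule as I read it from that doc comment. It is not the htsjdk implementation: the class name OverlapClipSketch and the method basesToClip are hypothetical, and it assumes gapless alignments with 1-based inclusive coordinates, whereas the real method works on a SAMRecord and presumably has to account for the CIGAR.

```java
// Simplified model of the documented behaviour, NOT the htsjdk code.
// Assumptions: gapless alignments, 1-based inclusive reference coordinates.
class OverlapClipSketch {

    // Number of bases at the end of this record that extend into its mate.
    static int basesToClip(boolean paired, int start, int end, int mateStart) {
        // Per the doc comment: unpaired records, and records that start
        // after their mate, automatically return zero.
        if (!paired || start > mateStart) {
            return 0;
        }
        // The mate starts beyond the end of this record: nothing overlaps.
        if (mateStart > end) {
            return 0;
        }
        // Bases from the mate's start through this record's end overlap.
        return end - mateStart + 1;
    }

    public static void main(String[] args) {
        System.out.println(basesToClip(true, 100, 149, 130));  // 20
        System.out.println(basesToClip(true, 100, 149, 200));  // 0 (no overlap)
        System.out.println(basesToClip(true, 130, 179, 100));  // 0 (starts after its mate)
        System.out.println(basesToClip(false, 100, 149, 130)); // 0 (unpaired)
    }
}
```

Note that nothing in this calculation involves the target interval, which is why the first interpretation does not seem to match the code.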