
ReCapSeg Overview

LeeTL1220, Arlington, MA (Member, Broadie, Dev)
edited March 2 in ReCapSeg Documentation


ReCapSeg is a copy-number variant detector that runs on user-defined target regions, which can correspond to exomes, gene panels, or arbitrary windows. ReCapSeg uses a Panel of Normals (PoN) to model noise and normalize the coverage of the target sample. These methods were designed for calling somatic copy number events (amplifications and deletions) at a resolution of two or more targets.
ReCapSeg does not need a matched normal. Instead, it relies on a panel of normal samples with similar library preparation to remove noise agnostically. ReCapSeg is the production version of the CapSeg algorithm.

Method flow

Given a DNA-Seq alignment BAM file and a BED file of target genomic segments, the ReCapSeg algorithm estimates copy ratio and classifies segments as amplified, deleted, or copy neutral in five main steps:
1. Generate proportional coverage: first, the per-sample normalized coverage is calculated by dividing the read coverage spanning a target segment by the total number of aligned reads (for every read group: number of reads over the segment / total number of aligned reads). The proportional coverage is then calculated by dividing the normalized coverage of every segment by the median normalized coverage across the PoN for that segment.
2. Tangent normalization: this procedure projects the sample proportional coverage onto a hyperplane defined by the PoN, which yields a copy ratio estimate with reduced noise. The tool also reports a metric called QC, the median of the absolute differences between adjacent targets, which helps the user evaluate how noisy the calls are (lower QC is better). A good normalization shows a lower QC after PoN tangent normalization than before it.
3. Segment: the target regions are then merged into contiguous segments that each represent a single copy number event. Segmentation is performed with the circular binary segmentation algorithm described by Olshen et al., 2004 [1], which was originally developed to segment noisy array copy number data. Currently, ReCapSeg considers only segments that include two or more targets (a target usually represents a single exon).
4. Call: each segment is called as neutral, amplified, or deleted based on the copy ratio estimates of the targets it contains.
5. Annotate gene segments with Oncotator: as a last step, Oncotator, a somatic variant annotation tool [2], breaks every segment into gene segments such that each gene segment includes only targets of one gene. It then annotates each gene segment with the quantitative copy ratio estimate and the boundaries of the segment (e.g. “2+” indicates that the segment includes exon 2 and onward), based on the original segment call.
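
Steps 1 and 2 and the QC metric can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ReCapSeg's actual code: all function names are hypothetical, read groups are ignored, the projection is done on log2-scale values, and plain Gram-Schmidt orthogonalization of the PoN vectors stands in for whatever linear algebra the tool uses internally.

```python
import math
from statistics import median


def proportional_coverage(read_counts, pon_medians):
    """Step 1 (sketch): divide per-target read counts by the total read
    count, then by the PoN median normalized coverage of each target.
    Assumption: one read group; pon_medians is precomputed from the PoN."""
    total = sum(read_counts)
    return [(count / total) / med for count, med in zip(read_counts, pon_medians)]


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of the vectors,
    dropping any vector that is (numerically) linearly dependent."""
    basis = []
    for vec in vectors:
        work = list(vec)
        for b in basis:
            dot = sum(x * y for x, y in zip(work, b))
            work = [x - dot * y for x, y in zip(work, b)]
        norm = math.sqrt(sum(x * x for x in work))
        if norm > tol:
            basis.append([x / norm for x in work])
    return basis


def tangent_normalize(sample_log2, pon_log2):
    """Step 2 (sketch): project the sample onto the space spanned by the
    PoN vectors and return the residual, i.e. the part of the sample that
    the panel cannot explain. Assumption: inputs are log2 values."""
    residual = list(sample_log2)
    for b in orthonormal_basis(pon_log2):
        dot = sum(x * y for x, y in zip(sample_log2, b))
        residual = [r - dot * y for r, y in zip(residual, b)]
    return residual


def qc(values):
    """QC metric: median absolute difference between adjacent targets."""
    return median(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Toy data: one PoN noise pattern, and a sample that carries the same
    # noise plus a copy ratio step over the last three targets.
    noise = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
    sample = [n + s for n, s in zip(noise, [0, 0, 0, 1, 1, 1])]
    normalized = tangent_normalize(sample, [noise])
    print("QC before:", qc(sample))
    print("QC after: ", qc(normalized))
```

On this toy input the QC value after normalization is lower than before, which is the behavior step 2 describes for a well-matched panel. A real panel contains many normals and far more targets than this example.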

Please see the README.rst file in the source code for instructions on installing and running ReCapSeg.


[1] Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5(4):557-72.

[2] Ramos AH, Lichtenstein L, Gupta M, Lawrence MS, Pugh TJ, Saksena G, Meyerson M, Getz G. Oncotator: cancer variant annotation tool. Hum Mutat. 2015 Apr;36(4):E2423-9. doi: 10.1002/humu.22771
