Is the adapter boundary set correctly for overlapping read pairs?
I've been working on a GATK walker, and during testing I noticed some discrepancies between the base counts from the ReadBackedPileup object returned by AlignmentContext.getBasePileup() and those from a previous incarnation of my tool based on htsjdk, as well as those I see when viewing the BAM file in IGV.
I've tracked this down to overlapping read pairs where the insert size is smaller than the read length. From what I've read in the source code (the LocusIteratorByState.dontIncludeReadInPileup and ReadUtils.isBaseInsideAdaptor methods), GATK does not include the non-overlapping portion of each read in the pileup because these bases are likely to be adapter sequence.
However, the adapter boundary on the right of the overlapping portion is set two bases after the alignment end of the read aligning to the negative strand, i.e. one base beyond what I would expect. As a result, if I obtain the pileup for a test BAM file containing just one of these overlapping read pairs, I get a depth of 2 for the overlapping portion of the reads, 1 for the base immediately after the overlap, and 0 elsewhere.
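To make the off-by-one concrete, here is a minimal sketch of the arithmetic using made-up coordinates. It is not GATK's code: the class, the helper names (expectedBoundary, observedBoundary, depth) and the read positions are all hypothetical, and the left-hand boundary is simply modelled as behaving the way I would expect. It only contrasts a boundary placed on the first base after the negative-strand read's alignment end with one placed a base further out.

```java
class AdapterBoundarySketch {
    // Hypothetical 1-based inclusive alignment coordinates (not from a real BAM).
    // Insert size (40) is smaller than the read length (50), so the reads overlap
    // at 100-139 and each read runs past its mate into likely adapter sequence.
    static final int FWD_START = 100, FWD_END = 149; // positive-strand read
    static final int REV_START = 90,  REV_END = 139; // negative-strand read

    // Boundary I would expect: the first base after the negative-strand read's end.
    static int expectedBoundary(int mateAlignmentEnd) {
        return mateAlignmentEnd + 1;
    }

    // Boundary consistent with what I observe: one base further to the right.
    static int observedBoundary(int mateAlignmentEnd) {
        return mateAlignmentEnd + 2;
    }

    // Pileup depth at pos, treating positive-strand bases at or beyond fwdBoundary
    // as adapter. Assumption: the left-hand side behaves as expected, i.e.
    // negative-strand bases to the left of FWD_START are treated as adapter.
    static int depth(int pos, int fwdBoundary) {
        int d = 0;
        if (pos >= FWD_START && pos <= FWD_END && pos < fwdBoundary) d++;
        if (pos >= REV_START && pos <= REV_END && pos >= FWD_START) d++;
        return d;
    }

    public static void main(String[] args) {
        int expected = expectedBoundary(REV_END);
        int observed = observedBoundary(REV_END);
        // Print the depth around the right-hand edge of the overlap.
        for (int pos = REV_END - 1; pos <= REV_END + 2; pos++) {
            System.out.printf("pos=%d expected=%d observed=%d%n",
                    pos, depth(pos, expected), depth(pos, observed));
        }
    }
}
```

With these hypothetical reads, the two boundaries differ only at position 140, the base immediately after the overlap: the expected boundary gives a depth of 0 there, while the boundary one base further out gives a depth of 1, which matches what I see in the pileup.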
Is this what was intended?