Is the adapter boundary set correctly for overlapping read pairs?
I've been working on a GATK walker, and during testing I noticed some discrepancies in base counts. The counts from the ReadBackedPileup object returned by AlignmentContext.getBasePileup() differ from those produced by a previous incarnation of my tool based on htsjdk, and from what I see when viewing the BAM file in IGV.
I've tracked this down to some overlapping read pairs where the insert size is smaller than the read length. From what I've read in the source code (the LocusIteratorByState.dontIncludeReadInPileup and ReadUtils.isBaseInsideAdaptor methods), GATK leaves the non-overlapping portion of each read out of the pileup because those bases are likely to be adapter sequence.
However, the adapter boundary to the right of the overlapping portion is set two bases after the alignment end of the read aligned to the negative strand, i.e. one base beyond what I would expect. As a result, if I obtain the pileup for a test BAM file containing just one of these overlapping read pairs, I get a depth of 2 across the overlapping portion of the reads, 1 at the base immediately after the overlap, and 0 elsewhere.
Is this what was intended?
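
To make the off-by-one concrete, here is a minimal toy model in plain Java. It is a sketch under stated assumptions, not GATK's implementation. The class, the Read type and all method names are hypothetical. The "observed" rule (forward-strand boundary = alignment start + |insert size| + 1) is simply what the depths described above would imply. The insert size is assumed to count the bases from the forward read's first base through the reverse read's last base, inclusive.

```java
import java.util.function.ToIntFunction;

class AdapterBoundarySketch {

    /** Minimal stand-in for an aligned read; hypothetical, not a GATK or htsjdk type. */
    static final class Read {
        final int start;          // 1-based alignment start, inclusive
        final int end;            // 1-based alignment end, inclusive
        final boolean negativeStrand;
        final int mateStart;      // 1-based alignment start of the mate
        final int insertSize;     // signed; assumed inclusive of both fragment ends

        Read(int start, int end, boolean negativeStrand, int mateStart, int insertSize) {
            this.start = start;
            this.end = end;
            this.negativeStrand = negativeStrand;
            this.mateStart = mateStart;
            this.insertSize = insertSize;
        }
    }

    /** Assumed rule matching the observed pileup: boundary sits two bases past the fragment end. */
    static int observedBoundary(Read r) {
        return r.negativeStrand ? r.mateStart - 1 : r.start + Math.abs(r.insertSize) + 1;
    }

    /** Rule one might expect instead: boundary is the first base past the fragment end. */
    static int expectedBoundary(Read r) {
        return r.negativeStrand ? r.mateStart - 1 : r.start + Math.abs(r.insertSize);
    }

    /** A base counts as adapter if it lies at or beyond the boundary, on the adapter side. */
    static boolean insideAdapter(Read r, int pos, int boundary) {
        return r.negativeStrand ? pos <= boundary : pos >= boundary;
    }

    /** Pileup depth at pos, counting only reads that cover pos with a non-adapter base. */
    static int depth(Read[] reads, int pos, ToIntFunction<Read> boundaryRule) {
        int depth = 0;
        for (Read r : reads) {
            boolean covers = pos >= r.start && pos <= r.end;
            if (covers && !insideAdapter(r, pos, boundaryRule.applyAsInt(r))) {
                depth++;
            }
        }
        return depth;
    }

    /** One overlapping pair: 10 bp reads from a 7 bp fragment spanning 101..107. */
    static Read[] examplePair() {
        Read forward = new Read(101, 110, false, 98, 7);
        Read reverse = new Read(98, 107, true, 101, -7);
        return new Read[] { forward, reverse };
    }

    public static void main(String[] args) {
        Read[] pair = examplePair();
        for (int pos = 97; pos <= 111; pos++) {
            System.out.printf("%d observed=%d expected=%d%n", pos,
                    depth(pair, pos, AdapterBoundarySketch::observedBoundary),
                    depth(pair, pos, AdapterBoundarySketch::expectedBoundary));
        }
    }
}
```

In this toy pair the overlap is 101..107. The observed rule puts the forward read's boundary at 109, so position 108 still has depth 1, which is the pattern described above. The expected rule puts the boundary at 108 and gives depth 0 everywhere outside the overlap.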