determining read depth from bam file: 4 methods, 3 different results

Redmar_van_den_BergRedmar_van_den_Berg Member ✭✭
edited January 2016 in Ask the GATK team

I was looking at the coverage of my bam files, and I noticed that different tools gave different results for the same file. Has anyone encountered this problem before? Am I using the tools wrong? I'm not sure how to proceed now, using gatk/pysam based on majority vote seems silly since all tools should show the same results.

I used the following methods:

gatk

gatk -T DepthOfCoverage -R $ref -I $bamfile | head -n 11

bedtools

genomeCoverageBed -d -ibam $bam | head -n 10

pysam

import pysam
def bedtools(filename): 
    """simulate the behaviour of bedtools"""
    bamfile=pysam.AlignmentFile(filename,'rb')

    for ref in bamfile.header['SQ']:
        name=ref['SN']

        pileup=bamfile.pileup()
        for pos,column in enumerate(pileup,1):
            depth=column.nsegments
            print(name,pos,depth)
            if pos >= 10:
                break

IGV

Manually, using mouse-over of the depth graph in the default view to see the exact read depth on the tooltip

Results

(I couldn't get it aligned better here, paste it into excel for a proper view)
Position IGV pysam genomeCoverageBed gatk 1 127 89 128 89 2 130 92 131 92 3 130 92 131 92 4 133 95 134 95 5 136 98 137 98 6 137 99 138 99 7 140 102 141 102 8 141 103 142 103 9 142 104 143 104 10 146 108 147 108

Summary of results

  • Only pysam and gatk agree
  • IGV seems to count 38 reads more then pysam/gatk
  • genomeCoverageBed counts 37 more then pysam/gatk

Best Answer

Answers

Sign In or Register to comment.