inconsistency between the read depth reported by HC and by samtools mpileup

wang1766wang1766 United StatesMember Posts: 2

Hi everyone,

I'm calling SNPs by using GATK HC. I'm also using samtools mpileup to give specific base type and read depth information for each called SNP position (using the same bam file, of course). However, I found that in about 10% of cases, in the positions where GATK called SNPs, samtools mpileup did not report any read coverages. That confused me, so I put the bam file into igv and tablet to visualize the reads. However, both of them confirmed that there were indeed no reads in those positions at all.

Here is the command I used for GATK:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R index/genome_mu50.fasta -I wt_rg.bam -o wt_raw.vcf

for samtools mpileup:

samtools mpileup -f index/genome_mu50.fasta wt_rg.bam > wt.coverage

Please see the vcf table:

Please notice that a couple of SNPs were called between 1550190 and 1550201, and reasonable allele depths were reported to back up these results.

Please see the igv snip shot of this area:

There is not a single read in this area!

I also tried UnifiedGenotyper, and those SNPs were not called. Is this a bug in HC? or I did something wrong? Any comments will be appreciated. Thanks!



  • wang1766wang1766 United StatesMember Posts: 2

    Sorry the pictures didn't show up well, plz see the attachment.

    1260 x 270 - 150K
    1408 x 726 - 184K
  • Geraldine_VdAuweraGeraldine_VdAuwera Administrator, Dev Posts: 10,301 admin

    What is happening here is most probably this. Before calling variants within a certain window (called an ActiveRegion), HaplotypeCaller does a de novo assembly step with all the reads, to correct for mapping errors. Sometimes this causes the read coverage to change because some reads are remapped in a different place (though still nearby). So it may be that after reassembly, HC finds reads in positions where there were none in the original bam file.

    Try using the -bamOut argument in this region (with -L to restrict the analysis to that region only); this will produce a bam file containing the assembled haplotypes. This should help clarify what is the basis for the HC's calls.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.