AD in VCF doesn't match BAM

kippkipp New York CityMember


I'm using GATK to call variants in my RNA-Seq data. I'm noticing something strange, perhaps someone can help? For a number of sites the VCF is reporting things I cannot replicate from BAMs. How can I recover the reads that contribute to a variant call? Here is an example for 1 site in 1 sample, but I've observed this at many sites/samples:

$ grep 235068463 file.vcf 
chr1    235068463   .   T   C   1795.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-3.530;ClippingRankSum=-0.535;DP=60;FS=7.844;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.401;QD=29.93;ReadPosRankSum=3.557   GT:AD:DP:GQ:PL  0/1:5,55:60:44:1824,0,44

60 reads, 5 T, 55 C.
But loading the bam in IGV, I do not see any T reads. Similarly:

$ samtools view -uh chr1:235068463-235068463 |samtools mpileup - |grep 235068463
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    235068463   N   60  cCCccccCCCcccccCcccccccccCCCccCCCCCcCcccccCCCcCcCCccCCCCccCC    >CA@B@>A>BA@BCABACCC:@@ACABBBCAACBBCABCB@CABBAB?>A?CBBAAAABA

There are just 60 C's at that location. How do I decide what the genotype here is? C/C or C/T ?

For methodology I'm using gatk/3.2.0. I tried using HC from gatk/3.3.1 and got the same result. The bam and vcf files come from the final two lines:
-2 pass STAR
-Mark Dups
-Mark Dups



Sign In or Register to comment.