Holiday Notice:
The Frontline Support team will be slow to respond December 17-18 due to an institute-wide retreat and offline December 22- January 1, while the institute is closed. Thank you for your patience during these next few weeks. Happy Holidays!

AD in VCF doesn't match BAM

kippkipp New York CityMember


I'm using GATK to call variants in my RNA-Seq data. I'm noticing something strange, perhaps someone can help? For a number of sites the VCF is reporting things I cannot replicate from BAMs. How can I recover the reads that contribute to a variant call? Here is an example for 1 site in 1 sample, but I've observed this at many sites/samples:

$ grep 235068463 file.vcf 
chr1    235068463   .   T   C   1795.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-3.530;ClippingRankSum=-0.535;DP=60;FS=7.844;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.401;QD=29.93;ReadPosRankSum=3.557   GT:AD:DP:GQ:PL  0/1:5,55:60:44:1824,0,44

60 reads, 5 T, 55 C.
But loading the bam in IGV, I do not see any T reads. Similarly:

$ samtools view -uh chr1:235068463-235068463 |samtools mpileup - |grep 235068463
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    235068463   N   60  cCCccccCCCcccccCcccccccccCCCccCCCCCcCcccccCCCcCcCCccCCCCccCC    >[email protected]@>A>[email protected]:@@[email protected]?>A?CBBAAAABA

There are just 60 C's at that location. How do I decide what the genotype here is? C/C or C/T ?

For methodology I'm using gatk/3.2.0. I tried using HC from gatk/3.3.1 and got the same result. The bam and vcf files come from the final two lines:
-2 pass STAR
-Mark Dups
-Mark Dups



Sign In or Register to comment.