I came across something that confuses me concerning the BaseCounts annotation in the vcf output after running the following commands for UnifiedGenotyper and VariantAnnotator:
java -Xmx6g -jar GenomeAnalysisTK-1.6.596/GenomeAnalysisTKLite.jar -T UnifiedGenotyper -R genome.fa -I my.bam -o my.vcf -glm BOTH
java -Xmx6g -jar GenomeAnalysisTK-1.6.596/GenomeAnalysisTKLite.jar -T VariantAnnotator -R genome.fa -I my.bam -o my_annotated.vcf --variant my.vcf -A DepthPerAlleleBySample -A BaseCounts -A AlleleBalanceBySample -A AlleleBalance -A DepthOfCoverage -A SampleList
One of the positions in the final vcf file looks like this:
chrM 1407 . C A 1246.01 . ABHet=0.639;AC=1;AF=0.500;AN=2;BaseCounts=360,637,2,1;BaseQRankSum=-4.587;DP=1000;DS;Dels=0.00;FS=50.421;HaplotypeScore=53.4969;MLEAC=1;MLEAF=0.500;MQ=35.00;MQ0=0;MQRankSum=-0.127;OND=3.000e-03;QD=4.98;ReadPosRankSum=-6.950;SB=-6.519e-03;Samples=F321 GT:AB:AD:DP:GQ:PL 0/1:0.640:637,360:250:99:1276,0,6082
The BaseCounts annotation is supposed to give the number of times each base was called, so at this position, I would have 360 A's, 637 C's (which is the reference), 2 G's and 1 T.
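To make that reading concrete, here is a minimal sketch that pulls BaseCounts out of an INFO string and labels the four numbers. It assumes the counts are ordered A, C, G, T, as interpreted above; the function name `parse_base_counts` is just illustrative, and the INFO string is a shortened copy of the record shown earlier.

```python
def parse_base_counts(info):
    """Return {base: count} from a VCF INFO string, or None if BaseCounts is absent.

    Assumes the four BaseCounts values are ordered A, C, G, T.
    """
    for entry in info.split(";"):
        # Flag-style entries such as "DS" have no "=", so value is "" for them.
        key, _, value = entry.partition("=")
        if key == "BaseCounts":
            return dict(zip("ACGT", (int(x) for x in value.split(","))))
    return None


# Shortened INFO field from the chrM:1407 record above.
info = "ABHet=0.639;AC=1;AF=0.500;AN=2;BaseCounts=360,637,2,1;DP=1000;DS"
counts = parse_base_counts(info)
print(counts)                # {'A': 360, 'C': 637, 'G': 2, 'T': 1}
print(sum(counts.values()))  # 1000
```

The four counts add up to 1000, which matches the INFO-level DP=1000 rather than the per-sample DP of 250.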
However, in the pileup of the input bam file at this position, the vast majority of bases are reference C, as expected. What I fail to see is this substantial number of A's (I see maybe 20), while I definitely see more than one T, e.g.
Maybe there is something I misunderstood, but shouldn't the BaseCounts (more or less) reflect what I see in the bam file?
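For a comparison that does not rely on eyeballing, the bases in a pileup line can be tallied directly. The following is only a sketch, and it assumes the pileup is in samtools mpileup text format, where `.` and `,` stand for a match to the reference, `^` is followed by a mapping-quality character, `$` marks a read end, and `+N`/`-N` introduce indels of length N. The function name `count_pileup_bases` and the example string are made up for illustration and are not the data from this position.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Tally A, C, G, T in the read-bases column of one mpileup line.

    Returns the counts as a list ordered A, C, G, T.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker: skip it and the mapping-quality character.
            i += 2
            continue
        if ch in "+-":
            # Indel: skip the sign, the length digits, and the indel sequence.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if ch in ".,":
            counts[ref_base.upper()] += 1   # match to the reference base
        elif ch.upper() in "ACGT":
            counts[ch.upper()] += 1         # mismatch on either strand
        # Anything else ($, *, <, >, N) is not counted as a base call.
        i += 1
    return [counts[b] for b in "ACGT"]


# Hypothetical read-bases column with reference base C.
example = "..,,A^].a$T-2AGt+1G,"
print(count_pileup_bases("C", example))  # [2, 6, 0, 2]
```

The resulting list is in the same A, C, G, T order assumed for BaseCounts, so the two can be compared side by side.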
Many thanks in advance for your comments.