SNPs not supported by other evidence
I have 3 sRNA samples. Each was aligned to a virus reference genome with bowtie2, and only the reads that aligned were kept. I then ran UnifiedGenotyper on these filtered reads to find variants, using a cutoff score of 100, and came up with 3 lists without any problems. I then decided to check some of the SNPs in IGV. That is when I noticed that the 3 samples behaved quite differently from one another in how well the SNPs could be verified. In one sample only 8 out of 80 SNPs were certain. Of the rest, maybe half might possibly be SNPs if the alignment scores were much higher for the reads carrying the lower-frequency nt (these were cases where >50% of reads show the template nt, but the SNP nt was present in more than 10%). The remainder had 85-98% of reads with the original nt.
In the other two samples I could verify just about every SNP (some variation between tools is of course expected). In one of those two samples, however, there also seemed to be a large number of SNPs that were missed.
So: sample 1: lots of false positives; sample 2: lots of missed SNPs; sample 3: just about right on target.
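To make the manual IGV check reproducible, the categories I used can be written down as a small function. This is only a rough sketch of my by-eye rule, not anything UnifiedGenotyper or IGV does internally. The function name `classify_site` and the labels are made up for illustration, the 10% threshold comes from the cases described above, and treating "alt nt in more than half of the reads" as a certain SNP is my assumption. The read bases would have to come from a pileup at the called position.

```python
from collections import Counter


def classify_site(ref_base, alt_base, read_bases, min_alt_frac=0.10):
    """Label a called SNP by the fraction of reads carrying the alt nt.

    Hypothetical helper: the thresholds mirror the by-eye IGV check,
    they are not taken from any variant caller.
    """
    # Count only unambiguous bases, ignoring case (pileups may mix cases).
    counts = Counter(b.upper() for b in read_bases if b.upper() in "ACGT")
    depth = sum(counts.values())
    if depth == 0:
        return "no coverage", 0.0

    alt_frac = counts[alt_base.upper()] / depth
    if alt_frac > 0.5:
        # Assumption: alt nt in the majority of reads counts as "certain".
        label = "supported"
    elif alt_frac > min_alt_frac:
        # Template nt still dominates, but the alt nt is above 10%.
        label = "possible"
    else:
        # Nearly all reads show the original nt.
        label = "unsupported"
    return label, alt_frac


if __name__ == "__main__":
    # Toy pileup columns, not real data.
    print(classify_site("A", "G", "G" * 95 + "A" * 5))
    print(classify_site("A", "G", "A" * 70 + "G" * 30))
    print(classify_site("A", "G", "A" * 95 + "G" * 5))
```

Running this over every called position in each sample would give a per-sample tally of supported, possible and unsupported calls to compare against the three lists.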
Sample 1 has a much higher read density than the other two, but other than that I can't think of any differences.
Does this mean UnifiedGenotyper is not optimal for use with these very short reads (36 nt and shorter)? Is there something else that affects this? Would samtools pileup etc. be better?