Selecting variants that are shared by all the reads but differ from the reference?

Hi GATK team,

I am having some trouble understanding these parameters, and I think one of them could solve my problem:

Some background information: I have a reference genome for one microbial species. I aligned all the raw reads from a metagenomic dataset to this reference genome, which gave me a BAM file with good coverage of 94% of the reference genome. I then created the VCF file (after realigning the BAM file using GATK tools) by running the following command line:
java -jar apps/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -R genomereference.fasta -T HaplotypeCaller -I file_realigned.bam -o output_HaplotypeCaller_BAMrealigned

I have evaluated the impact of these variants using SnpEff, and I am interested in variants that have a HIGH effect on the final product.

I would like to find variants that are shared by all the reads (or at least most of them) but differ from the reference. So far, when I evaluate the effect of the variants, a given variant usually concerns only one read among all of those covering the position, which is not relevant for my purpose. I want to highlight differences between the reference and the reads, not differences among the reads themselves.

I am considering several ways to do that. One would be to filter on the quality score. Could you confirm that the higher this number is, the more confident we are in the variant? And confident in what sense?

Another way could be to filter on allele frequency or on one of these parameters: AC, AF, DP, FS, etc. I am not sure what these parameters correspond to. This is really not clear to me, and I would appreciate your help in selecting the results I am interested in.
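
To make this second idea concrete, here is a minimal Python sketch of the kind of filter I have in mind. It assumes the VCF has a single sample column and that the FORMAT field contains an AD entry (read depth per allele, reference first). The function names and the thresholds (0.9 for the read fraction, 10 for the minimum depth) are placeholders of my own, not GATK options, and I may well be misusing AD here.

```python
def allele_depths(vcf_line):
    """Return the per-allele read depths (REF first) from the AD field, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns on this line
    # Pair FORMAT keys with the values of the (single) sample column.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    parts = sample.get("AD", ".").split(",")
    if len(parts) < 2 or any(not p.isdigit() for p in parts):
        return None  # AD missing or not fully numeric
    return [int(p) for p in parts]


def shared_by_most_reads(vcf_lines, min_fraction=0.9, min_depth=10):
    """Keep records where one ALT allele is carried by most of the counted reads."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        depths = allele_depths(line)
        if depths is None:
            continue
        total = sum(depths)
        if total == 0 or total < min_depth:
            continue  # too few reads to judge
        # Fraction of reads supporting the best-supported ALT allele.
        if max(depths[1:]) / total >= min_fraction:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up records, only to show the expected input layout.
    example = [
        "##fileformat=VCFv4.1",
        "chr1\t100\t.\tA\tG\t500.0\t.\tDP=40\tGT:AD:DP\t1/1:1,39:40",
        "chr1\t200\t.\tC\tT\t60.0\t.\tDP=40\tGT:AD:DP\t0/1:38,2:40",
    ]
    for record in shared_by_most_reads(example):
        print(record)
```

With a filter like this, a variant seen in only one or two reads would be dropped, and a variant carried by nearly every read would be kept. Is that a reasonable way to use these fields, or is there a better parameter for this?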

Let me know if anything is unclear or if you need illustrations to understand my problem.

Thanks a lot for your support,



