Reference genotype quality in the presence of conflicting reads
Given the following line from a VCF file:
Supercontig_1.1 308 . T A . PASS AC=0;AF=0;AN=2;BaseQRankSum=-1.932;ClippingRankSum=0;DP=99;ExcessHet=3.01;MQ=43.26;MQRankSum=-2.382;ReadPosRankSum=1.77;VariantType=SNP GT:AD:DP:RGQ 0/0:52,4:56:1
I note that despite there being 52 reads passing filters that support the reference allele, the reference genotype quality (RGQ) is still only 1. Is RGQ affected by the presence of reads indicating a possible variant (4 in this case)? If so, does the low RGQ here reflect uncertainty over whether this position really is a reference call (T/T) or might instead be a variant (A/T or A/A)?
If I were being very strict about including only highly certain positions in my analysis, would you recommend that I assign this position a missing genotype, since I can't really be sure what it is?
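To make the question concrete, here is a minimal sketch of the masking I have in mind. It is plain Python, not a GATK tool; the function name mask_low_rgq and the cutoff of 20 are my own placeholders, and it only looks at the FORMAT and sample columns of a single record.

```python
# Hypothetical helper (not part of GATK): set the genotype to missing
# when the reference genotype quality (RGQ) falls below a chosen cutoff.
MIN_RGQ = 20  # assumed threshold; pick whatever suits the analysis


def mask_low_rgq(format_field, sample_field, min_rgq=MIN_RGQ):
    """Return the sample column with GT replaced by ./. if RGQ < min_rgq."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    sample = dict(zip(keys, values))  # e.g. {"GT": "0/0", "AD": "52,4", ...}

    rgq = sample.get("RGQ")
    # Leave the record untouched if RGQ is absent or already missing (".").
    if rgq is not None and rgq != "." and int(rgq) < min_rgq:
        sample["GT"] = "./."

    # dicts preserve insertion order, so the FORMAT order is kept.
    return ":".join(sample.values())


# The sample column from the record above.
print(mask_low_rgq("GT:AD:DP:RGQ", "0/0:52,4:56:1"))  # ./.:52,4:56:1
```

With this cutoff the position above would be written out as ./. while keeping AD, DP and RGQ, so the reason for masking stays visible in the record.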