Higher concordance between genotypes representing reference homozygotes than other genotypes

I sequenced 66 newt individuals twice at a few hundred loci to moderate coverage (45 on average). Now I want to check genotype concordance in different depth classes to find out how much coverage is enough to call genotypes reliably. Using all genotypes at variant sites gives a genotype concordance of 0.994 at coverage 8, whereas excluding genotypes where both replicates of an individual are HOM_REF, i.e. calculating 1 - Non-Reference Discrepancy (NRD), gives 0.966 in the same coverage class. The difference also holds for higher coverage classes.
So it seems that concordance is higher for genotypes representing reference homozygotes.
Why is that?
BTW: I’m using GATK Unified Genotyper with standard settings, except with mbq set to 20 and pcr_error_rate set to 1.0E-3, plus the additional filters GQ < 20.0, MQRankSum < -12.5 and QD < 2.0.
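
To make the two numbers in the question concrete, here is a minimal Python sketch of how overall concordance and NRD differ. It assumes the usual definition of NRD (discordant genotype pairs divided by all pairs except those where both replicates are HOM_REF). The function name and the toy counts are invented for illustration and are not taken from the newt data.

```python
from collections import Counter

def concordance_metrics(pairs):
    """pairs: iterable of (genotype_rep1, genotype_rep2) tuples,
    each genotype being 'HOM_REF', 'HET' or 'HOM_VAR'."""
    counts = Counter(pairs)
    total = sum(counts.values())
    # Pairs where both replicates were given the same genotype
    matches = sum(n for (a, b), n in counts.items() if a == b)
    ref_ref = counts[("HOM_REF", "HOM_REF")]
    overall = matches / total
    # NRD drops the HOM_REF/HOM_REF cell from the denominator
    nrd = (total - matches) / (total - ref_ref)
    return overall, nrd

# Toy counts, made up for illustration only
pairs = (
    [("HOM_REF", "HOM_REF")] * 800
    + [("HET", "HET")] * 120
    + [("HOM_VAR", "HOM_VAR")] * 70
    + [("HET", "HOM_REF")] * 6
    + [("HOM_VAR", "HET")] * 4
)

overall, nrd = concordance_metrics(pairs)
print(f"overall concordance = {overall:.3f}")
print(f"1 - NRD             = {1 - nrd:.3f}")
```

With the same 10 discordant pairs, the overall figure is divided by 1000 pairs while NRD is divided by only 200, so part of the gap between the two metrics comes purely from how many HOM_REF/HOM_REF pairs sit in the denominator.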


Best Answer


  • Hi Valentin,
    Thank you for the fast answer! Regarding the prior: since the genotype fields contain PL rather than GP, I thought that no prior was applied when calling genotypes. I wonder whether this prior might also influence the probability of calling HET when calling SNPs in a hybrid zone, where I have a lot of HOM_REF and HOM_VAR?
    Yes, the effect is less pronounced in the whole data set (NRD=0.012). I haven’t checked the concordance for HET and HOM_VAR separately, but I’m curious too; I will check it soon and let you know.

  • Hi Valentin,
    I checked the concordance in my data set starting from DP=8, for biallelic positions only. The concordance is HET=0.985, HOM_VAR=0.996 and HOM_REF=0.999. So it seems that heterozygotes are the hardest to call concordantly, and some prior effect is still visible. Nevertheless, the concordance seems to be pretty high.
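
For anyone who wants to repeat the per-class check, here is a rough Python sketch. The exact definition used for the numbers above is not stated in the thread, so this assumes one possible definition: for each class, take all replicate pairs in which at least one call has that class, and report the fraction in which both calls agree. The function name and the toy counts are made up for illustration.

```python
def per_class_concordance(pairs, classes=("HOM_REF", "HET", "HOM_VAR")):
    """pairs: list of (genotype_rep1, genotype_rep2) tuples."""
    result = {}
    for c in classes:
        # Pairs where at least one replicate was called as class c
        involved = [(a, b) for a, b in pairs if a == c or b == c]
        # Pairs where both replicates agree (and are therefore both c)
        both = sum(1 for a, b in involved if a == b)
        result[c] = both / len(involved) if involved else float("nan")
    return result

# Toy counts, made up for illustration only
pairs = (
    [("HOM_REF", "HOM_REF")] * 800
    + [("HET", "HET")] * 120
    + [("HOM_VAR", "HOM_VAR")] * 70
    + [("HET", "HOM_REF")] * 6
    + [("HOM_VAR", "HET")] * 4
)

by_class = per_class_concordance(pairs)
for genotype, value in by_class.items():
    print(f"{genotype}: {value:.3f}")
```

Under this definition a discordant pair counts against both classes involved, so HET, which can be confused with either homozygote, tends to come out lowest.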

