Meaning of terms in output of GenotypeConcordance

Member
edited October 2013

Just to make sure my understanding is correct:

HET: heterozygous
HOM_REF: homozygous reference
HOM_VAR: homozygous variant
MIXED: something like ./1
Mismatching_Alleles: ??
UNAVAILABLE: for internal use
ALLELES_MATCH: ??
ALLELES_DO_NOT_MATCH: ??
EVAL_ONLY: ??
TRUTH_ONLY: does it actually mean the variants present in comp but not in eval, like COMP_ONLY?

How are the following computed?

Non-Reference_Discrepancy
Non-Reference_Sensitivity
Overall_Genotype_Concordance

Thanks a lot!

• Member

@blueskypy, I've asked the tool's author to answer you but he is very busy so you'll need to be a little patient.

• 7ccMember

Hi blueskypy,

Thanks for your patience. Your intuition on all counts has been correct.

ALLELES_MATCH is the count of calls at the same site where the alleles match.

ALLELES_DO_NOT_MATCH is the count of calls at the same site with different alleles, for example the eval set calling a 'G' alternate allele and the comp set calling a 'T' alternate allele.

EVAL_ONLY is the count of sites present only in the eval VCF and not in the comp VCF.
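
For readers who find code easier to follow, here is a minimal Python sketch of how the four site-level categories could be tallied. It is not the tool's implementation. The function name `classify_sites`, the input layout (a dict mapping a (chromosome, position) pair to a set of alternate alleles), and the use of exact set equality to decide a "match" are all assumptions made for illustration, and the sites are made up.

```python
from collections import Counter

def classify_sites(eval_sites, comp_sites):
    """Tally site categories from two {(chrom, pos): set_of_alt_alleles} dicts.

    Hypothetical helper: "match" here means the alt allele sets are identical,
    which is a simplifying assumption, not the tool's documented rule.
    """
    counts = Counter()
    for site in eval_sites.keys() | comp_sites.keys():
        if site not in comp_sites:
            counts["EVAL_ONLY"] += 1             # called in eval only
        elif site not in eval_sites:
            counts["TRUTH_ONLY"] += 1            # called in comp only
        elif eval_sites[site] == comp_sites[site]:
            counts["ALLELES_MATCH"] += 1         # same site, same alt alleles
        else:
            counts["ALLELES_DO_NOT_MATCH"] += 1  # same site, different alt alleles
    return counts

# Made-up example sites.
eval_sites = {("1", 100): {"G"}, ("1", 200): {"G"}, ("1", 300): {"A"}}
comp_sites = {("1", 100): {"G"}, ("1", 200): {"T"}, ("1", 400): {"C"}}

site_counts = classify_sites(eval_sites, comp_sites)
for name in ("ALLELES_MATCH", "ALLELES_DO_NOT_MATCH", "EVAL_ONLY", "TRUTH_ONLY"):
    print(name, site_counts[name])
```

With these inputs each category is counted once: position 100 matches, position 200 has 'G' in eval against 'T' in comp, position 300 is eval only, and position 400 is comp only.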

Non-reference sensitivity is the sensitivity of the eval calls to polymorphic calls in the comp set, that is (# true positive)/(# true polymorphic).

Overall genotype concordance is simply (# concordant genotypes)/(# genotypes).

This tends to be high simply because reference calls predominate, so we also use the non-reference discrepancy, which, loosely, is one minus the genotype concordance computed after excluding concordant reference sites. See attached.
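
To make the three definitions concrete, here is a small Python sketch that computes them from a 3x3 table of genotype counts. It is only a reading of the definitions given in this thread, not the tool's code. The counts are made up, the function names are hypothetical, no-calls and MIXED genotypes are ignored, and a "true positive" is assumed to be any non-reference eval call at a site where comp is non-reference, even if HET and HOM_VAR disagree.

```python
GENOTYPES = ("HOM_REF", "HET", "HOM_VAR")
NON_REF = ("HET", "HOM_VAR")

# Made-up counts, keyed by (eval genotype, comp genotype).
table = {
    ("HOM_REF", "HOM_REF"): 900, ("HOM_REF", "HET"): 5,  ("HOM_REF", "HOM_VAR"): 1,
    ("HET", "HOM_REF"): 4,       ("HET", "HET"): 60,     ("HET", "HOM_VAR"): 2,
    ("HOM_VAR", "HOM_REF"): 0,   ("HOM_VAR", "HET"): 3,  ("HOM_VAR", "HOM_VAR"): 25,
}

def overall_concordance(t):
    # (# concordant genotypes) / (# genotypes)
    concordant = sum(t[(g, g)] for g in GENOTYPES)
    return concordant / sum(t.values())

def non_reference_sensitivity(t):
    # (# true positive) / (# true polymorphic), where "true polymorphic"
    # means comp is HET or HOM_VAR (assumption: genotype need not match).
    true_polymorphic = sum(t[(e, c)] for e in GENOTYPES for c in NON_REF)
    true_positive = sum(t[(e, c)] for e in NON_REF for c in NON_REF)
    return true_positive / true_polymorphic

def non_reference_discrepancy(t):
    # 1 - concordance after dropping the concordant HOM_REF/HOM_REF cell.
    considered = sum(t.values()) - t[("HOM_REF", "HOM_REF")]
    concordant_non_ref = t[("HET", "HET")] + t[("HOM_VAR", "HOM_VAR")]
    return 1 - concordant_non_ref / considered

ogc = overall_concordance(table)
nrs = non_reference_sensitivity(table)
nrd = non_reference_discrepancy(table)
print(f"Overall_Genotype_Concordance: {ogc:.4f}")  # 0.9850
print(f"Non-Reference_Sensitivity:    {nrs:.4f}")  # 0.9375
print(f"Non-Reference_Discrepancy:    {nrd:.4f}")  # 0.1500
```

In this example the overall concordance of 0.985 is driven by the 900 concordant reference calls. Once those are set aside, 15 of the remaining 100 genotype pairs disagree, which is what the discrepancy of 0.15 reports. The sketch does not guard against empty tables, so a zero denominator would raise an error.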

