The current GATK version is 3.3-0

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

# VariantEval/GenotypeConcordance stratify by depth-of-coverage, genotype quality, or other metric?

Cambridge, MAPosts: 3Member

I would like to evaluate variant calls to produce a plot (psuedo-ROC) of sensitivity vs. specificity (or concordance, etc) when I condition on a minimum/maximum value for a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe I should be able to compute these values in one pass. Alternatively, if there was a simple tool to annotate each variant as concordance or discordant, I could tabulate the results myself. I would like to rely upon GATK's variant comparison logic to compare variants (especially indels). Any thoughts on if current tools can be parameterized, or adapted for these purposes?

N

Tagged:

edited June 2013

Hi Nils,

The simplest way to do what you want is to use CombineVariants with -V:eval eval.vcf -V:comp comp.vcf, then inspect the set= key. It will be the standard one of

1) set=eval
2) set=comp
3) set=Intersection
4) set=eval-filteredIncomp
5) set=filteredIneval-comp
6) set=filteredInAll
7) set=filterIneval
8) set=filterIncomp

If "comp" is a truth set:

FP = (1)+(4)
TP = (3)
FN = (2)+(5)
TN = (6)+(7)+(8)

Does that sound like what you need?

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

• Cambridge, MAPosts: 3Member

Close, but I would like the FP/TP/FN/TN when I vary by minimum coverage (say the "DP" field in the VCF). So I would have a table for each of those values conditioned on the minimum coverage. I could then plot a ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

I could run CombineVariants multiple times (or VariantEval or GenotypeConcordance), with each time using a new VCF filtered by my own criteria (say depth of coverage), but this is hugely inefficient, and so instead I would like to run CombineVariants (or more likely the others) and get a table of values. The key is to know how sensitivity relates an arbitrary metric (say coverage).

My second related question was is there a function in the GATK API that compares a called variant against a true variant (say via VariantContexts)? I could then write my own walker (I think) over two variant files.