
# VariantEval/GenotypeConcordance stratify by depth-of-coverage, genotype quality, or other metric?

Boston, MAMember, Dev Posts: 13

I would like to evaluate variant calls to produce a plot (pseudo-ROC) of sensitivity vs. specificity (or concordance, etc.) when I condition on a minimum/maximum value for a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe I should be able to compute these values in one pass. Alternatively, if there were a simple tool to annotate each variant as concordant or discordant, I could tabulate the results myself. I would like to rely on GATK's variant comparison logic to compare variants (especially indels). Any thoughts on whether the current tools can be parameterized or adapted for these purposes?

N


edited June 2013

Hi Nils,

The simplest way to do what you want is to use CombineVariants with -V:eval eval.vcf -V:comp comp.vcf, then inspect the set= key in the INFO field. It will take one of the standard values:

1) set=eval
2) set=comp
3) set=Intersection
4) set=eval-filteredIncomp
5) set=filteredIneval-comp
6) set=filteredInAll
7) set=filterIneval
8) set=filterIncomp

If "comp" is a truth set:

FP = (1)+(4)
TP = (3)
FN = (2)+(5)
TN = (6)+(7)+(8)
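
The tabulation above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the combined VCF is read as plain text, the set= key sits in the INFO column (the eighth tab-separated field), and the set= values are spelled exactly as in the list above, which you should check against your own output. The names `SET_TO_CLASS`, `info_field`, and `tally` are hypothetical helpers, not part of GATK.

```python
from collections import Counter

# Mapping from the set= values listed above to confusion-matrix cells,
# assuming "comp" is the truth set. The spellings are copied from the
# list above; verify them against your own combined VCF.
SET_TO_CLASS = {
    "eval": "FP",
    "eval-filteredIncomp": "FP",
    "Intersection": "TP",
    "comp": "FN",
    "filteredIneval-comp": "FN",
    "filteredInAll": "TN",
    "filterIneval": "TN",
    "filterIncomp": "TN",
}


def info_field(info, key):
    """Return the value of `key` in a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def tally(vcf_lines):
    """Count TP/FP/FN/TN over the records of a combined VCF."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        set_value = info_field(fields[7], "set")
        # Anything not in the mapping is counted separately, not guessed.
        counts[SET_TO_CLASS.get(set_value, "unclassified")] += 1
    return counts
```

Typical use would be `with open("combined.vcf") as fh: print(tally(fh))`, where `combined.vcf` stands in for whatever file CombineVariants wrote.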

Does that sound like what you need?

Geraldine Van der Auwera, PhD

• Boston, MAMember, Dev Posts: 13

Close, but I would like the FP/TP/FN/TN when I vary by minimum coverage (say the "DP" field in the VCF). So I would have a table for each of those values conditioned on the minimum coverage. I could then plot a ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

I could run CombineVariants multiple times (or VariantEval or GenotypeConcordance), each time using a new VCF filtered by my own criteria (say depth of coverage), but this is hugely inefficient. Instead, I would like to run CombineVariants (or more likely one of the others) once and get a table of values. The key is to know how sensitivity relates to an arbitrary metric (say coverage).
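
To make the single-pass idea concrete, here is a minimal Python sketch of a cutoff sweep. It is not an existing GATK tool, and it rests on assumptions: each eval call has already been labeled TP or FP against the truth set and paired with its depth, and a call whose depth falls below the cutoff is treated as not called, so a dropped TP becomes an FN and a dropped FP simply disappears. The function name `sweep_min_depth` and its arguments are hypothetical.

```python
from bisect import bisect_left


def sweep_min_depth(calls, baseline_fn, cutoffs):
    """Tabulate TP/FP/FN for each minimum-depth cutoff after one pass.

    calls       -- iterable of (label, depth), label being "TP" or "FP"
    baseline_fn -- truth variants missed even with no depth filter
    cutoffs     -- minimum depths to evaluate; a call is kept if depth >= cutoff
    """
    calls = list(calls)
    tp_depths = sorted(d for label, d in calls if label == "TP")
    fp_depths = sorted(d for label, d in calls if label == "FP")
    total_truth = len(tp_depths) + baseline_fn

    rows = []
    for cutoff in cutoffs:
        # Number of calls with depth >= cutoff, found by binary search.
        tp = len(tp_depths) - bisect_left(tp_depths, cutoff)
        fp = len(fp_depths) - bisect_left(fp_depths, cutoff)
        fn = total_truth - tp  # missed truth sites plus filtered-out TPs
        rows.append({
            "cutoff": cutoff,
            "TP": tp,
            "FP": fp,
            "FN": fn,
            "sensitivity": tp / total_truth if total_truth else float("nan"),
            "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        })
    return rows
```

Each returned row is one point on the curve. Precision is reported instead of specificity here because the number of true negatives is not well defined for variant calls without a count of confidently callable reference sites.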

My second, related question: is there a function in the GATK API that compares a called variant against a true variant (say via VariantContexts)? I could then write my own walker (I think) over two variant files.