
# VariantEval/GenotypeConcordance stratify by depth-of-coverage, genotype quality, or other metric?


I would like to evaluate variant calls to produce a plot (pseudo-ROC) of sensitivity vs. specificity (or concordance, etc.) when I condition on a minimum/maximum value for a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe these values could be computed in a single pass. Alternatively, if there were a simple tool to annotate each variant as concordant or discordant, I could tabulate the results myself. I would like to rely on GATK's variant comparison logic to compare variants (especially indels). Any thoughts on whether the current tools can be parameterized, or adapted, for these purposes?


edited June 2013

Hi Nils,

The simplest way to do what you want is to use CombineVariants with `-V:eval eval.vcf -V:comp comp.vcf`, then inspect the `set` key in the INFO field. It will be one of the standard values:

1) set=eval
2) set=comp
3) set=Intersection
4) set=eval-filteredIncomp
5) set=filteredIneval-comp
6) set=filteredInAll
7) set=filterIneval
8) set=filterIncomp

If "comp" is a truth set:

FP = (1)+(4)
TP = (3)
FN = (2)+(5)
TN = (6)+(7)+(8)
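As a rough illustration (this is a sketch, not an official GATK tool), the mapping above can be tallied directly from the CombineVariants output; the `set=` values and the "comp is truth" assumption are taken from the list above:

```python
# Sketch: count FP/TP/FN/TN from the set= annotation that CombineVariants
# writes into INFO. The set names below mirror the list above, assuming
# "comp" is the truth set; the VCF parsing is deliberately minimal.
from collections import Counter

FP_SETS = {"eval", "eval-filteredIncomp"}            # (1) + (4)
TP_SETS = {"Intersection"}                           # (3)
FN_SETS = {"comp", "filteredIneval-comp"}            # (2) + (5)
TN_SETS = {"filteredInAll", "filterIneval", "filterIncomp"}  # (6)+(7)+(8)

def get_set_value(info_field):
    """Extract the value of the set= key from a VCF INFO string."""
    for entry in info_field.split(";"):
        if entry.startswith("set="):
            return entry[len("set="):]
    return None

def tally(vcf_lines):
    """Return a Counter of FP/TP/FN/TN over the records of a combined VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        s = get_set_value(info)
        if s in FP_SETS:
            counts["FP"] += 1
        elif s in TP_SETS:
            counts["TP"] += 1
        elif s in FN_SETS:
            counts["FN"] += 1
        elif s in TN_SETS:
            counts["TN"] += 1
    return counts
```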

Does that sound like what you need?


Close, but I would like the FP/TP/FN/TN counts as I vary the minimum coverage (say, the DP field in the VCF), so I would have one such table for each minimum-coverage cutoff. I could then plot a ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

I could run CombineVariants multiple times (or VariantEval or GenotypeConcordance), each time with a new VCF filtered by my own criteria (say, depth of coverage), but this is hugely inefficient, so instead I would like to run CombineVariants (or, more likely, the others) once and get a table of values. The key is to know how sensitivity relates to an arbitrary metric (say, coverage).
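For illustration, the single-pass tabulation described above could be sketched as follows (a hypothetical helper, not an existing GATK option; the category labels follow the earlier reply). Each site is binned once by the highest cutoff it passes, and the per-cutoff tables are then recovered by accumulating bins from the strictest cutoff down:

```python
# Sketch: build a confusion table per minimum-DP cutoff in one pass over
# the sites, instead of re-running the comparison for every cutoff.
from collections import Counter

def roc_counts(records, thresholds):
    """records: iterable of (dp, category) pairs, category in TP/FP/FN/TN.
    thresholds: minimum-DP cutoffs. Returns {threshold: Counter} counting
    only the sites with dp >= threshold."""
    thresholds = sorted(thresholds)
    bins = [Counter() for _ in thresholds]
    for dp, cat in records:
        # Find the strictest (highest) threshold this site still passes.
        idx = -1
        for i, t in enumerate(thresholds):
            if dp >= t:
                idx = i
        if idx >= 0:
            bins[idx][cat] += 1
    # A site that passes thresholds[i] also passes every lower threshold,
    # so accumulate from the strictest cutoff downward.
    result = {}
    running = Counter()
    for i in range(len(thresholds) - 1, -1, -1):
        running.update(bins[i])
        result[thresholds[i]] = Counter(running)
    return result
```

From each per-threshold Counter, sensitivity is then TP / (TP + FN), giving one ROC point per cutoff.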

My second, related question: is there a function in the GATK API that compares a called variant against a true variant (say, via VariantContexts)? I could then write my own walker (I think) over two variant files.