The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Did we ask for a bug report?

Then follow instructions in Article#1894.

#### ☞ Formatting tip!

Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block.
Powered by Vanilla. Made with Bootstrap.
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# VariantEval/GenotypeConcordance stratify by depth-of-coverage, genotype quality, or other metric?

Boston, MAMember Posts: 13

I would like to evaluate variant calls to produce a plot (psuedo-ROC) of sensitivity vs. specificity (or concordance, etc) when I condition on a minimum/maximum value for a particular metric (coverage, genotype quality, etc.). I can do this by running VariantEval or GenotypeConcordance multiple times, once for each cutoff value, but this is inefficient, since I believe I should be able to compute these values in one pass. Alternatively, if there was a simple tool to annotate each variant as concordance or discordant, I could tabulate the results myself. I would like to rely upon GATK's variant comparison logic to compare variants (especially indels). Any thoughts on if current tools can be parameterized, or adapted for these purposes?

Thanks for your help in advance,

N

Tagged:

## Answers

• Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin
edited June 2013

Hi Nils,

The simplest way to do what you want is to use CombineVariants with -V:eval eval.vcf -V:comp comp.vcf, then inspect the set=` key. It will be the standard one of

1) set=eval
2) set=comp
3) set=Intersection
4) set=eval-filteredIncomp
5) set=filteredIneval-comp
6) set=filteredInAll
7) set=filterIneval
8) set=filterIncomp

If "comp" is a truth set:

FP = (1)+(4)
TP = (3)
FN = (2)+(5)
TN = (6)+(7)+(8)

Does that sound like what you need?

Geraldine Van der Auwera, PhD

• Boston, MAMember Posts: 13

Close, but I would like the FP/TP/FN/TN when I vary by minimum coverage (say the "DP" field in the VCF). So I would have a table for each of those values conditioned on the minimum coverage. I could then plot a ROC curve: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

I could run CombineVariants multiple times (or VariantEval or GenotypeConcordance), with each time using a new VCF filtered by my own criteria (say depth of coverage), but this is hugely inefficient, and so instead I would like to run CombineVariants (or more likely the others) and get a table of values. The key is to know how sensitivity relates an arbitrary metric (say coverage).

My second related question was is there a function in the GATK API that compares a called variant against a true variant (say via VariantContexts)? I could then write my own walker (I think) over two variant files.

• Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin

Hmm, I see -- but no, there isn't anything built in to do that. I think the approach you hint at with your second question would work well though.

I would recommend looking at the CombineVariants walker code to get a better idea of the existing functions that you could use to compare calls.

Geraldine Van der Auwera, PhD

Sign In or Register to comment.