Using VariantEval

Posted by delangel (Broad Institute)
Edited October 2015, in Archive

This article has been deprecated in favor of a new method article which you can view here.

For a complete, detailed argument reference, refer to the technical documentation page.


You can find detailed information about the various modules here.

Stratification modules

  • AlleleFrequency
  • AlleleCount
  • CompRod
  • Contig
  • CpG
  • Degeneracy
  • EvalRod
  • Filter
  • FunctionalClass
  • JexlExpression
  • Novelty
  • Sample

Evaluation modules

  • CompOverlap
  • CountVariants

Note that the GenotypeConcordance module has been rewritten as a separate walker tool (see its Technical Documentation page).

A useful analysis using VariantEval


We in GSA often need to compare two different call sets. For SNPs, we typically show the overlap of the sets (their Venn diagram) together with the relative dbSNP rates and/or transition-transversion ratios. The picture provided is an example of such a slide, which is easy to create using VariantEval. Assuming you have two filtered VCF callsets named 'foo.vcf' and 'bar.vcf', there are two quick steps.

Combine the VCFs

java -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T CombineVariants \
    -V:FOO foo.vcf \
    -V:BAR bar.vcf \
    -priority FOO,BAR \
    -o merged.vcf

Run VariantEval

java -jar GenomeAnalysisTK.jar \
     -T VariantEval \
     -R ref.fasta \
     -D dbsnp.vcf \
     -select 'set=="Intersection"' -selectName Intersection \
     -select 'set=="FOO"' -selectName FOO \
     -select 'set=="FOO-filterInBAR"' -selectName InFOO-FilteredInBAR \
     -select 'set=="BAR"' -selectName BAR \
     -select 'set=="filterInFOO-BAR"' -selectName InBAR-FilteredInFOO \
     -select 'set=="FilteredInAll"' -selectName FilteredInAll \
     -o merged.eval.gatkreport \
     -eval merged.vcf \
     -l INFO
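
Each -select/-selectName pair in the command above follows the same pattern, so the pairs can be generated from a list of set values rather than typed by hand. The following is a minimal sketch, not part of GATK: select_args is a hypothetical helper, and it assumes you are happy to reuse each set value as its own -selectName (the command above uses friendlier names for the filtered sets).

```shell
# Emit one -select/-selectName pair per set value read from standard input.
# Hypothetical helper: each set value is reused verbatim as the -selectName.
select_args() {
    while IFS= read -r value; do
        # Skip blank lines so they do not produce empty select expressions.
        [ -n "$value" ] || continue
        printf '%s\n' "-select 'set==\"$value\"' -selectName $value"
    done
}

# Usage: printf 'Intersection\nFOO\nBAR\n' | select_args
```

The output is one argument pair per line, which can be pasted into the VariantEval command line.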

Checking the possible values of 'set'

It is wise to check the actual values for the set names present in your file before writing complex VariantEval commands. An easy way to do this is to extract the value of the set fields and then reduce that to the unique entries, like so:

java -jar GenomeAnalysisTK.jar -T VariantsToTable -R ref.fasta -V merged.vcf -F set -o fields.txt
grep -vx 'set' fields.txt | sort | uniq -c

This will provide you with a list of all of the possible values for 'set' in your VCF so that you can be sure to supply the correct select statements to VariantEval.
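
If you only need a quick look, the same tally can be read straight from the VCF without running VariantsToTable. This is a sketch under stated assumptions: count_set_values is a hypothetical helper name, the VCF is uncompressed, and the set annotation sits in the INFO column (column 8) as a set=VALUE entry.

```shell
# Count the values of the INFO "set" key directly from a VCF file.
# Assumes an uncompressed, tab-delimited VCF with INFO in column 8.
count_set_values() {
    grep -v '^#' "$1" |      # drop header lines
        cut -f 8 |           # keep only the INFO column
        tr ';' '\n' |        # one INFO entry per line
        grep '^set=' |       # keep the set annotation
        sed 's/^set=//' |    # strip the key, leaving the value
        sort | uniq -c       # tally each distinct value
}

# Usage: count_set_values merged.vcf
```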

Reading the VariantEval output file

The VariantEval output is formatted as a GATKReport.

Understanding Genotype Concordance values from VariantEval

The VariantEval genotype concordance module emits information about the relationship between the eval calls and genotypes and the comp calls and genotypes. The following three slides provide some insight into three key metrics used to assess call sensitivity and concordance between genotypes.

##:GATKReport.v0.1 GenotypeConcordance.sampleSummaryStats : the concordance statistics summary for each sample
GenotypeConcordance.sampleSummaryStats  CompRod   CpG      EvalRod  JexlExpression  Novelty  percent_comp_ref_called_var  percent_comp_het_called_het  percent_comp_het_called_var  percent_comp_hom_called_hom  percent_comp_hom_called_var  percent_non-reference_sensitivity  percent_overall_genotype_concordance  percent_non-reference_discrepancy_rate
GenotypeConcordance.sampleSummaryStats  compOMNI  all      eval     none            all      0.78                         97.65                        98.39                        99.13                        99.44                        98.80                              99.09                                 3.60
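
Because the table is wide, it can help to pull out only the columns of interest by name. The following is a rough sketch rather than a GATK feature: report_columns is a hypothetical helper, and it assumes the layout shown in the excerpt above, where rows are whitespace-delimited and every row of a table (header included) starts with the table name.

```shell
# Print selected columns of one GATKReport table, chosen by header name.
# Assumes whitespace-delimited rows that each start with the table name,
# as in the excerpt above. report_columns is a hypothetical helper.
report_columns() {
    report=$1 table=$2 columns=$3   # columns: comma-separated header names
    awk -v table="$table" -v columns="$columns" '
        $1 != table { next }        # skip other tables and "#" lines
        !have_header {              # first matching row is the header
            n = split(columns, want, ",")
            for (i = 1; i <= n; i++)
                for (f = 1; f <= NF; f++)
                    if ($f == want[i]) idx[i] = f
            have_header = 1
        }
        {
            # Print the requested fields, tab-separated; NA if not found.
            line = ""
            for (i = 1; i <= n; i++)
                line = line (i > 1 ? "\t" : "") (idx[i] ? $(idx[i]) : "NA")
            print line
        }
    ' "$report"
}

# Usage:
# report_columns merged.eval.gatkreport GenotypeConcordance.sampleSummaryStats \
#     percent_overall_genotype_concordance,percent_non-reference_sensitivity
```

The header row is printed first, followed by one line per data row, so the output can be pasted into a spreadsheet or piped to further tools.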

The key outputs:

  • percent_overall_genotype_concordance
  • percent_non-reference_sensitivity
  • percent_non-reference_discrepancy_rate

All defined below.





Post edited by Geraldine_VdAuwera on
