What types of variants can GATK tools detect / handle?

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.

Variant manipulation

GATK variant manipulation tools are able to recognize the following types of alleles:

  • SNP (single nucleotide polymorphism)
  • INDEL (insertion/deletion)
  • MIXED (combination of SNPs and indels at a single position)
  • MNP (multi-nucleotide polymorphism, e.g. a dinucleotide substitution)
  • SYMBOLIC (such as the <NON_REF> allele used in GVCFs produced by HaplotypeCaller, the * allele used to signify the presence of a spanning deletion, or undefined events like a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know exactly what)

Note that SelectVariants, the GATK tool most used for VCF subsetting operations, discriminates strictly between these categories. This means that if you use, for example, -selectType INDEL to pull out indels, it will only select pure INDEL records, excluding any MIXED records that might include a SNP allele in addition to the insertion or deletion alleles of interest. To include those, you would also have to specify -selectType MIXED in the same command.
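
To make the strict separation between categories concrete, here is a minimal Python sketch. It is not GATK's actual code: the function names are hypothetical, and the classification rules are a simplified assumption based only on the category descriptions above (any record whose ALT alleles disagree in type is treated as MIXED).

```python
def allele_type(ref: str, alt: str) -> str:
    """Classify one ALT allele relative to REF (simplified assumption)."""
    # Symbolic alleles: <NON_REF>-style tags, the * spanning deletion,
    # and breakend notation that uses square brackets.
    if alt == "*" or alt.startswith("<") or "[" in alt or "]" in alt:
        return "SYMBOLIC"
    # Same length as REF: a substitution of one base (SNP) or several (MNP).
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    # Different length: an insertion or a deletion.
    return "INDEL"


def record_type(ref: str, alts: list[str]) -> str:
    """Classify a whole record from the types of its ALT alleles."""
    types = {allele_type(ref, alt) for alt in alts}
    if not types:
        return "NO_VARIATION"  # illustrative label for a record with no ALT
    # One shared type keeps that type; disagreement makes the record MIXED.
    return types.pop() if len(types) == 1 else "MIXED"


def select(records, wanted):
    """Keep only (ref, alts) records whose type is in `wanted`, strictly."""
    return [r for r in records if record_type(r[0], r[1]) in wanted]


# Toy (REF, [ALT, ...]) records, made up for illustration.
records = [
    ("A", ["G"]),           # SNP
    ("AT", ["A"]),          # INDEL (deletion)
    ("AT", ["A", "GT"]),    # MIXED: a deletion plus a substitution
    ("AT", ["GC"]),         # MNP
    ("A", ["<NON_REF>"]),   # SYMBOLIC
]

indels_only = select(records, {"INDEL"})
indels_and_mixed = select(records, {"INDEL", "MIXED"})
```

In this toy example, asking for INDEL alone returns only the pure deletion record; the record that carries both a deletion and a substitution comes back only when MIXED is requested as well, which mirrors the SelectVariants behavior described above.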

Variant discovery

The HaplotypeCaller is a sophisticated variant caller that can call different types of variants at the same time. So in addition to SNPs and indels, it is capable of emitting mixed records by default, as well as symbolic representations for e.g. spanning deletions. It does emit physical phasing information, but in its current version, HaplotypeCaller is not able to emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the ReadBackedPhasing tool with the MNP merging function activated. See the tool documentation for details.

Our older (and now deprecated) variant caller, UnifiedGenotyper, was even more limited. It only called SNPs and indels, and did so separately (even if you ran it in calling mode BOTH, the program performed separate calling operations internally), so it was not able to recognize that SNPs and indels should be emitted together as a joint record when they occur at the same site.
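
To illustrate what combining contiguous SNPs into an MNP means conceptually, here is a small Python sketch. It is not how ReadBackedPhasing is implemented: the `Snp` class, the `phase_set` field and the `merge_adjacent_snps` function are hypothetical names, and the sketch assumes that SNPs sharing a `phase_set` value carry their alternate alleles on the same haplotype.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    pos: int        # 1-based position of the first REF base
    ref: str        # reference base(s)
    alt: str        # alternate base(s)
    phase_set: str  # hypothetical label: same value = same haplotype


def merge_adjacent_snps(snps):
    """Merge directly adjacent SNPs from the same phase set into one MNP."""
    merged = []
    for snp in sorted(snps, key=lambda s: s.pos):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.phase_set == snp.phase_set
            # Contiguous means this SNP starts right after the previous one ends.
            and prev.pos + len(prev.ref) == snp.pos
        ):
            # Extend the previous record by concatenating REF and ALT bases.
            merged[-1] = Snp(prev.pos, prev.ref + snp.ref,
                             prev.alt + snp.alt, prev.phase_set)
        else:
            merged.append(snp)
    return merged


# Toy input, made up for illustration.
calls = [
    Snp(100, "A", "G", "ps1"),
    Snp(101, "C", "T", "ps1"),  # adjacent to 100 and same phase set: merged
    Snp(105, "G", "A", "ps1"),
    Snp(106, "T", "C", "ps2"),  # adjacent to 105 but different phase set: kept
]
result = merge_adjacent_snps(calls)
```

Here the SNPs at positions 100 and 101 become a single AC to GT dinucleotide substitution, while the SNPs at 105 and 106 stay separate because they are not phased together.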

The general release version of GATK is currently not able to detect SVs (structural variations) or CNVs (copy number variations). However, the alpha version of GATK 4 (the next generation of GATK tools) includes tools for performing CNV analysis in exome data. Let us know if you're interested in trying them out by commenting on this article in the forum.

There is also a third-party software package called GenomeSTRiP, built on top of GATK, that provides SV analysis capabilities.


  • FengTian (CBI; Member)

    How do I call MNPs (multi-nucleotide polymorphisms, e.g. a dinucleotide substitution)?
    Is there some special setting for it?

  • Sheila (Broad Institute; Member, Broadie)
  • hr3y (Mexico; Member)

    I'd be super interested in testing the alpha methods for the CNV pipeline for exomes. We have had to use Sophia in the lab, but it is somewhat restrictive, e.g. you must upload the FASTQ files as they come from the sequencer, so you can't do your own preprocessing or upload data from different sequencers, because you can't correct their biases separately beforehand; you can't upload fewer than 8 samples, etc. (I understand that statistical considerations are involved, but still, it's not very flexible and quite pricey.)

    So, anyway, I would be more than interested in trying out the GATK 4 alpha. I love GATK, been using it since discovery :P

    PS. I also love your posts hehe, sometimes they are so funny.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @hr3y Glad you like our tools, and thanks for your kind words :)

    We're in the last stretch of preparing GATK for beta release; I think it will go out early next week. You can already try out the alpha release if you like, or you can wait a little bit longer for the beta; in the case of the exome CNV tools, it shouldn't change much. Some command-line syntax and argument names may be different (and this will be documented), but the overall workflow will be the same. There's a howto here: http://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv

    Let us know how it goes!
