What types of variants can GATK tools handle?

Geraldine_VdAuwera (Administrator, GATK Developer)

The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.

GATK variant manipulation tools are able to recognize the following types of alleles:

  • SNP (single nucleotide polymorphism)
  • INDEL (insertion/deletion)
  • MIXED (combination of SNPs and indels at a single position)
  • MNP (multi-nucleotide polymorphism, e.g. a dinucleotide substitution)
  • SYMBOLIC (generally, a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know what exactly)
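To make the categories above concrete, here is a minimal Python sketch of how a record could be sorted into one of them from its REF and ALT alleles. This is not GATK's actual implementation. The function names `classify_allele` and `classify_site` are hypothetical, and the rules are simplifying assumptions: only angle-bracket alleles such as `<DEL>` are treated as symbolic, and any site whose ALT alleles fall into more than one category is reported as MIXED.

```python
from typing import Sequence


def classify_allele(ref: str, alt: str) -> str:
    """Classify a single ALT allele against REF (simplified, illustrative rules)."""
    # Assumption: only angle-bracket alleles like <DEL> count as symbolic here.
    if alt.startswith("<") and alt.endswith(">"):
        return "SYMBOLIC"
    # Same length as REF: a substitution of one base (SNP) or several (MNP).
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    # Different length: bases were inserted or deleted.
    return "INDEL"


def classify_site(ref: str, alts: Sequence[str]) -> str:
    """Classify a whole record; differing allele types at one site give MIXED."""
    if not alts:
        raise ValueError("at least one ALT allele is required")
    types = {classify_allele(ref, alt) for alt in alts}
    return types.pop() if len(types) == 1 else "MIXED"


if __name__ == "__main__":
    print(classify_site("A", ["G"]))        # one-base substitution
    print(classify_site("AT", ["GC"]))      # dinucleotide substitution
    print(classify_site("A", ["AT"]))       # insertion
    print(classify_site("A", ["G", "AT"]))  # SNP and indel at the same position
    print(classify_site("A", ["<DEL>"]))    # symbolic allele
```

Real VCF records have more cases than this sketch covers (breakend notation and the `*` spanning-deletion allele, for example), so treat it as an illustration of the definitions rather than a validator.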

Of our two variant callers, UnifiedGenotyper is the more limited: it calls only SNPs and indels, and it calls them separately (even if you run in calling mode BOTH, the program performs separate calling operations internally). The HaplotypeCaller is more sophisticated and calls different types of variants at the same time, so in addition to SNPs and indels, it can emit mixed records by default. It can also emit MNPs and symbolic alleles, but the modes that do so are not enabled by default and are not part of our recommended best practices for the tool.
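The comparison of the two callers can be summarized as a small lookup table. The Python sketch below only restates what the paragraph above says. The names `CALLER_SUPPORT` and `can_emit` are hypothetical and do not correspond to anything in the GATK codebase or command line.

```python
# Variant types each caller can emit, as described in this post.
# "default": emitted without extra settings; "optional": needs a non-default mode.
CALLER_SUPPORT = {
    "UnifiedGenotyper": {
        "default": {"SNP", "INDEL"},
        "optional": set(),
    },
    "HaplotypeCaller": {
        "default": {"SNP", "INDEL", "MIXED"},
        "optional": {"MNP", "SYMBOLIC"},
    },
}


def can_emit(caller: str, variant_type: str, allow_optional_modes: bool = False) -> bool:
    """Return True if the caller can emit this variant type under the given setup."""
    support = CALLER_SUPPORT[caller]
    if variant_type in support["default"]:
        return True
    return allow_optional_modes and variant_type in support["optional"]


if __name__ == "__main__":
    print(can_emit("UnifiedGenotyper", "MIXED"))
    print(can_emit("HaplotypeCaller", "MIXED"))
    print(can_emit("HaplotypeCaller", "MNP"))
    print(can_emit("HaplotypeCaller", "MNP", allow_optional_modes=True))
```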

The GATK currently does not handle SVs (structural variations) or CNVs (copy number variations), but there are some third-party software packages built on top of GATK that provide this functionality. See GenomeSTRiP for SVs and XHMM for CNVs.

Geraldine Van der Auwera, PhD
