This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
(How to) Filter on genotype using VariantFiltration
Before using VariantFiltration, please read the entirety of the discussion in https://github.com/broadinstitute/gatk/issues/5362 that describes VariantFiltration's unintuitive behavior when processing compound expressions.
This tutorial illustrates how to filter on genotype, e.g. heterozygous genotype call. The steps apply to either single-sample or multi-sample callsets.
First, the genotype is annotated with a filter expression using VariantFiltration. Then, the filtered genotypes are made into no-call (
./.) genotypes with SelectVariants so that downstream tools may discount them.
We use example variant record FORMAT fields from
trio.vcf to illustrate.
GT:AD:DP:GQ:PL 0/1:17,15:32:99:399,0,439 0/1:11,12:23:99:291,0,292 1/1:0,30:30:90:948,90,0
1. Annotate genotypes using VariantFiltration
If we want to filter heterozygous genotypes, we use VariantFiltration's
--genotype-filter-expression "isHet == 1" option. We can specify the annotation value for the tool to label the heterozygous genotypes with with the
--genotype-filter-name option. Here, this parameter's value is set to
gatk VariantFiltration \ -V trio.vcf \ -O trio_VF.vcf \ --genotype-filter-expression "isHet == 1" \ --genotype-filter-name "isHetFilter"
After filtering, in the resulting
trio_VF.vcf, our example record adds an
FT field and becomes:
GT:AD:DP:FT:GQ:PL 0/1:17,15:32:isHetFilter:99:399,0,439 0/1:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0
We see that HET (
0/1) genotype calls get a
isHetFilter in the
FT field and other genotype calls get a
PASS in the genotype field.
The VariantFiltration tool document lists the various options to filter on the FORMAT (aka genotype call) field:
We have put in convenience methods so that one can now filter out hets ("isHet == 1"), refs ("isHomRef == 1"), or homs ("isHomVar == 1"). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object.
2. Transform filtered genotypes to no call
Running SelectVariants with
--set-filtered-gt-to-nocall will further transform the flagged genotypes with a null genotype call. This conversion is necessary because downstream tools do not parse the FORMAT-level filter field.
gatk SelectVariants \ -V trio_VF.vcf \ --set-filtered-gt-to-nocall \ -O trioGGVCF_VF_SV.vcf
The result is that the
GT genotypes of the
isHetFiltered genotype records become null or no call (
./.) as follows.
GT:AD:DP:FT:GQ:PL ./.:17,15:32:isHetFilter:99:399,0,439 ./.:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0