If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
(How to) Filter on genotype using VariantFiltration
Before using VariantFiltration, please read the entirety of the discussion in https://github.com/broadinstitute/gatk/issues/5362 that describes VariantFiltration's unintuitive behavior when processing compound expressions.
This tutorial illustrates how to filter on genotype, e.g. heterozygous genotype call. The steps apply to either single-sample or multi-sample callsets.
First, the genotype is annotated with a filter expression using VariantFiltration. Then, the filtered genotypes are made into no-call (
./.) genotypes with SelectVariants so that downstream tools may discount them.
We use example variant record FORMAT fields from
trio.vcf to illustrate.
GT:AD:DP:GQ:PL 0/1:17,15:32:99:399,0,439 0/1:11,12:23:99:291,0,292 1/1:0,30:30:90:948,90,0
1. Annotate genotypes using VariantFiltration
If we want to filter heterozygous genotypes, we use VariantFiltration's
--genotype-filter-expression "isHet == 1" option. We can specify the annotation value for the tool to label the heterozygous genotypes with with the
--genotype-filter-name option. Here, this parameter's value is set to
gatk VariantFiltration \ -V trio.vcf \ -O trio_VF.vcf \ --genotype-filter-expression "isHet == 1" \ --genotype-filter-name "isHetFilter"
After filtering, in the resulting
trio_VF.vcf, our example record adds an
FT field and becomes:
GT:AD:DP:FT:GQ:PL 0/1:17,15:32:isHetFilter:99:399,0,439 0/1:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0
We see that HET (
0/1) genotype calls get a
isHetFilter in the
FT field and other genotype calls get a
PASS in the genotype field.
The VariantFiltration tool document lists the various options to filter on the FORMAT (aka genotype call) field:
We have put in convenience methods so that one can now filter out hets ("isHet == 1"), refs ("isHomRef == 1"), or homs ("isHomVar == 1"). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object.
2. Transform filtered genotypes to no call
Running SelectVariants with
--set-filtered-gt-to-nocall will further transform the flagged genotypes with a null genotype call. This conversion is necessary because downstream tools do not parse the FORMAT-level filter field.
gatk SelectVariants \ -V trio_VF.vcf \ --set-filtered-gt-to-nocall \ -O trioGGVCF_VF_SV.vcf
The result is that the
GT genotypes of the
isHetFiltered genotype records become null or no call (
./.) as follows.
GT:AD:DP:FT:GQ:PL ./.:17,15:32:isHetFilter:99:399,0,439 ./.:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0