Read filters

Read filters are internal filters that the GATK engine can apply when you run tools that take read data as input. They let you select which reads are included in the analysis based on various criteria. The full list of read filters is in the Tool Documentation section of the user guide.

The names of read filters are typically formulated to express which reads they allow through. For example, PairedReadFilter selects reads that have the "paired" flag and discards those that do not, and NotDuplicateReadFilter selects reads that do not have the "duplicate" flag and discards those that do. Note that in some cases (including the latter example) this logic is the inverse of what the corresponding filter did in older versions of GATK.
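As a rough illustration of this naming convention, the sketch below mimics the two filters on raw SAM flag values. It is not GATK code: the function names keep_paired and keep_not_duplicate are made up for the example, and the only facts it relies on are the SAM flag bits 0x1 (read is paired) and 0x400 (read is a PCR or optical duplicate).

```shell
#!/bin/sh
# Hypothetical helpers, not part of GATK: each returns success (0) when a
# read with the given SAM flag would be kept by the corresponding filter.

# PairedReadFilter keeps reads whose "paired" bit (0x1) is set.
keep_paired() {
    [ $(( $1 & 0x1 )) -ne 0 ]
}

# NotDuplicateReadFilter keeps reads whose "duplicate" bit (0x400) is NOT set.
keep_not_duplicate() {
    [ $(( $1 & 0x400 )) -eq 0 ]
}

# Example flags: 99 is a paired read, 1123 is the same read marked as a
# duplicate (99 + 1024), and 0 is an unpaired, non-duplicate read.
for flag in 99 1123 0; do
    paired=discard
    notdup=discard
    if keep_paired "$flag"; then paired=keep; fi
    if keep_not_duplicate "$flag"; then notdup=keep; fi
    echo "flag=$flag PairedReadFilter=$paired NotDuplicateReadFilter=$notdup"
done
```

In both cases the filter name describes the reads that survive, which is the easiest way to remember which direction a given filter works in.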

Most GATK tools apply one or more read filters by default. You can look up the exact defaults for each tool on its Tool Documentation page. We do not recommend disabling the default read filters used by a given tool, because the filters protect the tool from receiving types of data (e.g. malformed reads) that would make it malfunction. However, it is possible to disable all read filters by using the AllowAllReadsReadFilter filter, which overrides all others.

To apply a read filter, use the following syntax in your command line:

--read-filter <NameOfReadFilter>
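For context, here is roughly what the flag looks like in a complete command. This is a sketch only: PrintReads is used as an example of a tool that takes read data, and the file names input.bam and paired_only.bam are placeholders. The command is assembled in a variable and printed so that you can inspect it before running it.

```shell
#!/bin/sh
# Placeholder file names; substitute your own.
input_bam="input.bam"
output_bam="paired_only.bam"

# Assemble the command as a string so it can be reviewed before running.
cmd="gatk PrintReads -I $input_bam -O $output_bam --read-filter PairedReadFilter"

# Dry run: print the command instead of executing it.
echo "$cmd"

# To run it for real (requires GATK to be installed), uncomment:
# $cmd
```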

Some read filters have an on/off behavior, while others take arguments that modify their behavior or allow you to set threshold values. For example, when using ReadLengthReadFilter to filter reads based on their length, you can specify a maximum length like this:

--read-filter ReadLengthReadFilter --max-read-length 76

And of course, you can add as many filters as you like by using multiple copies of the --read-filter parameter:

--read-filter ReadLengthReadFilter --max-read-length 76 --read-filter NotDuplicateReadFilter

These arguments are not positional, so the order in which you put them in the command does not matter. You don't even need to group them together in the command (there can be others that come in between) but in general we do recommend you try to keep them together for readability (do future-you a favor!).
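To make the point about ordering concrete, the sketch below writes the same request two ways, once with the filters grouped and once with them scattered among other arguments. PrintReads and the file names are placeholders, and filters_in is a hypothetical helper written for this example; it only checks that both command lines ask for the same set of filters, while the fact that GATK treats them identically comes from the explanation above.

```shell
#!/bin/sh
# Filters grouped together at the end (recommended for readability).
grouped="gatk PrintReads -I input.bam -O filtered.bam --read-filter PairedReadFilter --read-filter NotDuplicateReadFilter"

# The same arguments with the filters scattered among the others.
scattered="gatk PrintReads --read-filter NotDuplicateReadFilter -I input.bam --read-filter PairedReadFilter -O filtered.bam"

# Hypothetical helper: print the filters named in a command line, sorted.
filters_in() {
    # Split the command string into words (intentionally unquoted).
    set -- $1
    while [ $# -gt 0 ]; do
        if [ "$1" = "--read-filter" ]; then
            echo "$2"
            shift
        fi
        shift
    done | sort
}

# Both spellings request the same filters.
if [ "$(filters_in "$grouped")" = "$(filters_in "$scattered")" ]; then
    echo "same filters requested"
fi
```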
