The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# How can I invoke read filters and their arguments?

edited December 2015 in FAQs

Most GATK tools apply several read filters by default. You can look up exactly what are the defaults for each tool in their respective Technical Documentation pages.

But sometimes you want to specify additional filters yourself (and before you ask, no, you cannot disable the default read filters used by a given tool). This is how you do it:

The --read-filter argument (or -rf for short) allows you to apply whatever read filters you'd like. For example, to add the MaxReadLengthFilter filter above to PrintReads, you just add this to your command line:

--read_filter MaxReadLength


#### Notice that when you specify a read filter, you need to strip the Filter part of its name off!

The read filter will be applied with its default value (which you can also look up in the Tech Docs for that filter). Now, if you want to specify a different value from the default, you pass the relevant argument by adding this right after the read filter:

--read_filter MaxReadLength -maxReadLength 76


It's important that you pass the argument right after the filter itself, otherwise the command line parser won't know that they're supposed to go together.

And of course, you can add as many filters as you like by using multiple copies of the --read_filter parameter:

--read_filter MaxReadLength --maxReadLength 76 --read_filter ZeroMappingQualityRead

Post edited by Geraldine_VdAuwera on
Tagged:

• GreensboroMember
edited September 2016

I want to select the aligned read that have mapQ below 40. I have this script
java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-I input1.bam \
-o output.bam \

How can I inverse the --read_filter flag?

Thanks,

• GreensboroMember
@Sheila @Geraldine_VdAuwera

I want to select the aligned read that have mapQ below 40. I have this script
java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-I input1.bam \
-o output.bam \

How can I inverse the --read_filter flag/value?

Thanks,

The -mmq option retains higher mapping quality reads from the value you provide. I'm not aware of any way to directly inverse this feature within GATK tools. Can I ask you why you want to select the lower mapping quality reads? I imagine you can do this using an awk command.

Soo Hee is right. There is no simple way to do it. However, you can run PrintReads twice, once with -mmq 40 and again with -mmq 0. Then you can use DiffObjects to get a list of the reads that are different between the two. Please note I have never used that particular tool so I cannot comment much about it. You can also use AWK or some other tool like Soo Hee suggested.

-Sheila

• GreensboroMember
edited September 2016

@Sheila @shlee
Hi I want to retain the reads below mapQ40 for downstream purposes - mainly to compare low mapQ reads from different samples, blast search and/or de novo assembly (if need be).

I realized that awk would do it and already have a script, but its just taking too much time and resources for so many samples I have.
You need to convert the file to sam - remove header info - filter based on 5th column (mapQ flag) - reattach header - convert to bam. For just 1 sample it took about a day. Lol. Samtools and Picard both don't it, haven't found other programs too. Probably GATK can add such functionality

#### Issue · Github September 2016 by Sheila

Issue Number
1476
State
closed
Last Updated
Closed By
chandrans

@everestial007, you should be able to stream in part some of the processes you list by taking advantage of Samtools functionality as shown in the following set of commands. I came up with these, so you should double check the output is as expected. It's my impression that this kind of usage of Samtools is fairly standard.

set -o pipefail
samtools view file.bam | awk '{if ($5 < 40) print$0;}' > lessthanforty.sam
samtools view -H file.bam > headeronly.txt
`

If you need to process many BAMs in the same manner, and are interested in scripting this, then you should check out our WDL documentation. You could write a WDL script that takes an array of BAM files as input and parallelizes processing of these files over these multiple commands. I imagine even as a newbie to WDL scripts, it will take you ~ a day, perhaps two, to learn the convention, write up a script and get it running over your samples.

It actually would be fairly trivial to add this functionality. I can see how that would be useful. @Sheila, care to put in a feature request? This would be a good training task for a new dev (I think we got a new person recently, possibly today?).

#### Issue · Github September 2016 by Sheila

Issue Number
1275
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
chandrans
• GreensboroMember

@shlee
Actually, I had almost the same command lines you provided.
I will check the WDL documentation, as it might become helpful.

Thanks !

• GreensboroMember

@Geraldine_VdAuwera said:
It actually would be fairly trivial to add this functionality. I can see how that would be useful. @Sheila, care to put in a feature request? This would be a good training task for a new dev (I think we got a new person recently, possibly today?).

• I was wondering if this functionality was added. Thanks !

Not yet. I put in a feature request for GATK4.

-Sheila