The current GATK version is 3.6-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.

How can I invoke read filters and their arguments?

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 10,469Administrator, Dev admin
edited December 2015 in FAQs

Most GATK tools apply several read filters by default. You can look up exactly what are the defaults for each tool in their respective Technical Documentation pages.

But sometimes you want to specify additional filters yourself (and before you ask, no, you cannot disable the default read filters used by a given tool). This is how you do it:

The --read-filter argument (or -rf for short) allows you to apply whatever read filters you'd like. For example, to add the MaxReadLengthFilter filter above to PrintReads, you just add this to your command line:

--read_filter MaxReadLength 

Notice that when you specify a read filter, you need to strip the Filter part of its name off!

The read filter will be applied with its default value (which you can also look up in the Tech Docs for that filter). Now, if you want to specify a different value from the default, you pass the relevant argument by adding this right after the read filter:

--read_filter MaxReadLength -maxReadLength 76

It's important that you pass the argument right after the filter itself, otherwise the command line parser won't know that they're supposed to go together.

And of course, you can add as many filters as you like by using multiple copies of the --read_filter parameter:

--read_filter MaxReadLength --maxReadLength 76 --read_filter ZeroMappingQualityRead
Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Tagged:

Comments

  • everestial007everestial007 GreensboroPosts: 62Member
    edited September 12

    I want to select the aligned read that have mapQ below 40. I have this script
    java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I input1.bam \
    -o output.bam \
    --read_filter MappingQuality -mmq 40

    How can I inverse the --read_filter flag?

    Thanks,

  • everestial007everestial007 GreensboroPosts: 62Member
    @Sheila @Geraldine_VdAuwera

    I want to select the aligned read that have mapQ below 40. I have this script
    java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I input1.bam \
    -o output.bam \
    --read_filter MappingQuality -mmq 40

    How can I inverse the --read_filter flag/value?

    Thanks,
  • shleeshlee CambridgePosts: 206Member, Broadie, Moderator, Dev admin

    Hi @everestial007 ,

    The -mmq option retains higher mapping quality reads from the value you provide. I'm not aware of any way to directly inverse this feature within GATK tools. Can I ask you why you want to select the lower mapping quality reads? I imagine you can do this using an awk command.

  • SheilaSheila Broad InstitutePosts: 3,953Member, Broadie, Moderator, Dev admin

    @everestial007
    Hi,

    Soo Hee is right. There is no simple way to do it. However, you can run PrintReads twice, once with -mmq 40 and again with -mmq 0. Then you can use DiffObjects to get a list of the reads that are different between the two. Please note I have never used that particular tool so I cannot comment much about it. You can also use AWK or some other tool like Soo Hee suggested.

    -Sheila

  • everestial007everestial007 GreensboroPosts: 62Member
    edited September 13

    @Sheila @shlee
    Hi I want to retain the reads below mapQ40 for downstream purposes - mainly to compare low mapQ reads from different samples, blast search and/or de novo assembly (if need be).

    I realized that awk would do it and already have a script, but its just taking too much time and resources for so many samples I have.
    You need to convert the file to sam - remove header info - filter based on 5th column (mapQ flag) - reattach header - convert to bam. For just 1 sample it took about a day. Lol. Samtools and Picard both don't it, haven't found other programs too. Probably GATK can add such functionality :smiley:

    Issue · Github
    by Sheila

    Issue Number
    1476
    State
    open
    Last Updated
  • shleeshlee CambridgePosts: 206Member, Broadie, Moderator, Dev admin

    @everestial007, you should be able to stream in part some of the processes you list by taking advantage of Samtools functionality as shown in the following set of commands. I came up with these, so you should double check the output is as expected. It's my impression that this kind of usage of Samtools is fairly standard.

    set -o pipefail
    samtools view file.bam | awk '{if ($5 < 40) print $0;}' > lessthanforty.sam
    samtools view -H file.bam > headeronly.txt
    cat headeronly.txt lessthanforty.sam > headerandlessthanforty.sam
    samtools view -b headerandlessthanforty.sam > headerandlessthanforty.bam
    

    If you need to process many BAMs in the same manner, and are interested in scripting this, then you should check out our WDL documentation. You could write a WDL script that takes an array of BAM files as input and parallelizes processing of these files over these multiple commands. I imagine even as a newbie to WDL scripts, it will take you ~ a day, perhaps two, to learn the convention, write up a script and get it running over your samples.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 10,469Administrator, Dev admin

    It actually would be fairly trivial to add this functionality. I can see how that would be useful. @Sheila, care to put in a feature request? This would be a good training task for a new dev (I think we got a new person recently, possibly today?).

    Geraldine Van der Auwera, PhD

    Issue · Github
    by Sheila

    Issue Number
    1275
    State
    open
    Last Updated
    Assignee
    Array
    Milestone
    Array
  • everestial007everestial007 GreensboroPosts: 62Member

    @shlee
    Actually, I had almost the same command lines you provided.
    I will check the WDL documentation, as it might become helpful.

    Thanks !

Sign In or Register to comment.