GATK ClipReads option

JCGrenier, Montreal, QC (Posts: 13, Member)

Hi folks,

I have a question about the ClipReads tool and the functionality it offers. The --clipRepresentation option and the values it accepts don't seem to do what they're supposed to. I tried the REVERT_SOFTCLIPPED_BASES and HARDCLIP_BASES options, and they seem ineffective when given input BAM files coming from bwa.

HARDCLIP_BASES seems to do what REVERT_SOFTCLIPPED_BASES is supposed to do. I ended up using it, since it does what I want. However, would it be possible to add an option to change the base qualities of the reverted bases? We do not necessarily want to include them, and could then exclude them using different BQ thresholds.

Thanks a lot!

Comments

  • Geraldine_VdAuwera (Posts: 5,840; Administrator, GATK Developer)

    Hi there,

    Can you tell me what version you're using, and also post an example of what you're seeing with each option? I'd like to make sure we're distinguishing a potential problem with the documentation of the options from a bug in the tool's behavior.

    Geraldine Van der Auwera, PhD

  • JCGrenier, Montreal, QC (Posts: 13, Member)
    edited September 2013

    Hi,

    I'm using GATK version 2.4-9-g532efad, but I also checked with the latest version and it behaves the same way.

    Here's an example of the commands applied to a single read:

    Here's what the read looks like first :

    HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

    Base command:

    java -Xmx1g -jar /home/apps/Logiciels/GATK/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T ClipReads -l INFO -I test.bam -o test.hardclip.bam -R $PATH_TO_REF/RepGenomes.fa -CR $OPTION
    • OPTION = HARDCLIP_BASES

      HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9881 2 99M = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

    • OPTION = REVERT_SOFTCLIPPED_BASES

      HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

    • OPTION = WRITE_NS_Q0S

      HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

    So I guess the last two options can't process files that are already soft-clipped?

    Thanks for your help.

  • Carneiro (Posts: 274; Administrator, GATK Developer)

    Hi JC,

    This tool was written a long time ago, and since it didn't get much use, it hasn't been updated in a while. That's not to say it doesn't do what it was written to do. But there are some caveats, which I'll try to elucidate here.

    • You are not asking to clip any bases from the sequence, so the behavior is correct. There are many ways to tell the tool how you want to clip bases, the simplest of all being the -CT option. That being said, all the outputs look right to me.
    • The option to explicitly revert soft clipped bases was never implemented in this tool (it is implemented internally in the code as an API for other tools that perform that action such as ReduceReads, thus it became visible through the documentation). This is very easy to add though, so I'll do this today.
    • The HARDCLIP_BASES action's first (necessary) step is to revert the softclipped bases so it is able to hard clip whatever tail you request without bumping into soft-clips. That's how it works. Since you are not requesting any base to be hard clipped, it's only reverting soft-clips.
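To make the last point concrete, here is a minimal Python sketch of what reverting soft clips does to a read's alignment start and CIGAR string. It is not GATK code: the function name `revert_soft_clips` is hypothetical, and the sketch assumes the soft clips sit at the very ends of the CIGAR, with no hard clips outside them. Applied to the read posted above, it reproduces the position and CIGAR seen in the HARDCLIP_BASES output.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def revert_soft_clips(pos, cigar):
    """Return (new_pos, new_cigar) with end soft clips turned into matches.

    Hypothetical helper for illustration only. Assumes soft clips are the
    first and/or last CIGAR element. Does not guard against the new start
    falling before the beginning of the contig.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # A leading soft clip becomes aligned bases, so the start moves left.
    if ops and ops[0][1] == "S":
        pos -= ops[0][0]
        ops[0] = (ops[0][0], "M")

    # A trailing soft clip becomes aligned bases; the start is unaffected.
    if ops and ops[-1][1] == "S":
        ops[-1] = (ops[-1][0], "M")

    # Merge neighbouring elements that now share the same operation.
    merged = []
    for n, op in ops:
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))

    return pos, "".join(f"{n}{op}" for n, op in merged)


print(revert_soft_clips(9948, "67S31M1S"))  # (9881, '99M')
```

The start moves from 9948 to 9881 because the 67 leading soft-clipped bases are now counted as aligned, and the three CIGAR elements collapse into a single 99M.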
  • JCGrenier, Montreal, QC (Posts: 13, Member)

    Thanks for your answer! That explains why I'm getting those results. Would it be possible to add the option I mentioned before, to recode the qualities of the reverted bases? It would be really helpful.

    Thanks a lot!
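The request above can be illustrated with a small Python sketch. It is only an illustration of the idea, not an existing GATK option: the function name `mask_soft_clipped_quals` is hypothetical, qualities are assumed to be Phred+33 encoded (so `!` means Q0), and soft clips are assumed to be the first and/or last CIGAR element. Given the original CIGAR, it lowers the qualities of the bases that were soft-clipped, so that a base quality threshold would exclude them after the clips are reverted.

```python
import re


def mask_soft_clipped_quals(cigar, quals, low="!"):
    """Set the qualities of end soft-clipped bases to `low`.

    Hypothetical helper for illustration only. `cigar` is the CIGAR string
    before the soft clips were reverted; `quals` is the Phred+33 quality
    string of the read. '!' encodes Q0 in Phred+33.
    """
    lead = re.match(r"(\d+)S", cigar)
    trail = re.search(r"(\d+)S$", cigar)

    n_lead = int(lead.group(1)) if lead else 0
    n_trail = int(trail.group(1)) if trail else 0

    # A read whose CIGAR is one soft clip matches both patterns; count it once.
    if lead and lead.end() == len(cigar):
        n_trail = 0

    kept = quals[n_lead:len(quals) - n_trail]
    return low * n_lead + kept + low * n_trail


print(mask_soft_clipped_quals("2S3M1S", "ABCDEF"))  # !!CDE!
```

Only the quality string changes here; the bases themselves stay in the read, which is what distinguishes this from hard clipping.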
