Montreal, QC | Posts: 22 | Member

Hi folks,

I have a question about the ClipReads tool and the options it offers. The --clipRepresentation option and its possible values don't seem to do what they're supposed to. I tried the REVERT_SOFTCLIPPED_BASES and HARDCLIP_BASES values, and both seem ineffective when the input is a BAM file produced by bwa.

HARDCLIP_BASES seems to do what REVERT_SOFTCLIPPED_BASES is supposed to do. I ended up using it, since it does what I want. However, would it be possible to add an option to change the base qualities of the bases that were reverted? We do not necessarily want to include them, and with recoded qualities we could exclude them using a BQ threshold.

Thanks a lot!

Hi there,

Can you tell me what version you're using, and also post an example of what you're seeing with each option? I'd like to make sure we're distinguishing a potential problem with the documentation of the options from a bug in the tool's behavior.

Geraldine Van der Auwera, PhD

edited September 2013

Hi,

I'm using GATK version 2.4-9-g532efad, but I also checked with the latest version and it does the same thing.

Here's an example of the commands and their effect on a single read.

Here's what the read looks like originally:

HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

Base command:

java -Xmx1g -jar /home/apps/Logiciels/GATK/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T ClipReads -l INFO -I test.bam -o test.hardclip.bam -R $PATH_TO_REF/RepGenomes.fa -CR $OPTION

• OPTION = HARDCLIP_BASES

HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9881 2 99M = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

• OPTION = REVERT_SOFTCLIPPED_BASES

HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

• OPTION = WRITE_NS_Q0S

HWI-ST0860:271:H0K9KADXX:1:1114:4088:42468 137 101510.NC_008268 9948 2 67S31M1S = 9948 0 TGCGGCACCGCTACCACTCCACACCGTTGGCGTATCACCGTCGCAATTGGTACCGTCACCACCGTAGCCGCCACCGTGACCGCCACCGCCACCGCCGAT CCCFB@ADHHGFHHIIGGHIGIIIII;BDBGD?ACH@GIEHHEBBBBDAC;@>A>59=?CDDDB<A8?D@@DD>B98??@C<BB@BB@D><BDDDB@B@ MD:Z:9C0C20 RG:Z:AFR XG:i:0 NM:i:2 XM:i:2 XN:i:0 XO:i:0 AS:i:49 XS:i:47 YT:Z:UP

So I guess the last two options can't process files that are already soft-clipped?

Hi JC,

This tool was written a long time ago, and since it didn't get much use, it hasn't been updated in a while. That's not to say it doesn't do what it was written to do, but there are some caveats, which I'll try to elucidate here.

• You are not asking to clip any bases from the sequence, so the behavior is correct. There are many ways to tell the tool how you want to clip bases, the simplest of all being the -CT option. That being said, all the outputs look right to me.
• The option to explicitly revert soft clipped bases was never implemented in this tool (it is implemented internally in the code as an API for other tools that perform that action such as ReduceReads, thus it became visible through the documentation). This is very easy to add though, so I'll do this today.
• The first (necessary) step of the HARDCLIP_BASES action is to revert the soft-clipped bases, so that it can hard clip whatever tail you request without bumping into soft clips. That's how it works. Since you are not requesting that any bases be hard clipped, it only reverts the soft clips.
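
To make the soft-clip reversion concrete, here is a minimal Python sketch of the arithmetic involved. It is not GATK's code: the function name `revert_soft_clips` is hypothetical, and it assumes a simple CIGAR with no hard clips and does not handle reads whose reverted start would fall before the beginning of the contig. Applied to the read in this thread, it reproduces the position and CIGAR seen in the HARDCLIP_BASES output.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def revert_soft_clips(pos, cigar):
    """Return (new_pos, new_cigar) with soft clips treated as aligned bases.

    Illustrative only: assumes no hard clips and does not guard against
    the new start falling before position 1 of the reference.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

    # A leading soft clip shifts the alignment start to the left.
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0

    # Turn every S into M, then merge neighbouring runs of the same operator.
    merged = []
    for length, op in ops:
        op = "M" if op == "S" else op
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))

    new_cigar = "".join(f"{length}{op}" for length, op in merged)
    return pos - leading, new_cigar


# The read from this thread: 67S31M1S starting at 9948.
print(revert_soft_clips(9948, "67S31M1S"))  # (9881, '99M')
```

The 67 leading soft-clipped bases move the start from 9948 to 9881, and the three CIGAR elements collapse into a single 99M.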

Thanks for your answer! That explains why I'm getting those results. Would it be possible to add the option I mentioned earlier, to recode the qualities of the reverted bases? It could be really helpful.

Thanks a lot!
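
For illustration, the requested quality recoding could be sketched outside GATK roughly as follows. This is only an assumption about what such an option might do, not an existing ClipReads feature: the function name `recode_softclip_quals` and the default Phred value of 2 are hypothetical, and qualities are assumed to be Phred+33 encoded. It has to be run on the original CIGAR, before the soft clips are reverted, because afterwards the clipped positions can no longer be told apart.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume bases of the read (and so have a quality value).
QUERY_OPS = set("MIS=X")


def recode_softclip_quals(quals, cigar, new_phred=2):
    """Return the quality string with soft-clipped positions set to new_phred.

    Hypothetical helper: assumes Phred+33 encoding and a CIGAR that still
    contains the original S operations.
    """
    new_char = chr(new_phred + 33)
    out = []
    idx = 0
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op not in QUERY_OPS:
            continue  # D, N, H and P do not consume read bases
        segment = quals[idx:idx + length]
        out.append(new_char * len(segment) if op == "S" else segment)
        idx += length
    return "".join(out)


# Toy example: two clipped bases at the start, one at the end.
print(recode_softclip_quals("IIIIII", "2S3M1S"))  # ##III#
```

With the reverted bases marked this way, a downstream minimum base quality threshold would skip them while leaving the originally aligned bases untouched.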