Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on GitHub.

Modification of Strings

Hi,

My workflow takes a directory as input, which is named like this: [Date][FlowCell][...]_[...]/ To do the mapping with bwa mem, I want to extract some of those blocks and use them as part of my read group. To do so, I use the #, ##, % and %% operators known from shell scripts.

Example: ID="${inDirectory#*_}"

When I validate my WDL script, wdltool returns:

Unrecognized token on line 29, column 18:
ID="${inDirectory#*_}"
^
The # character seems to be the problem. Is it possible at all to work with strings the way I do in shell scripts? Is there another way to extract information from strings? I am trying to avoid wrapping my WDL script in a shell script.

Answers

  • KateN, Cambridge, MA (Member, Broadie, Moderator, admin)

    There are a few ways you can work with string variables in WDL, but I am having a hard time understanding what you intend the ID="${inDirectory#*_}" line to accomplish. Could you post your WDL script so I can see it in context? Alternatively, you could give an example of how you would want that line interpreted, given a specific directory input.

  • dbecker, Munich (Member)

    Thanks for the quick reply.

    The # operator removes the smallest matching part from the left side of the string. %%, as another example, removes the largest matching part from the right side.
    For example:

    inDirectory="Block1_Block2_Block3_Block4/"
    ID="${inDirectory#*_}"
    echo $ID

    Block2_Block3_Block4/

    ID="${ID%%_*}"
    echo $ID

    Block2

    I have a directory that is created automatically by our NextSeq. The name of the directory contains various pieces of information separated by "_". The instrument ID of the sequencer is given in the second block; it should be part of my read group tag, and I want to extract it. How can I accomplish that?

  • KateN, Cambridge, MA (Member, Broadie, Moderator, admin)
    Accepted Answer

    You should be able to use Python's str.find() method iteratively to separate out each block you need. To use Python in your command section, you simply have to wrap the code in a heredoc, like so:

    command {
      python <<CODE
        { insert python code here }
      CODE
    }
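
As a rough illustration of the "find() iteratively" idea, here is a minimal sketch of what the Python code inside the heredoc could look like. It assumes Python 3 and the example directory name from the question; the helper name `nth_block` and its parameters are hypothetical, not part of WDL, Cromwell, or the original answer.

```python
def nth_block(name: str, index: int, sep: str = "_") -> str:
    """Return the zero-based index-th block of name, located with str.find()."""
    # Hypothetical helper for illustration only.
    name = name.rstrip("/")  # drop the trailing slash of the directory name
    start = 0
    # Skip over `index` separators, one find() call per separator.
    for _ in range(index):
        pos = name.find(sep, start)
        if pos == -1:
            raise ValueError("%r has fewer than %d blocks" % (name, index + 1))
        start = pos + len(sep)
    # The block ends at the next separator, or at the end of the string.
    end = name.find(sep, start)
    return name[start:] if end == -1 else name[start:end]


# Same input as the shell example above; index 1 is the second block.
print(nth_block("Block1_Block2_Block3_Block4/", 1))  # prints Block2
```

Inside a WDL command block, the hard-coded example string would presumably be replaced by the interpolated input, e.g. "${inDirectory}", and the printed value could then be captured as a task output with read_string(stdout()). Treat that wiring as an assumption to check against your own task definition.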
    