Proper regex to mark duplicates using Picard tools on SOLiD data
I'm having trouble marking and removing duplicates with Picard tools on SOLiD data. Picard reports an error saying that its read-name regex does not match.
The reads have the following names:
I don't think Picard's default READ_NAME_REGEX can parse these read names.
I tried replacing the default regex. The error goes away, but the job then runs for a very long time and eventually fails (out of memory). I suspect my regex is wrong. Here is my command:
java -jar $PICARD_TOOLS_HOME/MarkDuplicates.jar I=$FILE O=$BAMs/MarkDuplicates/$SAMPLE.MD.bam M=$BAMs/MarkDuplicates/$SAMPLE.metrics READ_NAME_REGEX="([0-9]+)([0-9]+)([0-9]+).*"
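To see what a candidate regex actually captures before launching a long run, here is a minimal Python sketch. It is only an approximation of what Picard does: the read names "1_22_333" and "123456" are hypothetical examples (not taken from my data), the underscore-separated pattern is just one guess at a fix, and I am assuming the regex has to match the whole read name and yield three groups (tile, x, y).

```python
import re

def capture_groups(regex, read_name):
    """Return the captured groups if the regex matches the whole name, else None."""
    m = re.fullmatch(regex, read_name)
    return m.groups() if m else None

# Regex from the command above: three adjacent digit groups, no separators.
adjacent = "([0-9]+)([0-9]+)([0-9]+).*"
# Hypothetical alternative with explicit underscore separators (an assumption).
separated = "([0-9]+)_([0-9]+)_([0-9]+).*"

# Hypothetical read names, for illustration only.
for name in ("1_22_333", "123456"):
    print(name, capture_groups(adjacent, name), capture_groups(separated, name))
```

With adjacent groups, a name containing separators does not match at all, and a run of plain digits is split wherever backtracking happens to leave it, so the three values are unlikely to be meaningful coordinates.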
Any help is appreciated. Thanks!