Regular Expressions in Oracle

Regular Expressions (RE) are a powerful feature in Oracle. I have found them to be most useful for situations where you need to quickly find some very specific data, or for modifying (string or XML) data that would normally have required many lines of various Oracle function calls and conditional checks, or were simply impossible to do with 1 query statement.

No doubt that REs can make your code more compact, but on the other hand they should also be avoided in queries that are used on a regular basis in a production system. REs can make your queries slower, especially on large tables or when the actual expression becomes more complicated. So use them with caution within an application. However, for reasons such as troubleshooting, one-time queries or updates, etc they could prove extremely useful.

Next RE functions can be used directly in Oracle queries:

  • REGEXP_LIKE
  • REGEXP_INSTR
  • REGEXP_SUBSTR
  • REGEXP_REPLACE

These are basically the functions LIKE, INSTR, SUBSTR and REPLACE put on steroids.

Here are some examples. But first, create a table with some test data in it:

create table testtable (recid number, val varchar2(1000));
insert into testtable values (1, 'the dog barks');
insert into testtable values (2, 'the cat does not bark');
insert into testtable values (3, 'the dog sleeps');
insert into testtable values (4, 'the cat sleeps');

Two different non-RE ways to find data that contains the word ‘dog’:

select * from testtable where val like '%dog%';
select * from testtable where instr(val, 'dog') > 0;

Two different RE ways to find data that contains the word ‘dog’:

select * from testtable where regexp_instr(val, 'dog') > 0;
select * from testtable where regexp_substr(val, 'dog')='dog';

The solution with the like clause seems to be the quickest, even faster than the instr solution. Generally, the use of non-RE functions is recommended above the RE based, if possible. In this particular case, the RE functions are a bit of overkill for such kind of simple string matches.

Now let’s select the records that contain the words ‘cat’ or ‘dog’ but not ‘sleep’:

select * from testtable
where val not like '%sleep%'
  and regexp_instr(val, 'cat|dog') > 0;

The next update replaces the word ‘the’ at the beginning of each record, by the word ‘a’:

update testtable
set val = regexp_replace(val, '^the ', 'a ');

The data now looks like this:

a dog barks
a cat does not bark
a dog sleeps
a cat sleeps

This update places the last word of each record between quotes (“):

update testtable
set val = regexp_replace(val, ' ([^ ]+)$', ' "\1"');

The new data looks like this:

a dog "barks"
a cat does not "bark"
a dog "sleeps"
a cat "sleeps"

REs can also be useful for searching within XML data. This could be useful if you want
to avoid using the DOM functions in Oracle, as this tends to require more coding and
several extra function calls.

First create a table that can hold XMLType objects (these are basically CLOBs):

create table xmltable (recid number, xml xmltype);

Insert a row with a small XML structure:

insert into xmltable values (1, '<?xml version="1.0"?>
<products>
  <item type="book" price="25">title 1</item>
  <item type="cd" price="22">michael jackson</item>
  <item type="book" price="18">title 2</item>
</products>');

The next query will select the first occurrence of the node that contains a
TextNode with the value ‘jackson’ anywhere in it. The match is case-insensitive.

select regexp_substr(XMLType.getClobVal(xml),
         '<item[^>]*>[^<]*JACKSON[^<]*<[^>]*>',
         1,1,'in')
from xmltable;

This will return next line:

<item type="cd" price="22">michael jackson</item>

Leave a Reply

Your email address will not be published. Required fields are marked *

*