| description | Guide and best practices for using regular expressions in OceanBase Database's MySQL-compatible mode queries |
|---|---|
| keywords | regular expression, regexp, rlike, query, string matching |
| dir-name | |
| dir-name-en | |
| tenant-type | MySQL Mode |
A regular expression is a pattern-matching tool used to search for and manipulate text that matches specific patterns in strings. OceanBase Database is fully compatible with MySQL 8.0 regular expression functionality since V4.1.0 and supports the extended regular expression syntax of the POSIX 1003.2 standard.
OceanBase Database uses the ICU regular expression engine, which provides high-performance regular expression matching and manipulation. It also supports the optional Hyperscan regular expression engine for better performance.
Regular expression functions and operators
OceanBase Dtabase's MySQL-compatible mode provides the following regular expression-related functions and operators:
| Name | Description |
|---|---|
| REGEXP | Whether the string matches the regular expression |
| NOT REGEXP | Whether the string does not match the regular expression |
| RLIKE | Synonym for REGEXP |
| REGEXP_LIKE | Regular expression matching function |
| REGEXP_INSTR | Returns the starting position of the matching substring |
| REGEXP_REPLACE | Regular expression replacement |
| REGEXP_SUBSTR | Returns the matching substring |
Basic syntax
Simple matching
-- Basic matching
SELECT 'hello world' REGEXP 'hello'; -- Returns 1 (match)
SELECT 'hello world' REGEXP 'goodbye'; -- Returns 0 (no match)
-- Using RLIKE
SELECT 'hello world' RLIKE '^hello'; -- Returns 1 (starts with hello)
Character classes and quantifiers
-- Dot matches any character
SELECT 'abc' REGEXP 'a.c'; -- Returns 1
-- Character class [abc] matches a, b, or c
SELECT 'bat' REGEXP '[abc]at'; -- Returns 1
-- Quantifiers * (zero or more), + (one or more), ? (zero or one)
SELECT 'caaat' REGEXP 'ca*t'; -- Returns 1 (a occurs zero or more times)
SELECT 'caaat' REGEXP 'ca+t'; -- Returns 1 (a occurs one or more times)
SELECT 'cat' REGEXP 'ca?t'; -- Returns 1 (a occurs zero or one time)
Anchors
-- ^ matches the beginning of the string
SELECT 'hello world' REGEXP '^hello'; -- Returns 1
-- $ matches the end of the string
SELECT 'hello world' REGEXP 'world$'; -- Returns 1
-- \b matches word boundary
SELECT 'hello world' REGEXP 'world\b'; -- Returns 1
Grouping and references
-- Parentheses () for grouping
SELECT 'abab' REGEXP '(ab)+'; -- Returns 1
-- Backreferences \1, \2, etc.
SELECT 'abab' REGEXP '(ab)\1'; -- Returns 1
Character classes
Standard character classes
| Character class | Meaning |
|---|---|
[[:alnum:]] |
Alphanumeric characters |
[[:alpha:]] |
Alphabetic characters |
[[:blank:]] |
Blank characters (space and tab) |
[[:cntrl:]] |
Control characters |
[[:digit:]] |
Digit characters |
[[:graph:]] |
Printable characters (excluding space) |
[[:lower:]] |
Lowercase letters |
[[:print:]] |
Printable characters (including space) |
[[:punct:]] |
Punctuation characters |
[[:space:]] |
Whitespace characters (including newline, carriage return, etc.) |
[[:upper:]] |
Uppercase letters |
[[:xdigit:]] |
Hexadecimal digits |
-- Using character classes
SELECT 'abc123' REGEXP '[[:alnum:]]+'; -- Returns 1
SELECT 'hello' REGEXP '[[:alpha:]]+'; -- Returns 1
SELECT '123' REGEXP '[[:digit:]]+'; -- Returns 1
Special character classes
-- POSIX character classes
SELECT 'a1' REGEXP '[[:alnum:]]'; -- Returns 1
SELECT 'A' REGEXP '[[:upper:]]'; -- Returns 1
SELECT 'a' REGEXP '[[:lower:]]'; -- Returns 1
-- Equivalence class [=character=]
SELECT 'a' REGEXP '[[=a=]]'; -- Returns 1
-- Collating element [.character.]
SELECT '.' REGEXP '[[..]]'; -- Returns 1
Escape characters
In regular expressions, certain characters have special meanings. To match these characters literally, you need to escape them with a backslash \.
| Special character | Escape sequence |
|---|---|
. |
\. |
* |
\* |
+ |
\+ |
? |
\? |
^ |
\^ |
$ |
\$ |
( |
\( |
) |
\) |
[ |
\[ |
] |
\] |
{ |
\{ |
} |
\} |
| |
\| |
\ |
\\ |
-- Escape examples
SELECT 'a.c' REGEXP 'a\.c'; -- Returns 1 (matches literal dot)
SELECT 'a*c' REGEXP 'a\*c'; -- Returns 1 (matches literal asterisk)
SELECT 'a+c' REGEXP 'a\+c'; -- Returns 0 (matches literal plus sign)
SELECT 'price: $100' REGEXP '\\$'; -- Returns 1 (matches dollar sign)
Character set and collation
Regular expression operations determine matching behavior and comparison based on the character set and collation of the input string and regular expression pattern parameters:
- If the parameters have different character sets or collations, coercion rules are applied
- If any parameter is a binary string, the parameters are processed as binary strings in a case-sensitive manner
- Supports multiple character sets such as UTF-8 and UTF-16
-- Character set example
SELECT 'OceanBase' REGEXP 'oceanbase'; -- Returns 0 (case-sensitive)
-- Binary comparison example (requires character set with binary collation)
SELECT _utf8mb4'OceanBase' COLLATE utf8mb4_bin REGEXP _utf8mb4'oceanbase' COLLATE utf8mb4_bin; -- Returns 0
Notice
OceanBase Database does not support using the BINARY keyword directly in the REGEXP function. If you need to perform binary comparison, use the COLLATE clause to specify a character set with binary collation.
Performance optimization
OceanBase Database's MySQL-compatible mode provides multiple performance optimizations for regular expressions:
- Expression caching: Compiled regular expressions can be cached and reused
- Constant optimization: Fast computation path for regular expressions with constant patterns
- Memory management: Efficient temporary memory allocation and management
Notice
Complex regular expressions may affect query performance. Use them with caution in high-concurrency scenarios and consider using database indexes to optimize query performance.
Example application scenarios
Data extraction
-- Extract numbers from a string
SELECT REGEXP_SUBSTR('Price: $123.45', '\\d+\\.\\d+') AS price;
-- Extract domain from URL
SELECT REGEXP_SUBSTR('https://www.example.com/path', 'https?://([^/]+)') AS domain;
Data cleaning
-- Remove extra whitespace
SELECT REGEXP_REPLACE('hello world', '\\s+', ' ') AS cleaned_text;
-- Format phone number
SELECT REGEXP_REPLACE('12345678901', '(\\d{3})(\\d{4})(\\d{4})', '$1-$2-$3') AS formatted_phone;