Purpose
This function tokenizes text with the specified tokenizer and JSON-formatted parameters, and returns the tokenization result.
Syntax
TOKENIZE('text', ['parser'], ['behavior_ctrl'])
Parameters
| Field | Description |
|---|---|
| text | The text to be tokenized. Supports TEXT, CHAR, and VARCHAR data types. |
| parser | The name of the tokenizer. Supports the BENG (basic English), NGRAM (Chinese), SPACE (space), and IK (Chinese) tokenizers. Note: For OceanBase Database V4.3.5, support for the |
| behavior_ctrl | JSON-formatted parameters for optional configurations. The supported options are as follows: |
Examples
Use the TOKENIZE function to split the string I Love China into tokens with the beng tokenizer, and use the JSON-formatted parameter to set the output option.
SELECT TOKENIZE('I Love China','beng', '[{"output": "all"}]');
The result is as follows:
+--------------------------------------------------------+
| TOKENIZE('I Love China','beng', '[{"output": "all"}]') |
+--------------------------------------------------------+
| {"tokens": [{"love": 1}, {"china": 1}], "doc_len": 2} |
+--------------------------------------------------------+
1 row in set
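The returned value is a JSON document, so a client application can parse it with any JSON library. The following is a minimal sketch in Python, assuming the result has already been fetched as a string. The helper name token_counts is hypothetical, and the sketch assumes the result always has the shape shown above (a "tokens" array of single-key objects plus a "doc_len" field).

```python
import json

# Result string copied from the example output above.
raw = '{"tokens": [{"love": 1}, {"china": 1}], "doc_len": 2}'


def token_counts(result_json):
    """Flatten the "tokens" array of single-key objects into one dict.

    Hypothetical helper: assumes the result shape shown in the example.
    """
    parsed = json.loads(result_json)
    counts = {}
    for entry in parsed["tokens"]:
        for token, count in entry.items():
            # Sum counts in case the same token appears in more than one entry.
            counts[token] = counts.get(token, 0) + count
    return counts, parsed["doc_len"]


counts, doc_len = token_counts(raw)
print(counts)   # {'love': 1, 'china': 1}
print(doc_len)  # 2
```

In this result, I does not appear among the tokens, and doc_len matches the number of tokens that were returned.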
