Description
This function tokenizes a text string by using the specified tokenizer and returns the tokenized result in JSON format.
Syntax
TOKENIZE('text', ['parser'], ['behavior_ctrl'])
Parameters
| Parameter | Description |
|---|---|
| text | The text to be tokenized. Valid data types are TEXT, CHAR, and VARCHAR. |
| parser | The tokenizer name. Valid values include BENG (basic English), NGRAM (Chinese), SPACE (space), and IK (Chinese). Note: For OceanBase Database V4.3.5, the |
| behavior_ctrl | JSON-formatted parameters. Valid values include: |
Examples
The following example uses the BENG tokenizer to tokenize the string 'I Love China' and passes '[{"output": "all"}]' as the JSON-formatted behavior_ctrl parameter.
SELECT TOKENIZE('I Love China','beng', '[{"output": "all"}]');
The return result is as follows:
+--------------------------------------------------------+
| TOKENIZE('I Love China','beng', '[{"output": "all"}]') |
+--------------------------------------------------------+
| {"tokens": [{"love": 1}, {"china": 1}], "doc_len": 2}  |
+--------------------------------------------------------+
1 row in set
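The result above is a JSON document, so a client application can parse it with any JSON library. The following is a minimal sketch in Python, assuming the result has already been fetched as a string. The helper name token_counts is hypothetical and is not part of OceanBase Database. The sketch also assumes that each element of "tokens" is a single-key object that maps a token to its count, as in the output shown above.

```python
import json


def token_counts(result):
    """Flatten the "tokens" array of a TOKENIZE result into {token: count}.

    Hypothetical helper for illustration only; assumes the result shape
    shown in the example above.
    """
    parsed = json.loads(result)
    counts = {}
    for entry in parsed["tokens"]:
        # Each entry is assumed to be a one-key object such as {"love": 1}.
        for token, count in entry.items():
            counts[token] = counts.get(token, 0) + count
    return counts


# Result string copied from the example output above.
raw = '{"tokens": [{"love": 1}, {"china": 1}], "doc_len": 2}'

counts = token_counts(raw)
doc_len = json.loads(raw)["doc_len"]
print(counts, doc_len)
```

With the sample result, counts contains one entry each for love and china, and doc_len is 2.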