Full-text search refers to the operation of searching for text data based on keywords, phrases, or text expressions. It allows for comprehensive searches of the entire text and returns results that match the search criteria.
Syntax
When you perform a full-text index query, you can specify the columns and keywords or phrases to search for, and you can optionally specify search modifiers to adjust the search mode (that is, the search conditions or rules).
MATCH (column_name [, column_name ...]) AGAINST (expr [search_modifier])
search_modifier:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
| IN MATCH PHRASE MODE
The parameters are described as follows:
column_name: the column to be searched. Separate multiple columns with commas (,).
expr: the keyword or phrase to be searched for.
search_modifier: an optional parameter that specifies the search mode. The values and applicable scenarios of this parameter are as follows:
IN NATURAL LANGUAGE MODE
  Parameter description: The default value. Specifies to use the natural language search mode.
  Applicable scenarios: Natural language search, top-K scenarios, fuzzy matching, and relevance output, where words are tokenized based on the tokenizer set in the table.
IN BOOLEAN MODE
  Note
  For OceanBase Database V4.3.5, IN BOOLEAN MODE is supported starting from V4.3.5 BP1.
  Parameter description: Specifies to use the boolean mode. The current version supports the following boolean operators and nested operations:
    +: represents AND, the intersection.
    -: represents NOT, the difference.
    Without an operator: when used alone, it represents OR, the union. For example, A B represents A OR B. When used with other operators, it indicates the presence of the specified terms but loses the OR semantics. For example, +A B means that A must be present, and the relevance of A and B in the sentence is calculated.
    (): represents nested operations. When used without an operator, it has the OR semantics. For example, +A (nested clause) means that A or the nested clause must be present.
  Applicable scenarios: Semantic filtering, exact matching, and filtering based on user output syntax. You can choose to output relevance, where each word is tokenized based on spaces and +, -, and ().
IN MATCH PHRASE MODE
  Note
  For OceanBase Database V4.4.0 and later, IN MATCH PHRASE MODE is supported.
  Parameter description: Specifies to use the phrase query mode.
  Applicable scenarios: Phrase matching, where no tokenization is performed, and the query results must exactly match the specified phrase.
Here are some examples:
The output must contain the word "computer".
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);
The output must contain the word "computer" and must not contain the word "weather".
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);
The output must contain the word "computer", and rows that also contain "oceanbase" are scored as more relevant.
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
Perform a phrase query.
Use MATCH AGAINST in the SELECT list.
obclient> SELECT id, MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE) AS score FROM test;
Use MATCH AGAINST in the WHERE clause.
obclient> SELECT * FROM test WHERE MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE);
For more information about the MATCH AGAINST expression, see MATCH AGAINST.
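The nested operation with () is described in the boolean mode operators but has no query of its own. The following is a minimal sketch that reuses the my_table table and doc column from the boolean mode examples. The search terms are chosen for illustration only, and the comments restate the operator descriptions rather than verified output.

```sql
-- "+computer" applies the + operator to the term "computer".
-- The parenthesized group carries no operator of its own, so the terms
-- inside it ("oceanbase", "database") are combined with OR semantics.
SELECT *
FROM my_table
WHERE MATCH (doc) AGAINST ('+computer (oceanbase database)' IN BOOLEAN MODE);
```

This query requires a version that supports IN BOOLEAN MODE (V4.3.5 BP1 or later, as noted for that mode).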
Vectorized query
For queries that use full-text indexes, you can enable or disable vectorized execution by using the /*+ opt_param('rowsets_enabled', '[true | false]')*/ hint.
Notice
If you do not specify the hint, whether vectorization is enabled depends on the system parameter configuration of the vectorization engine. By default, vectorization is enabled.
Here are some examples:
Enable vectorization.
obclient> SELECT /*+ opt_param('rowsets_enabled', 'true') */ title, body FROM articles WHERE MATCH(title, body) AGAINST('tutorial');
Disable vectorization.
obclient> SELECT /*+ opt_param('rowsets_enabled', 'false') */ title, body FROM articles WHERE MATCH(title, body) AGAINST('tutorial');
Examples
Create a table named tbl1 and a full-text index named full_idx1_tbl1.
obclient> CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(100), col3 TEXT, FULLTEXT INDEX full_idx1_tbl1(col2, col3));
Add test data to the tbl1 table.
obclient> INSERT INTO tbl1 (col1, col2, col3) VALUES (1, 'Hello World', 'This is a test'), (2, 'OceanBase', 'OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team'), (3, 'Database Management', 'Learn about SQL and database administration'), (4, 'Full Text Searching', 'Master the art of full text searching');
The return result is as follows:
Query OK, 4 rows affected
Records: 4  Duplicates: 0  Warnings: 0
Search for the keyword 'OceanBase' in the col2 and col3 columns of the tbl1 table by using the MATCH clause. The IN NATURAL LANGUAGE MODE search modifier is used to specify the natural language search mode.
obclient> SELECT * FROM tbl1 WHERE MATCH (col2, col3) AGAINST ('OceanBase' IN NATURAL LANGUAGE MODE);
The return result is as follows:
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
| col1 | col2      | col3                                                                                                                |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
1 row in set
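The natural language mode is listed as suitable for top-K scenarios and relevance output. The following is a minimal sketch of such a query against the tbl1 table created in this section. It places the same MATCH AGAINST expression in both the SELECT list and the WHERE clause; the keyword 'database', the alias score, and the LIMIT value are illustrative choices, and no output is shown because the scores depend on the tokenizer and data.

```sql
-- Return the relevance of each matching row, best matches first.
-- The alias "score" and the LIMIT value are illustrative.
SELECT col1, col2,
       MATCH (col2, col3) AGAINST ('database' IN NATURAL LANGUAGE MODE) AS score
FROM tbl1
WHERE MATCH (col2, col3) AGAINST ('database' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 2;
```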
Full-text search with Elasticsearch
Starting from V4.4.1, OceanBase Database supports Elasticsearch-style SQL syntax for full-text search, including the MATCH() and SCORE() statements.
The MATCH() statement is used as a condition in the WHERE clause to specify the filtering semantics. It must be used in conjunction with the SCORE() statement.
Notice
Currently, only one MATCH() statement can be used in the WHERE clause, and it cannot be mixed with MATCH() AGAINST().
The SCORE() statement returns the relevance of the corresponding MATCH() statement. It is used in conjunction with the MATCH() statement and has no meaning on its own.
The syntax for full-text search with Elasticsearch is as follows:
SELECT [select_expr_list ,] SCORE()
FROM table_name
WHERE
MATCH(
A_COLUMN^x, B_COLUMN^y ...,
'expr',
'[parameters]'
)
... ;
parameters:
operator=or;
minimum_should_match=int_value;
boost=number_value;
score_norm=min-max;
type={most_fields | best_fields}
The parameters are described as follows:
A_COLUMN^x, B_COLUMN^y ...: The column names and weights. The weights must be positive numbers.
Notice
Each column in the MATCH() statement must have its own full-text index, not a composite index. For more information about creating a full-text index, see Create an index.
expr: The query statement, which is divided into several tokens. Each token can contain a weight. The weight must be a positive number, with a default value of 1.
parameters: A list of key-value pairs. Each key-value pair is optional, but the argument itself is required: if no additional parameters are specified, a pair of empty quotes '' must be retained. The following parameters are supported:
  operator: The aggregation algorithm for the expr parameter. Valid values:
    or: A result is returned if at least one token is matched. If the minimum_should_match parameter is specified, the returned results are determined by the minimum_should_match parameter.
  minimum_should_match: A positive integer. If the operator parameter is set to or, this parameter specifies the minimum number of keywords that must be matched in the token group.
  boost: A positive number. The weight at the MATCH level, which is multiplied by the relevance. Default value: 1.
  score_norm: The normalization algorithm for a MATCH statement during scoring. Valid value: min-max. Default value: no normalization.
  type: The relevance calculation method for multiple columns in a MATCH statement. Valid values:
    most_fields: The relevance of each column is summed.
    best_fields: The maximum relevance among the columns is taken.
Here are some examples:
Create a full-text index on the col2 column of the tbl1 table.
obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col2 ON tbl1(col2);
Create a full-text index on the col3 column of the tbl1 table.
obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col3 ON tbl1(col3);
Query the tbl1 table for rows that contain both "OceanBase" and "Database" (at least 2 keywords matched, minimum_should_match=2), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the 0-1 range).
obclient> SELECT *, SCORE() AS relevance_score FROM tbl1 WHERE MATCH(col2^2, col3^1, 'OceanBase Database', 'operator=or; minimum_should_match=2; boost=1; score_norm=min-max; type=most_fields' );
The query result is as follows:
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2      | col3                                                                                                                | relevance_score    |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.6079027355623101 |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
1 row in set
Query the tbl1 table for rows that contain either "OceanBase" or "Database" (at least 1 keyword matched, minimum_should_match=1), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the 0-1 range).
obclient> SELECT *, SCORE() AS relevance_score FROM tbl1 WHERE MATCH(col2^2, col3^1, 'OceanBase Database', 'operator=or; minimum_should_match=1; boost=1; score_norm=min-max; type=most_fields' );
The query result is as follows:
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2                | col3                                                                                                                | relevance_score    |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase           | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.4208104883442219 |
|    3 | Database Management | Learn about SQL and database administration                                                                         | 0.2876526910145043 |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
2 rows in set
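The two queries above both use type=most_fields. As a minimal sketch of the other documented options, the following queries switch to type=best_fields and then pass an empty parameter list. They assume the tbl1 table and the single-column full-text indexes on col2 and col3 created above. Omitting some key-value pairs from the parameter string is an assumption based on the parameters being described as optional. No output is shown because the scores were not verified.

```sql
-- Same search, but take the highest per-column relevance instead of the sum.
SELECT *, SCORE() AS relevance_score
FROM tbl1
WHERE MATCH(col2^2, col3^1, 'OceanBase Database',
            'operator=or; minimum_should_match=1; type=best_fields');

-- No additional parameters: the pair of empty quotes must still be present.
SELECT col1, SCORE() AS relevance_score
FROM tbl1
WHERE MATCH(col2^2, col3^1, 'OceanBase Database', '');
```

With best_fields, a row is scored by its strongest column, so a row that matches well in only one column is not outranked merely because another row has weak matches spread over several columns.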
