Full-text search finds text content that contains specific keywords, phrases, or text expressions. It searches the entire text and returns the results that match the search criteria.
Syntax
When you perform a full-text search, you can specify columns and keywords or phrases to search for. You can also use specific search modifiers to adjust the search pattern (that is, the search conditions or rules).
MATCH (column_name [, column_name ...]) AGAINST (expr [search_modifier])
search_modifier:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
| IN MATCH PHRASE MODE
The parameters are described as follows:
column_name: specifies the column to perform the full-text search on. If you want to specify multiple columns, separate them with commas.
expr: specifies the keyword or phrase to search for.
search_modifier: specifies the search pattern. The values and applicable scenarios of this parameter are as follows:
IN NATURAL LANGUAGE MODE
  Parameter description: The default value. Specifies to use the natural language search pattern.
  Applicable scenarios: Natural language queries, top-K scenarios, fuzzy matching, and relevance output. Words are segmented by the default tokenizer set for the table.
IN BOOLEAN MODE
  Note
  For OceanBase Database V4.3.5, IN BOOLEAN MODE is supported starting from V4.3.5 BP1.
  Parameter description: Specifies to use the Boolean search pattern. The current version supports the following Boolean operators and nested operations:
    +: represents AND, which is the intersection of sets.
    -: represents negation, which is the difference between sets.
    No operator: represents OR, which is the union of sets. For example, A B represents A OR B. When a term without an operator is mixed with terms that have operators, it no longer acts as OR and only raises the relevance of the rows that contain it. For example, +A B means that A must be present, and the relevance is calculated from both A and B.
    (): represents nested operations. When no operator is specified, it represents the OR operator. For example, +A (nested clause) means that A or the nested clause must be present.
  Applicable scenarios: Semantic filtering, exact matching, filtering based on user output syntax, and relevance output. The query is segmented into words only at spaces, +, -, and ().
IN MATCH PHRASE MODE
  Note
  IN MATCH PHRASE MODE is supported in OceanBase Database V4.4.0 and later.
  Parameter description: Specifies to use the phrase query pattern.
  Applicable scenarios: Phrase matching without segmentation, where the query results must exactly match the query.
Here are some examples:
The output must contain the word "computer".
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);
The output must contain the word "computer" and must not contain the word "weather".
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);
The output must contain the word "computer", and rows that also contain "oceanbase" are more relevant.
obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
Use the phrase matching pattern for the query.
  Use MATCH AGAINST in the SELECT clause.
  obclient> SELECT id, MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE) AS score FROM test;
  Use MATCH AGAINST in the WHERE clause.
  obclient> SELECT * FROM test WHERE MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE);
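The Boolean operators described above can be illustrated with a toy model. The following Python sketch is not how OceanBase Database evaluates queries: it assumes a flat query without nested () clauses, splits documents into lowercase words with a simple regular expression rather than a real tokenizer, and uses the number of matched terms as a stand-in for relevance. The function names parse_boolean_query and boolean_match are hypothetical.

```python
import re


def parse_boolean_query(query):
    """Split a flat Boolean-mode query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def boolean_match(doc, query):
    """Return (matched, toy_score) for a single document.

    Assumption: the toy score is the count of matched required and optional terms.
    """
    words = set(re.findall(r"\w+", doc.lower()))
    required, excluded, optional = parse_boolean_query(query)
    # "-" terms remove a document from the result set.
    if any(term in words for term in excluded):
        return False, 0
    # "+" terms must all be present.
    if not all(term in words for term in required):
        return False, 0
    hits = [term for term in optional if term in words]
    # With no "+" terms, bare terms act as OR: at least one must match.
    if not required and not hits:
        return False, 0
    # With "+" terms, bare terms only add to the score.
    return True, len(required) + len(hits)
```

In this model, boolean_match("A computer", "+computer oceanbase") still matches because only "computer" is required, while a document that also contains "oceanbase" gets a higher toy score.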
For more information about the MATCH AGAINST expression, see MATCH AGAINST.
Vectorized query
For queries involving full-text indexes, you can choose to use vectorized or non-vectorized execution. You can use the /*+ opt_param('rowsets_enabled', '[true | false]')*/ hint to enable or disable vectorization.
Notice
If you do not specify the hint, whether vectorization is enabled depends on the system parameter configuration of the vectorization engine. By default, vectorization is enabled.
Here are some examples:
Enable vectorization.
obclient> SELECT /*+ opt_param('rowsets_enabled', 'true') */ title, body FROM articles WHERE MATCH(title, body) AGAINST('tutorial');
Disable vectorization.
obclient> SELECT /*+ opt_param('rowsets_enabled', 'false') */ title, body FROM articles WHERE MATCH(title, body) AGAINST('tutorial');
Examples
Create a table named tbl1 and create a full-text index named full_idx1_tbl1 on the table.
obclient> CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(100), col3 TEXT, FULLTEXT INDEX full_idx1_tbl1(col2, col3));
Add test data to the tbl1 table.
obclient> INSERT INTO tbl1 (col1, col2, col3) VALUES (1, 'Hello World', 'This is a test'), (2, 'OceanBase', 'OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team'), (3, 'Database Management', 'Learn about SQL and database administration'), (4, 'Full Text Searching', 'Master the art of full text searching');
The return result is as follows:
Query OK, 4 rows affected
Records: 4  Duplicates: 0  Warnings: 0
Search for the keyword 'OceanBase' in the col2 and col3 columns of the tbl1 table. Use the IN NATURAL LANGUAGE MODE search modifier to specify the natural language search pattern.
obclient> SELECT * FROM tbl1 WHERE MATCH (col2, col3) AGAINST ('OceanBase' IN NATURAL LANGUAGE MODE);
The return result is as follows:
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
| col1 | col2      | col3                                                                                                                |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
1 row in set
Full-text search in Elasticsearch
OceanBase Database supports Elasticsearch-style full-text search syntax, including the MATCH() and SCORE() statements, starting from V4.4.1.
The MATCH() statement is used as a condition in the WHERE clause to specify the filtering semantics. It must be used in conjunction with the SCORE() statement.
Notice
In the WHERE clause, only one MATCH() statement can be used, and it cannot be mixed with the MATCH() AGAINST() statement.
The SCORE() statement indicates the relevance of the corresponding MATCH() statement. The SCORE() statement is used in conjunction with the MATCH() statement and has no meaning on its own.
The syntax for full-text search in Elasticsearch is as follows:
SELECT [select_expr_list ,] SCORE()
FROM table_name
WHERE
    MATCH(
        A_COLUMN^x, B_COLUMN^y ...,
        'expr',
        '[parameters]'
    )
... ;
parameters:
    operator=or;
    minimum_should_match=int_value;
    boost=number_value;
    score_norm=min-max;
    type={most_fields | best_fields}
The parameters are described as follows:
A_COLUMN^x, B_COLUMN^y ...: specifies the column names and weights. The weights must be positive numbers.
Notice
In the MATCH() statement, each column must have its own full-text index. A single full-text index created on multiple columns cannot be used. For more information about how to create a full-text index, see Create an index.
expr: specifies the query string. It is divided into several tokens, and each token can have a weight. The weight must be a positive number, and the default value is 1.
parameters: an optional list of key-value pairs. If no additional parameters are specified, you must retain a pair of empty quotes ''. The following parameters are available:
  operator: specifies the aggregation algorithm for the expr parameter. Valid value: or, which indicates that a row is returned if at least one token is matched. If the minimum_should_match parameter is specified, the returned results are determined by the value of the minimum_should_match parameter.
  minimum_should_match: specifies the minimum number of keywords that must be matched in the token group when the operator parameter is set to or. The value must be a positive integer.
  boost: specifies the weight at the MATCH level, which is multiplied by the relevance. The default value is 1.
  score_norm: specifies the normalization algorithm for a MATCH statement during scoring. Valid value: min-max. By default, scores are not normalized.
  type: specifies the relevance calculation method for multiple columns in a MATCH statement. Valid values:
    most_fields: indicates that the relevances of all columns are summed.
    best_fields: indicates that the maximum relevance among all columns is taken.
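To make the interaction of minimum_should_match, type, and boost concrete, here is a minimal Python sketch. It is an illustration only and not the scoring code of OceanBase Database: the per-column relevance values are assumed to be given, multiplying each column's relevance by its weight is an assumption about how the ^x weights apply, and score_norm is not modeled. The function names passes_minimum_should_match and combine_scores are hypothetical.

```python
def passes_minimum_should_match(doc_tokens, query_tokens, minimum_should_match=1):
    """Return True if enough distinct query tokens occur in the document tokens."""
    matched = sum(1 for token in set(query_tokens) if token in doc_tokens)
    return matched >= minimum_should_match


def combine_scores(column_scores, weights, match_type="most_fields", boost=1.0):
    """Combine per-column relevance values into one score for a row.

    Assumption: each column's relevance is multiplied by its weight before
    the columns are combined.
    """
    weighted = [score * weight for score, weight in zip(column_scores, weights)]
    if match_type == "most_fields":
        combined = sum(weighted)   # sum the relevances of all columns
    elif match_type == "best_fields":
        combined = max(weighted)   # take the maximum relevance among all columns
    else:
        raise ValueError(f"unsupported type: {match_type}")
    # boost applies at the MATCH level and multiplies the combined relevance.
    return boost * combined
```

For example, with made-up column relevances of 0.5 and 0.25 and weights of 2 and 1, most_fields sums the weighted values while best_fields keeps only the larger one, so the two types can rank the same rows differently.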
Here are some examples:
Create a full-text index on the col2 column of the tbl1 table.
obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col2 ON tbl1(col2);
Create a full-text index on the col3 column of the tbl1 table.
obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col3 ON tbl1(col3);
Query the tbl1 table for rows that contain both "OceanBase" and "Database" (at least 2 keywords matched, minimum_should_match=2), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the range of 0-1).
obclient> SELECT *, SCORE() AS relevance_score FROM tbl1 WHERE MATCH(col2^2, col3^1, 'OceanBase Database', 'operator=or; minimum_should_match=2; boost=1; score_norm=min-max; type=most_fields' );
The returned result is as follows:
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2      | col3                                                                                                                | relevance_score    |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.6079027355623101 |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
1 row in set
Query the tbl1 table for rows that contain either "OceanBase" or "Database" (at least 1 keyword matched, minimum_should_match=1), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the range of 0-1).
obclient> SELECT *, SCORE() AS relevance_score FROM tbl1 WHERE MATCH(col2^2, col3^1, 'OceanBase Database', 'operator=or; minimum_should_match=1; boost=1; score_norm=min-max; type=most_fields' );
The returned result is as follows:
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2                | col3                                                                                                                | relevance_score    |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase           | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.4208104883442219 |
|    3 | Database Management | Learn about SQL and database administration                                                                         | 0.2876526910145043 |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
2 rows in set