Full-text search (OceanBase Database V4.4.2)

Full-text search retrieves text data that contains specific keywords, phrases, or text expressions. It searches the entire text content and returns the results that match the search criteria.

Syntax

When you perform a full-text search, you can specify columns and keywords or phrases to search for. You can also use specific search modifiers to adjust the search pattern (that is, the search conditions or rules).

MATCH (column_name [, column_name ...]) AGAINST (expr [search_modifier])

search_modifier:
    IN NATURAL LANGUAGE MODE
    | IN BOOLEAN MODE
    | IN MATCH PHRASE MODE

The parameters are described as follows:

column_name: specifies the column to perform the full-text search on. If you want to specify multiple columns, separate them with commas.
expr: specifies the keyword or phrase to search for.

search_modifier: specifies the search pattern. The supported values and their applicable scenarios are as follows:

- IN NATURAL LANGUAGE MODE: the default value. Specifies to use the natural language search pattern.
  Applicable scenarios: natural language queries, top-K retrieval, fuzzy matching, and relevance output. Words are segmented by the default tokenizer set in the table.
- IN BOOLEAN MODE: specifies to use the Boolean search pattern.
  Note: In the OceanBase Database V4.3.5 series, `IN BOOLEAN MODE` is supported starting from V4.3.5 BP1.
  The current version supports the following Boolean operators and nested operations:
  - `+`: represents `AND`, which is the intersection of sets.
  - `-`: represents negation, which is the difference between sets.
  - No operator: represents `OR`, which is the union of sets. For example, `A B` represents `A OR B`. When operator-prefixed terms are mixed with terms that have no operator, the terms without an operator raise the relevance of the rows that contain them but no longer act as `OR` conditions. For example, `+A B` means that A must be present, and the relevance is calculated from both A and B.
  - `()`: represents nested operations. When no operator is specified, it represents the `OR` operator. For example, `+A (nested clause)` means that A or the nested clause must be present.
  Applicable scenarios: semantic filtering, exact matching, filtering based on the syntax that the user supplies, and relevance output. Each word is segmented only by spaces, `+`, `-`, and `()`.
- IN MATCH PHRASE MODE: specifies to use the Phrase Query (short phrase query) pattern.
  Note: `IN MATCH PHRASE MODE` is supported in OceanBase Database V4.4.0 and later.
  Applicable scenarios: phrase matching, where the query is not segmented and the results must exactly match the query.

Here are some examples:

The output must contain the word "computer".

obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);

The output must contain the word "computer" and must not contain the word "weather".

obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);

The output must contain the word "computer", and rows that also contain "oceanbase" have a higher relevance.

obclient> SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);

Use the phrase matching pattern for the query, either in the SELECT clause or in the WHERE clause.

Use MATCH AGAINST in the SELECT clause to return the relevance score.

obclient> SELECT id, MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE) AS score
    FROM test;

Use MATCH AGAINST in the WHERE clause to filter rows.

obclient> SELECT * FROM test WHERE MATCH (title, body) AGAINST ('some words' IN MATCH PHRASE MODE);

For more information about the MATCH AGAINST expression, see MATCH AGAINST.

Vectorized query

For queries involving full-text indexes, you can choose to use vectorized or non-vectorized execution. You can use the /*+ opt_param('rowsets_enabled', '[true | false]')*/ hint to enable or disable vectorization.

Notice

If you do not specify the hint, whether vectorization is enabled depends on the system parameter configuration of the vectorization engine. By default, vectorization is enabled.

Here are some examples:

Enable vectorization.

obclient> SELECT /*+ opt_param('rowsets_enabled', 'true') */ title, body
    FROM articles
    WHERE MATCH(title, body) AGAINST('tutorial');

Disable vectorization.

obclient> SELECT /*+ opt_param('rowsets_enabled', 'false') */ title, body
    FROM articles
    WHERE MATCH(title, body) AGAINST('tutorial');
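When application code needs to switch vectorization on or off per query, the hint can be added programmatically. The following Python sketch only manipulates the SQL text; the `with_rowsets_hint` helper is hypothetical and assumes the statement starts with the SELECT keyword and carries no other hint.

```python
import re

def with_rowsets_hint(sql, enabled):
    """Insert the opt_param('rowsets_enabled', ...) hint after the leading SELECT."""
    value = "true" if enabled else "false"
    hint = f"/*+ opt_param('rowsets_enabled', '{value}') */"
    new_sql, count = re.subn(
        r"^\s*SELECT\b", f"SELECT {hint}", sql, count=1, flags=re.IGNORECASE
    )
    if count == 0:
        raise ValueError("expected a statement that starts with SELECT")
    return new_sql

query = "SELECT title, body FROM articles WHERE MATCH(title, body) AGAINST('tutorial')"
print(with_rowsets_hint(query, enabled=False))
```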

Examples

Create a table named tbl1 and create a full-text index named full_idx1_tbl1 on the table.

obclient> CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(100), col3 TEXT, FULLTEXT INDEX full_idx1_tbl1(col2, col3));

Add test data to the tbl1 table.

obclient> INSERT INTO tbl1 (col1, col2, col3) VALUES (1, 'Hello World', 'This is a test'),
    (2, 'OceanBase', 'OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team'),
    (3, 'Database Management', 'Learn about SQL and database administration'),
    (4, 'Full Text Searching', 'Master the art of full text searching');

The result is as follows:

Query OK, 4 rows affected
Records: 4  Duplicates: 0  Warnings: 0

Search for the keyword 'OceanBase' in the col2 and col3 columns of the tbl1 table. Use the IN NATURAL LANGUAGE MODE search modifier to specify that the natural language search pattern should be used.

obclient> SELECT * FROM tbl1
    WHERE MATCH (col2, col3) AGAINST ('OceanBase' IN NATURAL LANGUAGE MODE);

The result is as follows:

+------+-----------+---------------------------------------------------------------------------------------------------------------------+
| col1 | col2      | col3                                                                                                                |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+
1 row in set

Full-text search in Elasticsearch

Starting from V4.4.1, OceanBase Database supports Elasticsearch-style full-text search syntax, which consists of the MATCH() and SCORE() statements.

The MATCH() statement is used as a condition in the WHERE clause to specify the filtering semantics. It must be used in conjunction with the SCORE() statement.

Notice

In the WHERE clause, only one MATCH() statement can be used, and it cannot be mixed with the MATCH() AGAINST() statement.
The SCORE() statement indicates the relevance of the corresponding MATCH() statement. The SCORE() statement is used in conjunction with the MATCH() statement and has no meaning on its own.

The syntax for full-text search in Elasticsearch is as follows:

SELECT [select_expr_list ,] SCORE()
FROM table_name
WHERE
    MATCH(
        A_COLUMN^x, B_COLUMN^y ...,
        'expr',
        '[parameters]'
        )
... ;

parameters:
    operator=or;
    minimum_should_match=int_value;
    boost=number_value;
    score_norm=min-max;
    type={most_fields | best_fields}

The parameters are described as follows:

A_COLUMN^x, B_COLUMN^y ...: specifies the column names and weights. The weights must be positive numbers.

Notice

Each column referenced in the MATCH() statement must have its own full-text index. A single full-text index created on a combination of multiple columns does not meet this requirement. For more information about how to create a full-text index, see Create an index.
expr: specifies the query text. It is divided into several tokens, and each token can have a weight. The weight must be a positive number, and the default value is 1.
parameters: specifies a list of key-value pairs. Each parameter is optional, but the argument itself is required: if you specify no parameters, you must retain a pair of empty quotes ''. The following parameters are available:
- operator: specifies the aggregation algorithm for the expr parameter. Valid values:
  - or: indicates that the output is generated if at least one token is matched. If the minimum_should_match parameter is specified, the returned results are determined by the value of the minimum_should_match parameter.
- minimum_should_match: specifies the minimum number of keywords that must be matched in the token group when the operator parameter is set to or. The value must be a positive integer.
- boost: specifies the weight at the MATCH level, which is multiplied by the relevance. The default value is 1.
- score_norm: specifies the normalization algorithm for a MATCH statement during scoring. Valid value: min-max. By default, scores are not normalized.
- type: specifies the relevance calculation method for multiple columns in a MATCH statement. Valid values:
  - most_fields: indicates that the relevances of all columns are summed.
  - best_fields: indicates that the maximum relevance among all columns is taken.
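To make the type and boost parameters concrete, the following Python sketch combines per-column relevances the way the list above describes: most_fields sums them, best_fields takes the maximum, and boost multiplies the result. The `combine_scores` function and the input numbers are hypothetical, and the sketch assumes that each column weight (the `^x` suffix) multiplies that column's relevance before the columns are combined, which the description above does not state explicitly.

```python
def combine_scores(column_scores, weights, match_type, boost=1.0):
    """Combine per-column relevances into one score for a row.

    column_scores: {column_name: relevance} for a single row (made-up inputs).
    weights:       {column_name: weight}, standing in for the ^x suffixes.
    match_type:    'most_fields' (sum) or 'best_fields' (maximum).
    """
    # Assumption: the column weight scales the column's relevance first.
    weighted = [column_scores[col] * weights[col] for col in column_scores]
    if match_type == "most_fields":
        combined = sum(weighted)
    elif match_type == "best_fields":
        combined = max(weighted)
    else:
        raise ValueError(f"unsupported type: {match_type}")
    # boost is applied at the MATCH level, on top of the combined relevance.
    return boost * combined

scores = {"col2": 0.5, "col3": 1.5}   # hypothetical per-column relevances
weights = {"col2": 2, "col3": 1}      # corresponds to col2^2, col3^1

print(combine_scores(scores, weights, "most_fields"))           # 2.5
print(combine_scores(scores, weights, "best_fields"))           # 1.5
print(combine_scores(scores, weights, "most_fields", boost=2))  # 5.0
```

The sketch leaves out score_norm on purpose, because the normalization formula is not described here.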

Here are some examples:

Create a full-text index on the col2 column of the tbl1 table.

obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col2 ON tbl1(col2);

Create a full-text index on the col3 column of the tbl1 table.

obclient> CREATE FULLTEXT INDEX ft_idx_tbl1_col3 ON tbl1(col3);

Query the tbl1 table for rows that contain both "OceanBase" and "Database" (at least 2 keywords matched, minimum_should_match=2), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the range of 0-1).

obclient> SELECT *, SCORE() AS relevance_score
    FROM tbl1
    WHERE
        MATCH(col2^2, col3^1,
              'OceanBase Database',
              'operator=or;
              minimum_should_match=2;
              boost=1;
              score_norm=min-max;
              type=most_fields'
            );

The returned result is as follows:

+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2      | col3                                                                                                                | relevance_score    |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.6079027355623101 |
+------+-----------+---------------------------------------------------------------------------------------------------------------------+--------------------+
1 row in set
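The parameter string passed to MATCH() in the query above is a semicolon-separated list of key=value pairs. If an application assembles that string dynamically, a small helper can keep it well formed. The following Python sketch is an illustration only: `build_match_params` and `parse_match_params` are hypothetical helpers, the set of accepted keys is taken from the parameter list in this topic, and the sketch assumes the server accepts the pairs without whitespace between them.

```python
KNOWN_KEYS = {"operator", "minimum_should_match", "boost", "score_norm", "type"}

def build_match_params(**params):
    """Build the third MATCH() argument, for example 'operator=or;boost=1'."""
    unknown = set(params) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown MATCH() parameters: {sorted(unknown)}")
    # An empty result corresponds to the required pair of empty quotes ''.
    return ";".join(f"{key}={value}" for key, value in params.items())

def parse_match_params(text):
    """Parse a parameter string back into a dict, ignoring surrounding whitespace."""
    parsed = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed

params = build_match_params(
    operator="or",
    minimum_should_match=2,
    boost=1,
    score_norm="min-max",
    type="most_fields",
)
print(params)
```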

Query the tbl1 table for rows that contain either "OceanBase" or "Database" (at least 1 keyword matched, minimum_should_match=1), and perform a full-text search on the col2 column (weight 2) and col3 column (weight 1). Return the matching scores (normalized to the range of 0-1).

obclient> SELECT *, SCORE() AS relevance_score
    FROM tbl1
    WHERE
        MATCH(col2^2, col3^1,
              'OceanBase Database',
              'operator=or;
              minimum_should_match=1;
              boost=1;
              score_norm=min-max;
              type=most_fields'
            );

The returned result is as follows:

+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
| col1 | col2                | col3                                                                                                                | relevance_score    |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
|    2 | OceanBase           | OceanBase Database is a native, enterprise-level distributed database developed independently by the OceanBase team | 0.4208104883442219 |
|    3 | Database Management | Learn about SQL and database administration                                                                         | 0.2876526910145043 |
+------+---------------------+---------------------------------------------------------------------------------------------------------------------+--------------------+
2 rows in set
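The two queries above differ only in minimum_should_match, which is why the first returns one row and the second returns two. The following Python sketch reproduces that row selection on the same test data. It is a simplified model, not the server's algorithm: the `tokens` and `matching_rows` helpers are hypothetical, the tokenizer is a plain lowercase word split, matches are counted across both columns of a row, and relevance scores are not computed.

```python
import re

rows = {
    1: ("Hello World", "This is a test"),
    2: ("OceanBase", "OceanBase Database is a native, enterprise-level distributed "
                     "database developed independently by the OceanBase team"),
    3: ("Database Management", "Learn about SQL and database administration"),
    4: ("Full Text Searching", "Master the art of full text searching"),
}

def tokens(text):
    """Stand-in tokenizer: lowercase words made of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matching_rows(rows, query, minimum_should_match=1):
    """Return the ids of rows that contain at least the given number of query tokens."""
    query_tokens = tokens(query)
    matched = []
    for row_id, columns in rows.items():
        # Assumption: a token counts as matched if it appears in any searched column.
        row_tokens = set().union(*(tokens(col) for col in columns))
        if len(query_tokens & row_tokens) >= minimum_should_match:
            matched.append(row_id)
    return matched

print(matching_rows(rows, "OceanBase Database", minimum_should_match=2))  # [2]
print(matching_rows(rows, "OceanBase Database", minimum_should_match=1))  # [2, 3]
```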
