Purpose
This expression is used to perform text retrieval on full-text indexes. The AGAINST clause takes a search string as its parameter, which is compared with the index in the specified character set. The MATCH function returns a value that represents the relevance between the search string and the data in each row of the table. The relevance value indicates the similarity between the text in the search string and the text in the data table.
Syntax
MATCH (column_set) AGAINST (query_expr [mode_flag])
column_set:
column_name [, column_name ...]
mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
Parameters
| Parameter | Description |
|---|---|
| column_set | Specifies the columns to be searched. If you want to specify multiple columns, separate them with commas. The order of the columns does not matter. A full-text index that exactly matches the column_set must exist for the corresponding MATCH AGAINST expression to be executed. |
| query_expr | Specifies the keywords or phrases to be searched, which is the parameter to be matched. OceanBase Database will tokenize the query_expr based on the tokenizer used for the full-text index and then perform a search on the full-text index. |
| mode_flag | Optional. Specifies the full-text search mode. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag. |
mode_flag
IN NATURAL LANGUAGE MODE: The default value, which specifies the use of natural language search mode. In this mode, the system tokenizes the query expression (query_expr) into tokens (token) and matches these tokens with those in the index. A match is considered successful if any token in the query expression matches a token in the index. The system also ranks the matched rows based on their relevance to the query expression using the Okapi BM25 algorithm.If the
IN NATURAL LANGUAGE MODEkeyword is not explicitly specified,MATCH AGAINSTdefaults to using theNATURAL LANGUAGEmode for full-text search. In this mode, theAGAINSTclause accepts a search string as its parameter and compares it with the index according to the specified character set. For each row in the table, theMATCHfunction returns a relevance score that indicates the similarity between the search string and the text in the row.IN BOOLEAN MODE: specifies the use of Boolean search mode. The current version supports three commonly used Boolean operators and nested Boolean expressions. The operators are described as follows:+: indicatesAND, representing the intersection of sets.-: indicatesNOT, representing the difference between sets.No operator: When no operator is specified, it implies
OR, representing the union of sets. For example,A Bis equivalent toA OR B. Using the+operator can increase the relevance of queries containing both terms, but theORsemantics are lost. For example, in the query+A B, the result must includeA, and the relevance betweenAandBis calculated.(): represents nested Boolean expressions. If no operator precedes the parentheses, theORoperator is implied. For example,+A (nested clause)means that the sentence must contain A or the nested clause.Note
For OceanBase Database V4.3.5, the
IN BOOLEAN MODEclause was introduced starting from V4.3.5 BP1.Here is an example:
The output sentence must contain the word
computer.SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);The output sentence must contain the word
computerand must not contain the wordweather.SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);The output sentence must contain the word
computer, withoceanbasebeing more relevant.SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
Considerations
The MATCH AGAINST expression can also be used for projection, filtering, and sorting in different clauses of an SQL statement:
Projection:
- Full-text search supports ranking based on token frequency. The
MATCH AGAINSTexpression can be used to project the relevance. - The relevance value indicates the relevance between the matched rows and the query (Query) in the
MATCH AGAINSTexpression. - The relevance value is a
DOUBLEvalue that is greater than or equal to 0. A relevance value of 0 indicates that the matched rows are irrelevant to the query (Query). The higher the relevance value, the more relevant the matched rows are to the query (Query).
- Full-text search supports ranking based on token frequency. The
Filtering:
If the
MATCH AGAINSTexpression in theWHEREclause is connected to other filter conditions with theANDoperator, theMATCH AGAINSTexpression has a filtering semantics, which means that the rows that do not match theMATCH AGAINSTexpression will be filtered out.Here is an example:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase');This statement is equivalent to:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
Sorting:
- If the
MATCH AGAINSTexpression in theWHEREclause is connected to other filter conditions with theANDoperator, theMATCH AGAINSTexpression has a sorting semantics. - The system sorts the full-text search results in descending order of relevance based on the
MATCH AGAINSTexpression. - If there are multiple
MATCH AGAINSTexpressions, the rows are sorted in descending order of relevance based on the firstMATCH AGAINSTexpression.
Here is an example:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) against ('oceanbase');This statement is equivalent to:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;- If the
Execution methods for full-text search
If the
MATCH AGAINSTexpression in the SQL statement has a filtering semantics, OceanBase Database will scan the full-text index to calculate theMATCH AGAINSTexpression and then return to the table to execute the query.If the
MATCH AGAINSTexpression in the SQL statement does not have a filtering semantics, OceanBase Database will scan and filter the table based on other secondary indexes and calculate theMATCH AGAINSTexpression based on the primary key to execute the query.If the
WHEREclause of the SQL statement contains theMATCH AGAINSTexpression and multiple filter conditions connected with theANDoperator, and other secondary indexes can be hit, OceanBase Database will try to select an index to scan based on the cost.When the
MATCH AGAINSTexpression is calculated based on the full-text index scan, if the query contains aLIMITclause, OceanBase Database will try to push the top-k calculation to the full-text index scan to improve the performance.When querying with a
MATCH AGAINSTpredicate that contains anORconnection, the system will automatically attempt to generate an index merge plan. Whether the index merge plan is ultimately selected depends on the result of cost-based optimization.Note
For OceanBase Database V4.3.5, the generation of index merge plans is supported starting from V4.3.5 BP1.
Examples
Create a sample table
test_tbl1, and create two full-text indexesft_idx1_test_tbl1andft_idx1_test_tbl1, and an indexidx_test_tbl1.CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096), FULLTEXT INDEX ft_idx1_test_tbl1(detail), FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail), INDEX idx_test_tbl1 (id));Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES (1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'), (2, 2345, 'log', 'OceanBase can halp with log analysis'), (3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'), (4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'), (5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient');Use the
MATCH AGAINSTexpression to project the relevance.SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1;The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | id | digest | detail | relevance | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 | | 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 | | 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ 5 rows in setUse the
MATCH AGAINSTexpression to filter rows.SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+ | id | digest | detail | +------+----------+-----------------------------------------------------------------------------------------------+ | 2 | log | OceanBase can halp with log analysis | | 1 | fulltext | Try text retrieval with OceanBase fulltext index | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | +------+----------+-----------------------------------------------------------------------------------------------+ 4 rows in setUse the
MATCH AGAINSTexpression to retrieve and sort rows.SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');or
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;The return result is as follows:
+------+----------+--------------------+ | id | digest | relevance | +------+----------+--------------------+ | 2 | log | 0.3142857142857143 | | 1 | fulltext | 0.2989130434782609 | | 4 | ranking | 0.2849740932642488 | | 3 | order | 0.240174672489083 | +------+----------+--------------------+ 4 rows in set
Execution plans for full-text search
The
MATCH AGAINSTexpression has a filtering semantics, and the system will scan the full-text indexft_idx1_test_tbl1.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ============================================================================= | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ----------------------------------------------------------------------------- | | |0 |SORT | |1 |30 | | | |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | | | ============================================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) | | 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, filter_before_indexback[false], | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 15 rows in setThe
MATCH AGAINSTexpression has a filtering semantics, but the ordinary secondary indexidx_test_tbl1is hit, and the cost of theidx_test_tbl1index scan is lower. Therefore, the system will scan theidx_test_tbl1index.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------------------------- | | |0 |SORT | |1 |103 | | | |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | | | ====================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) | | 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) | | AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), | | range_cond([test_tbl1.id = 3]), has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 16 rows in setThe
MATCH AGAINSTexpression does not have a filtering semantics, and therefore cannot filter rows. The system will scan the primary table.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1|5 |503 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression does not have a filtering semantics, and therefore cannot filter rows. The system will scan the primary table. The ordinary secondary indexidx_test_tbl1is hit. Therefore, the system will scan theidx_test_tbl1index.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------+ | =================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | | | =================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true | +------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression has a filtering semantics, and the system will scan the corresponding full-text indexft_idx1_test_tbl1. TheLIMITclause is pushed down to the index scan.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') LIMIT 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------+ | =========================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------------------------- | | |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | | | =========================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) | +------------------------------------------------------------------------------------------------------------------+ 13 rows in set