Purpose
This expression is used for text retrieval on a full-text index. The AGAINST clause accepts a search string and searches the index based on the collation. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, which indicates the similarity between the text in the search string and the text in the data table.
Syntax
MATCH (column_set) AGAINST (query_expr [mode_flag])
column_set:
column_name [, column_name ...]
mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
Parameters
Field |
Description |
|---|---|
| column_set | Specifies the column to be searched. If multiple columns are to be listed, they must be separated by commas. The query is independent of the order of the columns. To execute the MATCH AGAINST expression, there must be a full-text index that exactly matches the column_set. |
| query_expr | Specifies the keyword or phrase to be searched, which is the parameter to be matched. OceanBase Database will tokenize query_expr using the tokenizer of the full-text index and then search the full-text index. |
| mode_flag | Optional. Specifies the search mode for full-text search. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag. |
mode_flag
IN NATURAL LANGUAGE MODE: The default value. It specifies to use the natural language search mode. In this mode, the full-text search is performed by tokenizing the query expression (query_expr) to obtain a set of tokens (token), and then matching the tokens with those in the index. A match is considered successful if at least one token matches. Additionally, the relevance of the matched rows is ranked using the Okapi BM25 method.If the
IN NATURAL LANGUAGE MODEkeyword is not explicitly specified,MATCH AGAINSTdefaults to using theNATURAL LANGUAGEmode for full-text search. In this mode, theAGAINSTclause accepts a search string as its parameter and compares it with the index according to the specified character set. For each row in the table, theMATCHfunction returns a relevance score that indicates the similarity between the search string and the text in the row.IN BOOLEAN MODE: Specifies to use the boolean mode for search. The current version supports three commonly used boolean operators and nested operations, as follows:+: indicatesAND, representing the intersection of sets.-: indicatesNOT, representing the difference between sets.No operator: When no operator is specified, it implies
OR, representing the union of sets. For example,A Bis equivalent toA OR B. Using the+operator can increase the relevance of queries containing both terms, but theORsemantics are lost. For example, in the query+A B, the result must includeA, and the relevance betweenAandBis calculated.(): represents nested Boolean expressions. If no operator precedes the parentheses, theORoperator is implied. For example,+A (nested clause)means that the sentence must contain A or the nested clause.Note
For OceanBase Database V4.3.5, support for
IN BOOLEAN MODEwas added starting from V4.3.5 BP1.Examples:
The output sentence must contain the word "computer".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);The output sentence must contain the word "computer" and must not contain the word "weather".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);The output sentence must contain the word "computer", and "oceanbase" is more relevant.
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
Considerations
The MATCH AGAINST expression not only represents text matching but also has other semantics in different clauses of SQL:
Projection of relevance:
- Full-text search supports ranking based on term frequency, which can be represented by the
MATCH AGAINSTexpression. - Relevance indicates the degree of match between the matched data rows and the query (Query) specified in
MATCH AGAINST. - Relevance is a
DOUBLEvalue greater than or equal to 0. A value of 0 indicates no relevance, while a higher value indicates a stronger match.
- Full-text search supports ranking based on term frequency, which can be represented by the
Filtering semantics:
If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to all other filtering conditions usingAND, it has filtering semantics. This means that data rows that do not match the query will be filtered out.Example:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase');Equivalent to:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
Sorting semantics:
- If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to other filtering conditions usingAND, it has sorting semantics. - This means that the results of the full-text search will be sorted in descending order based on the relevance ranking of the
MATCH AGAINSTexpression. - When multiple
MATCH AGAINSTexpressions are present, the sorting is based on the relevance of the firstMATCH AGAINSTexpression.
Example:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) against ('oceanbase');Equivalent to:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;- If the
Execution of text retrieval
When the
MATCH AGAINSTexpression in an SQL statement has filtering semantics, OceanBase Database can execute the query by scanning the full-text index, computing theMATCH AGAINSTexpression, and then accessing the table.When the
MATCH AGAINSTexpression in an SQL statement does not have filtering semantics, OceanBase Database can execute the query by scanning and filtering using other secondary indexes and then accessing the full-text index based on the primary key.When the
WHEREclause of an SQL statement containsMATCH AGAINSTand multiple filtering conditions connected byAND, and other secondary indexes can be hit, OceanBase Database will try to select the most cost-effective index for scanning.When computing the
MATCH AGAINSTexpression by scanning the full-text index, if the query containsLIMIT, OceanBase Database will try to push the top-k calculation down to the full-text index scan to improve performance.When a query contains
MATCH AGAINSTpredicates connected byOR, an index merge plan will be automatically generated. The final decision to use the index merge plan depends on the cost competition.Note
For OceanBase Database V4.3.5, support for generating an index merge plan was added starting from V4.3.5 BP1.
Examples
Create a sample table
test_tbl1, and create two full-text indexesft_idx1_test_tbl1andft_idx2_test_tbl1, and an indexidx_test_tbl1.CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096), FULLTEXT INDEX ft_idx1_test_tbl1(detail), FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail), INDEX idx_test_tbl1 (id));Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES (1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'), (2, 2345, 'log', 'OceanBase can halp with log analysis'), (3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'), (4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'), (5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient');Use the
MATCH AGAINSTexpression to project relevance.SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1;The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | id | digest | detail | relevance | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 | | 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 | | 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ 5 rows in setFilter the data using the
MATCH AGAINSTexpression.SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+ | id | digest | detail | +------+----------+-----------------------------------------------------------------------------------------------+ | 2 | log | OceanBase can halp with log analysis | | 1 | fulltext | Try text retrieval with OceanBase fulltext index | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | +------+----------+-----------------------------------------------------------------------------------------------+ 4 rows in setRetrieve and sort the data using the
MATCH AGAINSTexpression.SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');or
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;The return result is as follows:
+------+----------+--------------------+ | id | digest | relevance | +------+----------+--------------------+ | 2 | log | 0.3142857142857143 | | 1 | fulltext | 0.2989130434782609 | | 4 | ranking | 0.2849740932642488 | | 3 | order | 0.240174672489083 | +------+----------+--------------------+ 4 rows in set
Example of a text search execution plan
The
MATCH AGAINSTexpression contains filtering semantics and scans the full-text indexft_idx1_test_tbl1.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ============================================================================= | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ----------------------------------------------------------------------------- | | |0 |SORT | |1 |30 | | | |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | | | ============================================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) | | 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, filter_before_indexback[false], | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 15 rows in setThe
MATCH AGAINSTexpression contains filtering semantics but hits the general secondary indexidx_test_tbl1, which has a lower scan cost. Therefore, the indexidx_test_tbl1is scanned.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------------------------- | | |0 |SORT | |1 |103 | | | |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | | | ====================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) | | 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) | | AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), | | range_cond([test_tbl1.id = 3]), has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 16 rows in setThe
MATCH AGAINSTexpression does not contain filtering semantics, and it cannot filter the result set. Therefore, the full-text index cannot be scanned, and the main table is scanned.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1|5 |503 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression does not contain filtering semantics, and it cannot filter the result set. Therefore, the full-text index cannot be scanned. However, the indexidx_test_tbl1is hit, and the indexidx_test_tbl1is scanned.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------+ | =================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | | | =================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true | +------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression contains filtering semantics and scans the corresponding full-text indexft_idx1_test_tbl1. At the same time, theLIMITclause is pushed down to the index scan for top-k results.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') LIMIT 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------+ | =========================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------------------------- | | |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | | | =========================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) | +------------------------------------------------------------------------------------------------------------------+ 13 rows in set
