Purpose
This expression is used for text retrieval on a full-text index. The AGAINST clause accepts a search string and searches the index based on the collation. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, i.e., the similarity between the text in the search string and the text in the data table.
Syntax
MATCH (column_set) AGAINST (query_expr [mode_flag])
column_set:
column_name [, column_name ...]
mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
| IN MATCH PHRASE MODE
Parameters
| Field | Description |
|---|---|
| column_set | Specifies the column to be searched. If multiple columns are specified, they must be separated by commas. The query is independent of the order of the columns. To execute the MATCH AGAINST expression, there must be a full-text index that exactly matches the column_set. |
| query_expr | Specifies the keyword or phrase to be searched, which is the parameter to be matched. OceanBase Database will tokenize query_expr using the tokenizer of the full-text index and then perform a search on the full-text index. |
| mode_flag | Optional. Specifies the search mode. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag. |
mode_flag
IN NATURAL LANGUAGE MODE: The default value. Specifies to use the natural language search mode. In this mode, the full-text search is performed by tokenizing the query expression (query_expr) to obtain a set of tokens (token) and matching these tokens with those in the index. A match is considered successful if at least one token matches. The relevance of the matched rows is ranked using the Okapi BM25 method.By default, or if
IN NATURAL LANGUAGE MODEis specified,MATCH AGAINSTwill use theNATURAL LANGUAGEmode for full-text search. In this mode,AGAINSTaccepts a search string and searches the index based on the collation. For each row in the table, the return value ofMATCHrepresents the relevance between the search string and the data in the row, i.e., the similarity between the text in the search string and the text in the data table.IN BOOLEAN MODE: Specifies to use the Boolean search mode. This mode supports three commonly used Boolean operators and nested operations, as follows:+: RepresentsAND, which is the intersection.-: Represents the negation, which is the difference.No operator: When used alone, it represents
OR, which is the union. For example,A BmeansA OR B. Using the operator with the symbol will increase the relevance of the sentence but will lose theORsemantics. For example,+A Bmeans that A must be present, and the relevance between A and B in the sentence is calculated.(): Represents nested operations. If there is no symbol outside the parentheses, it has theORsemantics. For example,+A (nested clause)means that A must be present or the nested clause must be present.Note
For OceanBase Database V4.3.5, support for
IN BOOLEAN MODEwas introduced starting from V4.3.5 BP1.Here are some examples:
Output sentences must contain the word "computer".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);Output sentences must contain the word "computer" and must not contain the word "weather".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);Output sentences must contain the word "computer", and those containing "oceanbase" are more relevant.
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
IN MATCH PHRASE MODE: Supports exact phrase matching.Note
For OceanBase Database V4.4.0 and later, support for
IN MATCH PHRASE MODEis available.Here are some examples:
Create a table named test and add a full-text index to the specified column.
CREATE TABLE test(c1 int, c2 VARCHAR(200), c3 TEXT, FULLTEXT INDEX fts_idx (c2, c3));Query records in the c2 and c3 columns that exactly match
some words.SELECT c2 FROM test WHERE MATCH(c2,c3) AGAINST('some words' IN MATCH PHRASE MODE);
Considerations
The MATCH AGAINST expression, in addition to indicating text matching, contains other semantics when it appears in different clauses of SQL:
Projection of relevance:
- Full-text search supports ranking based on term frequency, which can be represented by the projection of relevance through the
MATCH AGAINSTexpression. - Relevance indicates the degree of match between the matched data rows and the query (Query) in
MATCH AGAINST. - Relevance is a
DOUBLEtype data greater than or equal to 0. A value of 0 indicates that the data row is unrelated to the query (Query), and a higher value indicates a stronger relevance.
- Full-text search supports ranking based on term frequency, which can be represented by the projection of relevance through the
Filtering semantics:
When the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to all other filtering conditions usingAND, it contains filtering semantics, indicating that it will filter out data rows that do not match.Example:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase');Equivalent to:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
Sorting semantics:
- When the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to other filtering conditions usingAND, it contains sorting semantics. - It indicates that the results of the full-text search will be sorted in descending order based on the relevance ranking of the
MATCH AGAINSTexpression. - When there are multiple
MATCH AGAINSTexpressions, the sorting will be based on the relevance of the firstMATCH AGAINSTexpression.
Example:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) against ('oceanbase');Equivalent to:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;- When the
Execution method of text retrieval
When the
MATCH AGAINSTexpression contains filtering semantics in the SQL statement, OceanBase Database can execute the query by scanning the full-text index to calculate theMATCH AGAINSTexpression and then accessing the table.When the
MATCH AGAINSTexpression does not contain filtering semantics in the SQL statement, OceanBase Database supports scanning and filtering using other secondary indexes and accessing the full-text index based on the primary key to calculate theMATCH AGAINSTexpression.When the
WHEREclause of the SQL statement containsMATCH AGAINSTand multiple filtering conditions connected byAND, and other secondary indexes can be hit, OceanBase Database will try to select the most cost-effective index for scanning.When calculating the
MATCH AGAINSTexpression by scanning the full-text index, if the query containsLIMIT, OceanBase Database will try to push the top-k calculation down to the full-text index scan to improve performance.When the query contains
MATCH AGAINSTpredicates connected byOR, an Index Merge plan will be automatically generated. Whether to choose the Index Merge plan depends on the cost competition result.Note
- Starting from V4.4.1, the Index Merge plan supports multiple predicates connected by
AND. When the query contains multiple predicates, includingMATCH AGAINST, an Index Merge plan will be automatically generated. - In versions earlier than V4.4.1, the Index Merge plan only supports multiple predicates connected by
OR.
- Starting from V4.4.1, the Index Merge plan supports multiple predicates connected by
Examples
Create the sample table
test_tbl1, and create the full-text indexesft_idx1_test_tbl1andft_idx1_test_tbl1, and the indexidx_test_tbl1.CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096), FULLTEXT INDEX ft_idx1_test_tbl1(detail), FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail), INDEX idx_test_tbl1 (id));Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES (1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'), (2, 2345, 'log', 'OceanBase can halp with log analysis'), (3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'), (4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'), (5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient');Project relevance using the
MATCH AGAINSTexpression.SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1;The returned result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | id | digest | detail | relevance | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 | | 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 | | 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ 5 rows in setFilter using the
MATCH AGAINSTexpression.SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');The returned result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+ | id | digest | detail | +------+----------+-----------------------------------------------------------------------------------------------+ | 2 | log | OceanBase can halp with log analysis | | 1 | fulltext | Try text retrieval with OceanBase fulltext index | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | +------+----------+-----------------------------------------------------------------------------------------------+ 4 rows in setRetrieve and sort using the
MATCH AGAINSTexpression.SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');or
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;The returned result is as follows:
+------+----------+--------------------+ | id | digest | relevance | +------+----------+--------------------+ | 2 | log | 0.3142857142857143 | | 1 | fulltext | 0.2989130434782609 | | 4 | ranking | 0.2849740932642488 | | 3 | order | 0.240174672489083 | +------+----------+--------------------+ 4 rows in set
Example of text retrieval execution plan
The
MATCH AGAINSTexpression contains filtering semantics and scans the full-text indexft_idx1_test_tbl1.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') AND id = 3;The returned result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ============================================================================= | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ----------------------------------------------------------------------------- | | |0 |SORT | |1 |30 | | | |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | | | ============================================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) | | 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, filter_before_indexback[false], | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 15 rows in setThe
MATCH AGAINSTexpression contains filtering semantics, but the ordinary secondary indexidx_test_tbl1is hit, and the cost of scanning theidx_test_tbl1index is lower. Therefore, theidx_test_tbl1index is scanned.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index') AND id = 3;The returned result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------------------------- | | |0 |SORT | |1 |103 | | | |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | | | ====================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) | | 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) | | AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), | | range_cond([test_tbl1.id = 3]), has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 16 rows in setThe
MATCH AGAINSTexpression does not contain filtering semantics, cannot filter the result set, and cannot scan the full-text index. Therefore, the main table is scanned.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The returned result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1|5 |503 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression does not contain filtering semantics, cannot filter the result set, and cannot scan the full-text index. However, theidx_test_tbl1index is hit. Therefore, theidx_test_tbl1index is scanned.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The returned result is as follows:
+------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------+ | =================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | | | =================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true | +------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression contains filtering semantics and scans the corresponding full-text indexft_idx1_test_tbl1. Additionally, theLIMITclause is pushed down to the full-text index scan as top-k.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') LIMIT 3;The returned result is as follows:
+------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------+ | =========================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------------------------- | | |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | | | =========================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) | +------------------------------------------------------------------------------------------------------------------+ 13 rows in set