Purpose
This expression is used for text retrieval on a full-text index. The AGAINST clause accepts a search string and searches for it in the index based on the character set comparison. For each row in the table, the return value of MATCH represents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table.
Syntax
MATCH (column_set) AGAINST (query_expr [mode_flag])
column_set:
column_name [, column_name ...]
mode_flag:
IN NATURAL LANGUAGE MODE
| IN BOOLEAN MODE
| IN MATCH PHRASE MODE
Parameters
| Field | Description |
|---|---|
| column_set | Specifies the column to be searched. If multiple columns are specified, they must be separated by commas. The query is independent of the order of the columns. To execute the MATCH AGAINST expression, there must be a full-text index that exactly matches the column_set. |
| query_expr | Specifies the keyword or phrase to be searched for, which is the parameter to be matched. OceanBase Database will tokenize query_expr based on the tokenizer used by the full-text index and then perform a search on the full-text index. |
| mode_flag | Optional. Specifies the search mode. The default value is IN NATURAL LANGUAGE MODE. For more information, see mode_flag. |
mode_flag
IN NATURAL LANGUAGE MODE: The default mode, used to specify the use of the natural language search mode. In this mode, the full-text search is performed by tokenizing the query expression (query_expr) to obtain a set of tokens (token), and then matching these tokens with those in the index. A match is considered successful if at least one token matches. Additionally, the relevance of the matched rows is ranked using the Okapi BM25 method.By default, or when
IN NATURAL LANGUAGE MODEis specified,MATCH AGAINSTuses theNATURAL LANGUAGEmode for full-text search. In this mode,AGAINSTaccepts a search string and searches for it in the index based on the character set comparison. For each row in the table, the return value ofMATCHrepresents the relevance between the search string and the data in the row, which is the similarity between the text in the search string and the text in the data table.IN BOOLEAN MODE: Used to specify the use of the Boolean mode for search. The current version supports three most commonly used Boolean operators and nested operations, as follows:+: RepresentsAND, which is the intersection.-: Represents negation, which is the difference.No operator: When used alone, it represents
OR, which is the union. For example,A BmeansA OR B. Using operators withORcan increase the relevance of the sentences but will lose theORmeaning. For example,+A Bmeans A must be present, and the relevance of A and B in the sentence is calculated.(): Represents nested operations. When there is no operator outside, it has theORmeaning. For example,+A (nested clause)means A or the nested clause must be present.Note
For OceanBase Database V4.3.5, support for
IN BOOLEAN MODEwas introduced starting from V4.3.5 BP1.Examples:
Output sentences must contain the word "computer".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer" IN BOOLEAN MODE);Output sentences must contain the word "computer" and must not contain the word "weather".
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer -weather" IN BOOLEAN MODE);Output sentences must contain the word "computer", and those containing "oceanbase" are more relevant.
SELECT * FROM my_table WHERE MATCH (doc) AGAINST ("+computer oceanbase" IN BOOLEAN MODE);
IN MATCH PHRASE MODE: Used to support exact phrase matching.Note
Support for
IN MATCH PHRASE MODEwas introduced in OceanBase Database V4.4.0.Examples:
Create a table named test and add a full-text index to the specified columns.
CREATE TABLE test(c1 int, c2 VARCHAR(200), c3 TEXT, FULLTEXT INDEX fts_idx (c2, c3));Query records where the c2 and c3 columns exactly match
some words.SELECT c2 FROM test WHERE MATCH(c2,c3) AGAINST('some words' IN MATCH PHRASE MODE);
Considerations
The MATCH AGAINST expression, in addition to indicating text matching, has other semantic meanings in different clauses of SQL:
Projection of relevance:
- Full-text search supports ranking based on word frequency. The
MATCH AGAINSTexpression can be used to project the relevance. - Relevance indicates the degree of match between the matched data rows and the query (Query) in the
MATCH AGAINSTexpression. - Relevance is a
DOUBLEtype data greater than or equal to 0. A value of 0 indicates that the data row is unrelated to the query (Query). The higher the value, the stronger the relevance.
- Full-text search supports ranking based on word frequency. The
Filtering semantics:
If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to all other filtering conditions usingAND, it has filtering semantics. This means that data rows that do not match the query will be filtered out.Here is an example:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase');This is equivalent to:
SELECT id, digest, detail FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') > 0;
Sorting semantics:
- If the
MATCH AGAINSTexpression appears in theWHEREclause and is connected to other filtering conditions usingAND, it has sorting semantics. - This means that the results of the full-text search will be sorted in descending order based on the relevance ranking of the
MATCH AGAINSTexpression. - If there are multiple
MATCH AGAINSTexpressions, the results will be sorted based on the relevance of the firstMATCH AGAINSTexpression.
Here is an example:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) against ('oceanbase');This is equivalent to:
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM t1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;- If the
Execution of text retrieval
If the
MATCH AGAINSTexpression in the SQL statement has filtering semantics, OceanBase Database can execute the query by scanning the full-text index to calculate theMATCH AGAINSTexpression and then accessing the table.If the
MATCH AGAINSTexpression in the SQL statement does not have filtering semantics, OceanBase Database can execute the query by scanning and filtering using other secondary indexes and then accessing the full-text index based on the primary key.If the
WHEREclause of the SQL statement contains theMATCH AGAINSTexpression and multiple filtering conditions connected byAND, and other secondary indexes can be hit, OceanBase Database will attempt to select the most cost-effective index for scanning.When calculating the
MATCH AGAINSTexpression by scanning the full-text index, if the query contains aLIMITclause, OceanBase Database will attempt to push the top-k calculation down to the full-text index scan to improve performance.When the query contains
MATCH AGAINSTpredicates connected byOR, an index merge plan will be automatically generated. Whether the index merge plan is selected depends on the cost competition.Note
- Starting from V4.4.1, the index merge plan supports multiple predicates connected by
AND. When the query contains multiple predicates, includingMATCH AGAINST, an index merge plan will be automatically generated. - Before V4.4.1, the index merge plan only supported multiple predicates connected by
OR.
- Starting from V4.4.1, the index merge plan supports multiple predicates connected by
Examples
Create the sample table
test_tbl1, and create the full-text indexesft_idx1_test_tbl1andft_idx1_test_tbl1, and the indexidx_test_tbl1.CREATE TABLE test_tbl1(id INT, ref_no INT, digest VARCHAR(512), detail VARCHAR(4096), FULLTEXT INDEX ft_idx1_test_tbl1(detail), FULLTEXT INDEX ft_idx2_test_tbl1(digest, detail), INDEX idx_test_tbl1 (id));Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES (1, 1234, 'fulltext', 'Try text retrieval with OceanBase fulltext index'), (2, 2345, 'log', 'OceanBase can halp with log analysis'), (3, 3456, 'order', 'Simple text retrieval scan will return result set in order of descending ranking in OceanBase'), (4, 4567, 'ranking', 'OceanBase will ranking relevance to query for matched result set'), (5, 5678, 'filter', 'Using text retrieval as a filter condition would be probably more efficient');Use the
MATCH AGAINSTexpression to project relevance.SELECT id, digest, detail, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1;The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | id | digest | detail | relevance | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ | 1 | fulltext | Try text retrieval with OceanBase fulltext index | 0.2989130434782609 | | 2 | log | OceanBase can halp with log analysis | 0.3142857142857143 | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | 0.240174672489083 | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | 0.2849740932642488 | | 5 | filter | Using text retrieval as a filter condition would be probably more efficient | 0 | +------+----------+-----------------------------------------------------------------------------------------------+--------------------+ 5 rows in setUse the
MATCH AGAINSTexpression to filter.SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');The return result is as follows:
+------+----------+-----------------------------------------------------------------------------------------------+ | id | digest | detail | +------+----------+-----------------------------------------------------------------------------------------------+ | 2 | log | OceanBase can halp with log analysis | | 1 | fulltext | Try text retrieval with OceanBase fulltext index | | 4 | ranking | OceanBase will ranking relevance to query for matched result set | | 3 | order | Simple text retrieval scan will return result set in order of descending ranking in OceanBase | +------+----------+-----------------------------------------------------------------------------------------------+ 4 rows in setUse the
MATCH AGAINSTexpression to retrieve and sort.SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase');or
SELECT id, digest, MATCH (detail) AGAINST ('oceanbase') AS relevance FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') ORDER BY relevance DESC;The return result is as follows:
+------+----------+--------------------+ | id | digest | relevance | +------+----------+--------------------+ | 2 | log | 0.3142857142857143 | | 1 | fulltext | 0.2989130434782609 | | 4 | ranking | 0.2849740932642488 | | 3 | order | 0.240174672489083 | +------+----------+--------------------+ 4 rows in set
Example of text retrieval execution plan
The
MATCH AGAINSTexpression has filtering semantics and scans the full-text indexft_idx1_test_tbl1.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ============================================================================= | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ----------------------------------------------------------------------------- | | |0 |SORT | |1 |30 | | | |1 |└─TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|1 |30 | | | ============================================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]) | | 1 - output([test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest], [MATCH(test_tbl1.detail) AGAINST('oceanbase')]), filter([test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, filter_before_indexback[false], | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 15 rows in setThe
MATCH AGAINSTexpression has filtering semantics and hits the general secondary indexidx_test_tbl1, which has a lower scanning cost. Therefore, the indexidx_test_tbl1is used for scanning.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase simple text retreival scan on fulltext index') AND id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------------------------- | | |0 |SORT | |1 |103 | | | |1 |└─TABLE RANGE SCAN|test_tbl1(idx_test_tbl1)|1 |103 | | | ====================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter(nil), rowset=16 | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index'), DESC]) | | 1 - output([test_tbl1.id], [MATCH(test_tbl1.detail) AGAINST('oceanbase simple text retreival scan on fulltext index')]), filter([MATCH(test_tbl1.detail) | | AGAINST('oceanbase simple text retreival scan on fulltext index')]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(3,MIN ; 3,MAX), | | range_cond([test_tbl1.id = 3]), has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 16 rows in setThe
MATCH AGAINSTexpression does not have filtering semantics and cannot filter the result set. Therefore, the full-text index cannot be used for scanning, and the main table is scanned.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1|5 |503 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.__pk_increment]), range(MIN ; MAX)always true, has_functional_lookup=true | +-------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression does not have filtering semantics and cannot filter the result set. Therefore, the full-text index cannot be used for scanning, and the indexidx_test_tbl1is used for scanning.EXPLAIN SELECT id FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') OR id = 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------+ | =================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------------------- | | |0 |TABLE FULL SCAN|test_tbl1(idx_test_tbl1)|5 |503 | | | =================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id]), filter([MATCH(test_tbl1.detail) AGAINST('oceanbase') OR test_tbl1.id = 3]), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.id]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([test_tbl1.id], [test_tbl1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true, has_functional_lookup=true | +------------------------------------------------------------------------------------------------------------------------------+ 11 rows in setThe
MATCH AGAINSTexpression has filtering semantics and scans the corresponding full-text indexft_idx1_test_tbl1. TheLIMITclause is pushed down to the index scan for top-k calculation.EXPLAIN SELECT id, digest, detail FROM test_tbl1 WHERE MATCH (detail) AGAINST ('oceanbase') LIMIT 3;The return result is as follows:
+------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------+ | =========================================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------------------------- | | |0 |TEXT RETRIEVAL SCAN|test_tbl1(ft_idx1_test_tbl1)|4 |31 | | | =========================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([test_tbl1.id], [test_tbl1.digest], [test_tbl1.detail]), filter(nil), rowset=16 | | access([test_tbl1.__pk_increment], [test_tbl1.detail], [test_tbl1.id], [test_tbl1.digest]), partitions(p0) | | is_index_back=true, is_global_index=false, | | calc_relevance=true, match_expr(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | pushdown_match_filter(MATCH(test_tbl1.detail) AGAINST('oceanbase')), | | sort_keys([MATCH(test_tbl1.detail) AGAINST('oceanbase'), DESC]), limit(3), offset(nil), with_ties(false) | +------------------------------------------------------------------------------------------------------------------+ 13 rows in set
