The HYBRID_SEARCH clause specifies a full-text search, vector search, and filter conditions in a SELECT statement by using a JSON string that conforms to the rules in this topic. It returns rows sorted by the hybrid search strategy.
Notice
This syntax is applicable only to MySQL mode.
Syntax
SELECT select_expr_list
FROM HYBRID_SEARCH(TABLE table_name, dsl_string) [table_alias];
-- select_expr_list is the SELECT list, which follows the syntax of a regular SELECT statement.
Parameters
| Parameter | Description |
|---|---|
table_name |
The name of the target table. Only heap-organized tables (ORGANIZATION = HEAP) are supported, including partitioned and non-partitioned tables. |
dsl_string |
A JSON string that describes query, knn, rank, from, size, and min_score.
NoticeExcept for the differences listed in "Limitations and Considerations" below, the syntax structure and parameter descriptions are the same as those of the SEARCH procedure's |
Limitations and considerations
This section only lists the syntax limitations. For functional limitations, see the related documentation at the end of this section: Hybrid search (SQL interface). Please refer to this section for a comprehensive understanding.
Considerations:
- At the top level, the combination of
query(full-text search) andknn(vector search) can have a maximum of three subqueries:querycounts as one;knnas an object counts as one, and if it's an array, each element counts as one. The total number of these three cannot exceed three, and at least one full-text or vector search must be included. - Column names in queries are case-insensitive.
- For vector queries, it is recommended to input the vector as a string.
- The
min_scoreparameter is only supported in the SQL interface and not in the PL interface. - Each
queryandknnare independent search paths, and their respectivefilterconditions do not affect each other. If a search path requires filtering conditions, the correspondingfiltermust be specified separately for that path. All search results will be merged into a single result set, scored and sorted based on the fusion algorithm, and the topsizeresults will be returned.
Limitations:
- The
rank_feature,es_mode, and_sourceparameters are not supported. - Direct use of
WHERE,ORDER BY, orLIMITat the same level asHYBRID_SEARCHis not supported. If further filtering or sorting is needed, the hybrid search results must be used as a subquery first. For example:-- Not supported: Using WHERE at the same level as HYBRID_SEARCH SELECT id FROM HYBRID_SEARCH(TABLE doc_table, '{"knn":{"field":"vector","k":5,"query_vector":"[1,2,3]"}}') WHERE id > 3; -- Supported: Use as a subquery first, then filter SELECT id FROM ( SELECT id FROM HYBRID_SEARCH(TABLE doc_table, '{"knn":{"field":"vector","k":5,"query_vector":"[1,2,3]"}}') ) t WHERE id > 3; - Some nested JSON/ARRAY element paths are not supported for pushdown. For example, if the
pathinjsonisarray, and the elements arejson, you cannot specify a path for thisjsonelement. - Scalar queries, including
term,range,terms,jsonexpressions, andarrayexpressions, do not support scoring orboost, and cannot be used inmustorshouldclauses ofbool. - The
boostparameters for non-top-level full-text queries,boostparameters forboolqueries, and the column and term weights for full-text queries must be greater than0. For otherboostparameters, the value must be greater than or equal to0. - For
multi_matchandquery_stringqueries involving multiple full-text columns, the same character set andcollationmust be used, and the sameparsermust be specified for the full-text index. - When
match,multi_match, orquery_stringis used for term-level matching, the corresponding full-text column must useFTS_INDEX_TYPE = MATCH(or the default type equivalent toMATCHif unspecified).match_phraseis only applicable to full-text columns withFTS_INDEX_TYPE = PHRASE_MATCH.
DSL_STRING syntax
dsl_string is a JSON-formatted string. The syntax is described in this section. Please refer to the parameters and examples below for a better understanding.
Recommended reading order
First, review the DSL skeleton example to understand the top-level fields. Then, refer to the BNF definitions as needed. For detailed field descriptions and constraints, consult the Parameters table. For end-to-end table creation and additional scenarios, see the related documentation index at the end of this section (SQL interface).
Syntax
This section describes the meaning and usage rules of BNF (Backus-Naur Form) syntax symbols:
Optional parameters
[ ]in BNF indicates that multiple elements are optional. For example,param_list = param [, param]*means thatparam_listcan contain one or moreparamelements.[ ]inrank_expressionalso indicates that subparameters are optional.[, "boost" : boost_value]indicates that the subparameter is optional in an expression that supportsboost. Theterm,range,terms, and JSON/ARRAY scalar expressions do not supportboost.
Arrays
[ ]in a JSON structure indicates an array. For example,[condition_list].
Selection
|indicates a selection. For example,param = "query" | "knn"means thatparamcan be either "query" or "knn".
Repeated elements
*indicates zero or more repetitions. For example,param_list = param [, param]*means thatparam_listcan contain one or moreparamelements.
JSON format requirements
- All JSON column names and string values must be enclosed in double quotation marks.
- Numerical values do not need to be enclosed in quotation marks.
Syntax definition
This section provides detailed syntax for dsl_string. For more information about the parameters, see the table below.
DSL skeleton example
The following example shows a skeleton of the dsl_string syntax that is equivalent to the "Full-text and vector RRF hybrid search" example. At the top level, you can specify query (full-text), knn (vector), rank (such as RRF), from / size (pagination), and other parameters. When you perform a single search, you can delete the unnecessary top-level keys. The SQL interface also allows you to optionally specify min_score. For more information, see the table below.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {
"content": "oceanbase mysql"
}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]"
},
"rank": {
"rrf": {
"rank_window_size": 10,
"rank_constant": 60
}
},
"from": 0,
"size": 10
}'
);
Top-level parameter structure
The top-level parameter structure specifies the parameters for hybrid search.
dsl_string = '{param_list}'
param_list = param [, param]*
-- At least one of `query` and `knn` must be specified. You can specify both of them in hybrid search.
param = "query" : {query_expression}
| "knn" : {knn_expression}
| "rank" : {rank_expression}
| "from" : number
| "size" : number
| "min_score" : number -- Only supported in the SQL interface.
Query expression structure
The query expression structure specifies the full-text and scalar query conditions for hybrid search.
query_expression = bool_query | scalar_term | fulltext_term
bool_query = "bool" : {bool_condition_list}
bool_condition_list = bool_condition [, bool_condition]*
bool_condition = "must" : [condition_list]
| "should" : [condition_list]
| "must_not" : [condition_list]
| "filter" : [condition_list]
| "boost" : boost_value
condition_list = query_expression [, query_expression]*
The values of must, should, must_not, and filter under bool are query expression arrays. Each item in the array is a complete query_expression (which can be nested within another bool or followed by match / term, etc.). The following example shows a fragment of the query object:
{
"bool": {
"must": [
{
"match": {
"content": "oceanbase"
}
}
],
"filter": [
{
"term": {
"status": 1
}
}
]
}
}
Scalar query structure
The scalar query structure defines scalar query expressions within a single term. It includes range_query, term_query, and terms_query, which correspond to the range, term, and terms expressions in SQL syntax. These represent range queries, exact matches, and multi-value matches, respectively.
scalar_term = range_query | term_query | terms_query
range_query = "range" : {"field_name" : {range_condition_list}}
range_condition_list = range_condition [, range_condition]*
range_condition = "gte" : number
| "gt" : number
| "lte" : number
| "lt" : number
term_query = "term" : {term_condition_list}
term_condition_list = term_condition [, term_condition]*
term_condition = "field_name" : scalar_value
| "field_name" : term_value_object
term_value_object = "value" : scalar_value
terms_query = "terms" : {terms_condition_list}
terms_condition_list = terms_condition [, terms_condition]*
terms_condition = "field_name" : [scalar_value_list]
scalar_value_list = scalar_value [, scalar_value]*
Full-text query structure
The full-text query structure defines full-text query expressions within a single term. It includes match_query, match_phrase_query, query_string, and multi_match, which correspond to the match, match_phrase, query_string, and multi_match expressions in SQL syntax. These represent term-level matching, phrase-level matching, full-text queries, and multi-column term-level matching, respectively.
fulltext_term = match_query | match_phrase_query | query_string | multi_match
match_query = "match" : {"field_name" : match_body}
match_body = "string_value" | {match_condition}
match_condition = "query" : "string_value" [, "operator" : ("OR" | "AND")] [, "minimum_should_match" : number] [, "boost" : boost_value]
match_phrase_query = "match_phrase" : {"field_name" : phrase_body}
phrase_body = "string_value" | {phrase_condition}
phrase_condition = "query" : "string_value" [, "slop" : number] [, "boost" : boost_value]
query_string = "query_string" : {query_string_condition}
query_string_condition = "fields" : [field_weight_list]
| "query" : "string_value" -- Supports ^ at the term level. For more information, see the section below.
| "boost" : boost_value
| "type" : ("best_fields" | "most_fields")
| "default_operator" : ("AND" | "OR")
| "minimum_should_match" : number
multi_match = "multi_match" : {multi_match_condition}
multi_match_condition = "fields" : [field_weight_list]
| "query" : "string_value" -- Does not support ^ at the term level.
| "boost" : boost_value
| "type" : ("best_fields" | "most_fields")
| "operator" : ("AND" | "OR")
| "minimum_should_match" : number
field_weight_list = field_weight [, field_weight]*
field_weight = "field_name[^number]"
Vector and sorting structure
The vector and sorting structure specifies the parameters for vector search, including the vector field, similarity, filter conditions, and weights. It supports single and multi-vector searches.
knn_expression = "knn" : {knn_condition_list} | [multi_knn_condition_list]
knn_condition_list = knn_condition [, knn_condition]*
knn_condition = "field" : "field_name"
| "k" : number
| "query_vector" : [vector_values]
| "filter" : [condition_list]
| "similarity" : number
| "boost" : boost_value
multi_knn_condition_list = {knn_condition} [, {knn_condition}]*
vector_values = float [, float]*
rank_expression = "rank" : {rank_strategy}
rank_strategy = "rrf" : {rrf_params}
rrf_params = "rank_window_size" : number [, "rank_constant" : number]
In single-vector search, knn is a single object. In multi-vector search, knn is an array of objects. Each object contains a set of knn_condition. The following example shows a fragment of the knn object in multi-vector search:
{
"knn": [
{
"field": "vector_a",
"k": 3,
"query_vector": "[1,0,0]"
},
{
"field": "vector_b",
"k": 3,
"query_vector": "[0,1,0]"
}
]
}
Basic type definitions
The basic data types used in the syntax above.
field_name = "string_value"
field_list = field_name [, field_name]*
number = integer | decimal
boost_value = integer | float
boolean = true | false
scalar_value = "string_value" | number | boolean
Detailed parameter description
| Expression type | Parameter name | Parameter description |
|---|---|---|
| Top-level keyword parameters | ||
| query | Can be used alone for full-text search or in combination with the knn parameter for hybrid search. |
|
| knn | Can be used alone for single/multi-vector search or in combination with the query parameter for hybrid search. |
|
| rank (optional) | Specifies the sorting strategy for hybrid search, supporting the RRF (Reciprocal Rank Fusion) algorithm. | |
| from (optional) | Specifies from which row in the search result set to return results. If not specified, the default is the first row. Must be used with the size parameter. |
|
| size (optional) | Restricts the number of returned results. If not specified, the default is 10. |
|
| bool | must | Must be satisfied and requires scoring. When nested boolean logic is needed, a bool expression must be nested. Multiple conditions within a bool expression default to AND logic. |
| should | Should be satisfied, similar to OR, and requires scoring. When nested boolean logic is needed, a bool expression must be nested. Multiple conditions within a bool expression default to AND logic. | |
| must_not | Must not be satisfied, does not require scoring, and is converted to a 'NOT' expression. Multiple conditions within must_not are connected by 'AND'. When nested boolean logic is needed, a bool expression must be nested. Multiple conditions within a bool expression default to AND logic. At least one positive condition (must / should / filter) must be included in a bool expression. Only one must_not condition is not supported. |
|
| filter | Must be satisfied, does not require scoring, and is converted to an 'AND' expression. When nested boolean logic is needed, a bool expression must be nested. Multiple conditions within a bool expression default to AND logic. | |
| boost (optional) | Query weight. For more details, see the detailed description of the boost parameter below. Note: Scalar queries such as term/range/terms do not support the boost parameter. |
|
| Scalar queries (scalar_term) | range | Range search, used with gte, gt, lte, and lt. fieldname is required. Scalar queries of this type do not participate in scoring, do not support the boost parameter, and cannot be placed in the must/should sections of a bool expression. |
| term | Exact match, supports scalar values such as strings, numbers, and boolean values, converted to an SQL '=' expression. Scalar queries of this type do not participate in scoring, do not support the boost parameter, and cannot be placed in the must/should sections of a bool expression. |
|
| terms | Exact match for any value in the specified set, supports arrays of scalar values such as strings, numbers, and boolean values, converted to an SQL 'IN' expression. Scalar queries of this type do not participate in scoring, do not support the boost parameter, and cannot be placed in the must/should sections of a bool expression. |
|
| Full-text queries (fulltext_term) | match | Full-text match, performs term matching on a single column. The corresponding column must have a single-column full-text index with FTS_INDEX_TYPE = MATCH or an equivalent default type. Supports the complete object form {"field":{"query":"...","operator":"OR|AND","minimum_should_match":n,"boost":...}} or the simplified form without optional subfields {"field":"query string"}. The query string is tokenized by the index's tokenizer; repeated keywords only increase the weight without changing whether the match is successful. operator specifies the logical operator between keywords, optional, defaulting to OR. OR means any keyword should match; AND means all keywords should match. minimum_should_match only takes effect when operator = OR, and is an integer in the range [0, INT32_MAX], optional, defaulting to 1. If 0, it is treated as 1. If greater than or equal to the number of keywords, it is equivalent to operator = AND. |
| match_phrase | Full-text match, performs phrase matching on a single column. The corresponding column must have a single-column full-text index with FTS_INDEX_TYPE = PHRASE_MATCH. Supports the complete object form {"field":{"query":"...","slop":n,"boost":...}} or the simplified form {"field":"phrase string"}. The phrase is tokenized by the index's tokenizer; if the phrase contains stop words, any token can match at the corresponding position. slop specifies the maximum allowed offset between tokens, an integer in the range [0, INT32_MAX]. 0 indicates exact matching, and positive integers indicate fuzzy matching (optional, default 0). |
|
| query_string | Use the ^ operator to extend the query to multiple columns. The query parameter is divided into multiple groups by the ^ operator. In each group, the query is matched against the specified columns, and the scores are aggregated by column. Finally, the scores of all groups are aggregated. For full-text search parameters, only the query_string parameter supports term-level weights. You can use the ^ operator in the query parameter to specify a weight for the keywords in each group. A weight must be a positive floating-point number. The query cannot contain the reserved words OR, AND, NOT, and TO (case-insensitive). The query also cannot contain the characters + - & | ! = < > ( ) [ ] { } " ~ * ? : \ / (as shown in the documentation, these characters must be properly escaped in the JSON representation). |
|
| multi_match | Full-text matching, term matching on multiple columns, field-centric: Each field is scored separately, then scores are aggregated. All fields must be single-column full-text indexes with consistent character sets and tokenizers. Note that the multi_match clause does not support term-level weighting, only field-level and query-level weighting are supported:
|
|
| Common parameters for full-text query | fields | A list of columns that are multi_match or query_string columns; you can append ^ with a positive floating-point number to specify the weight for each column. The default value is 1.0. The weight for a column specified multiple times is that of the last occurrence of the column in the column list. |
| query | A query string for the multi_match or query_string clauses. Query content for the match or match_phrase clauses is written under the query column (or directly as column values in simplified form). Keyword repetition only affects the weight. |
|
| minimum_should_match (optional) | In match and multi_match, when operator = OR, it specifies the minimum number of keywords that must be matched, with a default value of 1. If this value is set to 0, it is treated as 1. When this value exceeds the number of keywords, it is equivalent to using AND.In query_string, when default_operator = OR, it specifies the minimum number of groups (defined by operators and ^ weight) that must be met, following the same rules as above.
NoteWhen used within a |
|
| boost (optional) | Specify the query weight. For more information, see the parameters of the Boost column below. | |
| type (optional) | The match type for multi_match / query_string, which supports the best_fields mode that takes the maximum value. It also supports the most_fields mode that adds the scores, but it does not support the cross_fields or phrase modes. The default value is best_fields if no value is specified. |
|
| default_operator (optional) | Only the query_string. The logical operator used between groups and within groups, by default: OR means any group/word is matched. AND means all are matched. Optional. Default value is OR. |
|
| operator (optional) | The OR or AND operator between the match and multi_match keywords. Optional. Default value is OR. |
|
| knn (vector search) | ||
| field | The vector search column name. | |
| k | Number of rows returned by the execution of vector search. | |
| query_vector | Specify the vector to search for. | |
| filter (optional) | Filtering conditions. | |
| similarity | Used to specify the filtering condition for vector similarity computation. | |
| boost (optional) | Queries the weight. For more information, see the detailed description of the boost parameter in the description. | |
| rank (RRF-based sorting) | rrf | A RRF (Reciprocal Rank Fusion) sorting strategy, which is used to perform sorting when multiple queries are combined for searching. |
| rank_window_size (optional) | This parameter specifies the size of a single result set returned for each query. The larger the value, the more relevant the results are, but the more performance overhead is incurred. The final sorted result set is trimmed to the size specified by the size parameter in the search request. The rank_window_size parameter must meet the following conditions:
size parameter. |
|
| rank_constant (optional) | This value controls the impact of each document in a single result set on the final ranking. A higher value indicates that documents ranked lower have a greater impact on the final result. The default value is 60. |
Operator ^
The ^ operator specifies the weight of a column or a token in a full-text query statement. The syntax is column_name^weight or token^weight, for example, "title^2.0". The supported parameters are as follows:
- Only
multi_matchandquery_stringsupport column-level weights. - Only
query_stringsupports token-level weights.
The token-level weight operator specifies the weight of the token on its left (up to the space). The default weight of a token without the weight operator is 1.0. A token with the weight operator or a group of consecutive tokens without the weight operator is considered a group.
{
"query": {
"query_string": {
"fields": ["title^2.0", "content"],
"query": "red heart dragon fruit jelly orange^1.5 autumn moon pear^1.2 blueberry rambutan"
}
}
}
As shown in the example, the query token "red heart dragon fruit jelly orange^1.5 autumn moon pear^1.2 blueberry rambutan" is divided into four groups: "red heart" and "dragon fruit" with default weights of 1.0, "jelly" and "orange" with a weight of 1.5, "autumn moon" and "pear" with a weight of 1.2, and "blueberry" and "rambutan" with default weights of 1.0.
The ^ operator can be used with the boost parameter. For more information, see the boost parameter in the next section.
boost parameter
The boost parameter specifies the weight of the query condition in the final relevance calculation. The value must be ≥ 0. If not specified, the default value is 1. In the above syntax, bool, single_term, and single- and multi-vector searches (knn) support the boost parameter. Based on the limitations in this topic, the rules for using boost are as follows:
Supported expressions
boolqueries.- Full-text queries, such as
match,match_phrase,query_string, andmulti_match. - Vector queries (single- and multi-vector
knn).
Unsupported expressions
- Scalar queries:
term,range, andterms. - JSON/ARRAY scalar expressions.
- Scalar queries:
Value constraints
- The
boostparameter of non-top-level full-text queries, theboostparameter ofboolqueries, and the column and token weights of full-text queries must be greater than0. - In other cases, the
boostparameter must be greater than or equal to0.
- The
Examples
a. Query-level
boost(bool):{ "bool": { "filter": [{"term": {"category": "Gaming"}}], "boost": 2.0 } }b. Column-level weight (
query_string):{ "query_string": { "fields": ["product_name^2.0", "description^1.0"], "query": "gaming keyboard", "boost": 1.5 } }c. Vector query
boost(knn):{ "knn": { "field": "vector", "k": 5, "query_vector": "[1,2,3]", "boost": 1.5 } }
Examples
Full-text search
This example searches for all rows in the content column of the doc_table table that contain the term oceanbase mysql and returns the top 4 matching records.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": "oceanbase mysql"}
}
}'
);
The expected output is as follows:
+------+---------+---------------------------------+----------------------------------+--------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+--------------------+
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 2.170969786679347 |
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.3503184713375797 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.3503184713375797 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.3503184713375797 |
+------+---------+---------------------------------+----------------------------------+--------------------+
4 rows in set
Hybrid full-text and vector RRF search
Create a sample table with a vector column, create a vector index for it, and create full-text indexes for two VARCHAR columns.
CREATE TABLE doc_table(c1 INT, vector VECTOR(3), query VARCHAR(255), content VARCHAR(255), VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag), FULLTEXT INDEX idx2(query), FULLTEXT INDEX idx3(content));
Insert data.
INSERT INTO doc_table VALUES(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": "oceanbase mysql"}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]"
},
"rank": {
"rrf": {
"rank_constant": 60,
"rank_window_size": 10
}
}
}'
);
The expected output is as follows:
+------+---------+---------------------------------+----------------------------------+----------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+----------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.03252247488101534 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.032266458495966696 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.031754032258064516 |
| 5 | [1,3,2] | real world, how old are you | redis oracle database | 0.016129032258064516 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.016129032258064516 |
| 4 | [1,3,1] | real world, where are you from | postgres oracle database | 0.015625 |
+------+---------+---------------------------------+----------------------------------+----------------------+
6 rows in set
References
- For more information about the syntax and examples, see Hybrid search (SQL interface).
