The SEARCH function returns the search results in JSON format, sorted by relevance.
Syntax
FUNCTION SEARCH (IN table_name VARCHAR(65535),
IN search_params LONGTEXT)
RETURN JSON;
Parameters
| Parameter | Description | Required |
|---|---|---|
| table_name | The name of the table. This parameter applies to heap tables and tables without a primary key. | Yes |
| search_params | The search parameters, in JSON format. | Yes |
search_params syntax
search_params is a JSON string. The syntax of search_params is described in this section. For more information, see the parameters and examples.
Syntax
This section describes the BNF syntax symbols and their usage rules:
Optional parameters
[ ]in BNF indicates that multiple elements are optional. For example,param_list = param [, param]*means thatparam_listcan contain one or moreparam.- In
rank_featureandrank_expression,[ ]also indicates that subparameters are optional. [, "boost" : boost_value]indicates that the boost subparameter is optional, and all boost subparameters are optional.
Arrays
[ ]in a JSON structure indicates an array. For example,[condition_list].
Selection
|indicates a selection. For example,param = "query" | "knn"means thatparamcan be either "query" or "knn".
Repetition
*indicates zero or more repetitions. For example,param_list = param [, param]*means thatparam_listcan contain one or moreparam.
JSON format requirements
- All JSON field names and string values must be enclosed in double quotation marks.
- Numerical values do not need to be enclosed in double quotation marks.
Syntax
This section describes the syntax of search_params. For more information about the parameters, see the parameter description table below.
Top-level parameter structure
-- Top-level keyword parameters
search_params = '{param_list}'
param_list = param [, param]*
-- The query and knn parameters are used for full-text/scalar and vector retrieval, respectively. At least one of them must be specified. In mixed retrieval, both are used.
param = "query" : {query_expression}
| "knn" : {knn_expression}
| "rank" : {rank_expression}
| "_source" : [field_list]
| "from" : number
| "size" : number
| "es_mode" : boolean
Notice
When using mixed search, it is strongly recommended to specify the _source parameter to define the range of returned columns. If not specified, all columns of the table will be returned by default. If the table contains vector columns, this will significantly increase the response time (RT).
Query expression structure
query_expression = bool_query | single_term
-- bool_query syntax
bool_query = "bool" : {bool_condition_list}
bool_condition_list = bool_condition [, bool_condition]*
-- The boost subparameter is optional.
bool_condition = "must" : [condition_list]
| "should" : [condition_list]
| "must_not" : [condition_list]
| "filter" : [condition_list]
| "boost" : boost_value
-- Nested bool_query is supported.
condition_list = query_expression [, query_expression]*
Single term query syntax
single_term = range_query | match_query | term_query | terms_query | query_string | multi_match | rank_feature
-- range_query syntax
range_query = "range" : {"field_name" : {range_condition_list}}
range_condition_list = range_condition [, range_condition]*
-- boost subparameter is optional
range_condition = "gte" : number
| "gt" : number
| "lte" : number
| "lt" : number
| "boost" : boost_value
-- match_query syntax
match_query = "match" : {"field_name" : {match_condition}}
-- boost subparameter is optional
match_condition = "query" : "string_value" [, "boost" : boost_value]
-- term_query syntax
term_query = "term" : {term_condition_list}
term_condition_list = term_condition [, term_condition]*
term_condition = "field_name" : scalar_value
| "field_name" : term_value_object
-- boost subparameter is optional
term_value_object = "value" : scalar_value [, "boost" : boost_value]
-- terms_query syntax
terms_query = "terms" : {terms_condition_list}
terms_condition_list = terms_condition [, terms_condition]*
terms_condition = "field_name" : [scalar_value_list]
| "boost" : boost_value
scalar_value_list = scalar_value [, scalar_value]*
-- Full-text search expression syntax, which includes query_string and multi_match search methods
query_string = "query_string" : {query_string_condition}
-- fields and query subparameters are required
query_string_condition = "fields" : [field_weight_list]
| "query" : "string_value"
| "boost" : boost_value
| "type" : ("best_fields" | "cross_fields" | "most_fields" | "phrase")
| "default_operator" : ("AND" | "OR")
| "minimum_should_match" : number
field_weight_list = field_weight [, field_weight]*
-- No spaces are allowed between field_name, ^, and number
field_weight = "field_name[^number]"
-- multi_match syntax
multi_match = "multi_match" : {multi_match_condition}
-- fields and query subparameters are required
multi_match_condition = "fields" : [field_weight_list]
| "query" : "string_value"
| "boost" : boost_value
| "type" : ("best_fields" | "cross_fields" | "most_fields" | "phrase")
| "operator" : ("AND" | "OR")
| "minimum_should_match" : number
field_weight_list = field_weight [, field_weight]*
-- No spaces are allowed between field_name, ^, and number
field_weight = "field_name[^number]"
-- Feature ranking expression syntax
rank_feature = "rank_feature" : {rank_feature_condition_list}
rank_feature_condition_list = "field" : "field_name",
rank_algorithm
rank_algorithm = "saturation" : {saturation_params}
| "sigmoid" : {sigmoid_params}
| "log" : {log_params}
| "linear" : {linear_params}
saturation_params = "pivot" : number [, "positive_score_impact" : boolean]
sigmoid_params = "pivot" : number, "exponent" : number [, "positive_score_impact" : boolean]
log_params = "scaling_factor" : number [, "positive_score_impact" : boolean]
linear_params = ["positive_score_impact" : boolean]
Vector retrieval expression structure
-- Single vector retrieval expression structure
knn_expression = "knn" : {knn_condition_list}
knn_condition_list = knn_condition [, knn_condition]*
-- The field, k, and query_vector subparameters are required.
knn_condition = "field" : "field_name"
| "k" : number
| "query_vector" : [vector_values]
| "filter" : [condition_list]
| "similarity" : number
| "boost" : boost_value
vector_values = float [, float]*
condition_list = query_expression [, query_expression]*
-- Multi vector retrieval expression structure
-- [multi_knn_condition_list] in the following syntax is a fixed format that cannot be modified.
multi_knn_expression = "knn" : [multi_knn_condition_list]
-- [multi_knn_condition] in the following syntax is optional.
multi_knn_condition_list = {multi_knn_condition} [, {multi_knn_condition}]*
-- The field, k, and query_vector subparameters are required.
multi_knn_condition = "field" : "field_name"
| "k" : number
| "query_vector" : [vector_values]
| "filter" : [condition_list]
| "similarity" : number
| "boost" : boost_value
vector_values = float [, float]*
condition_list = query_expression [, query_expression]*
-- rank_expression syntax
rank_expression = "rank" : {rank_strategy}
rank_strategy = "rrf" : {rrf_params}
rrf_params = "rank_window_size" : number [, "rank_constant" : number]
Basic type definitions
-- Basic type definitions
field_name = "string_value"
field_list = field_name [, field_name]*
number = integer | decimal
boost_value = integer | float -- The boost parameter value must be >= 0.
boolean = true | false
scalar_value = "string_value" | number | boolean
Parameter details
The detailed parameters of search_params are described as follows:
| Expression type | Parameter name | Description |
|---|---|---|
| Top-level keyword parameters | ||
| query | Can be used alone for full-text search and can be used with the knn parameter for hybrid search. |
|
| knn | Can be used alone for single/multi-vector search and can be used with the query parameter for hybrid search. |
|
| rank (optional) | Used to specify the sorting strategy for hybrid search, supporting the RRF (Reciprocal Rank Fusion) algorithm. | |
| _source (optional) | Used to specify the columns to return in the search results. If not specified, all user-defined columns in the table are returned. | |
| from (optional) | Used to specify the row from which to start returning results in the search result set. If not specified, the default is to start from the first row. This parameter must be used with the size parameter. |
|
| size (optional) | Used to limit the number of results returned. If not specified, the default is 10. |
|
| es_mode | Used to specify whether to convert full-text search to the new ESQL syntax. The default is false. |
|
| bool | must | Must be satisfied and requires scoring. When using boolean logic internally, a nested bool expression is required. Multiple conditions within a bool expression are combined using AND logic by default. |
| should | Should be satisfied, similar to OR, and requires scoring. When using boolean logic internally, a nested bool expression is required. Multiple conditions within a bool expression are combined using AND logic by default. | |
| must_not | Must not be satisfied, does not require scoring, and is converted to a 'NOT' expression. Multiple conditions within must_not are connected using 'AND'. When using boolean logic internally, a nested bool expression is required. Multiple conditions within a bool expression are combined using AND logic by default. | |
| filter | Must be satisfied, does not require scoring, and is converted to an 'AND' expression. When using boolean logic internally, a nested bool expression is required. Multiple conditions within a bool expression are combined using AND logic by default. | |
| boost (optional) | Query weight. For more details, see the description of the boost parameter below. | |
| rank_feature (parameters for score calculation) | pivot | A required scoring parameter for saturation and sigmoid algorithms. The default value is the geometric mean of the data in this list. |
| positive_score_impact (optional) | Used to specify whether the field's impact on the final relevance is positive or negative. | |
| scaling_factor | A required parameter for the log scoring calculation formula. | |
| exponent | A required parameter for the sigmoid algorithm calculation formula. | |
| rank_feature (score calculation algorithms) | saturation | The default scoring algorithm
|
| sigmoid |
|
|
| log |
|
|
| linear |
|
|
| single term (single-term search) | ||
| range | Range search, used with gte, gt, lte, lt, and boost. fieldname is required. |
|
| match | Fuzzy match, converted to the SQL 'match' expression, used with boost. | |
| term | Exact match, supports scalar values such as strings, numbers, and booleans, converted to the SQL '=' expression, used with boost. | |
| terms | Exact match for any value in the specified set, supports arrays of scalar values such as strings, numbers, and booleans, converted to the SQL 'IN' expression, used with boost. | |
| query_string | Full-text match, converted to a combination of multiple 'match' expressions in SQL. | |
| multi_match | Full-text match, converted to a combination of multiple 'match' expressions in SQL, similar to query_string, but does not support weight for single keywords. | |
| fields | List of text search fields, with configurable weights for each column. | |
| query | List of search keywords, with configurable weights for each keyword. | |
| minimum_should_match (optional) | Controls the number of conditions that must be met in the should and query_string clauses. If this parameter is not specified, the default value is 1. Note: If the bool expression contains must/filter clauses and this parameter is not specified, the default value will be 0, meaning that the should conditions can be unmet. |
|
| boost (optional) | The query weight, see the detailed description of the boost parameter below. | |
| type (optional) | Specifies the matching mode. This version supports best_fields, cross_fields, most_fields, and phrase. The default value is best_fields if not specified. |
|
| default_operator (optional) | A subfield of query_string that specifies the logical combination between multiple keywords. | |
| operator (optional) | A subfield of multi_match that specifies the logical combination between multiple keywords. | |
| knn (vector search) | ||
| field | The vector search field. | |
| k | The number of rows returned by the vector search. | |
| query_vector | Specifies the search vector. | |
| filter (optional) | The filter condition. | |
| similarity (optional) | Specifies the vector distance filter condition. | |
| boost (optional) | The query weight, see the detailed description of the boost parameter below. | |
| rank (RRF sorting strategy) | rrf | The RRF (Reciprocal Rank Fusion) sorting strategy, used to fuse and sort multiple query results during hybrid search. |
| rank_window_size (optional) | This value specifies the size of the result set returned for each query. A larger value indicates higher relevance but may incur performance overhead. The final sorted result set will be trimmed to the size specified in the search request's size parameter.rank_window_size must meet the following conditions:
size parameter. |
|
| rank_constant (optional) | This value controls the impact of each document in the result set on the final sorting result. A larger value indicates a greater impact from documents ranked lower. The default value is 60. |
boost parameter
The boost parameter specifies the weight of the query conditions in the final relevance calculation. The value must be ≥ 0, and the default value is 1 if not specified. In the syntax structure above, the bool, single_term (excluding rank_feature), single and multi-vector retrieval (knn) all support the boost parameter. The usage of the boost parameter for each retrieval type is as follows:
Query-level boost
Specify the weight for the entire query condition, for example:
{ "bool": { "must": [{"term": {"category": "Gaming"}}], "boost": 2.0 // Weight of the entire bool query } }Field-level boost
Specify the weight for a specific field's query:
{ "query_string": { "fields": ["product_name", "description"], "query": "gaming keyboard", "boost": 1.5 // Weight of the entire query_string query } }Match value-level boost
Specify the weight for specific match values (supported by match, term, and terms queries):
{ "match" : { "product_name": { "query" : "gaming keyboard", "boost" : 1.5 } } }Field weight syntax
In query_string and multi_match, you can use the
field_name^weightsyntax:{ "query_string": { "fields": ["product_name^2.0", "description^1.0"], "query": "gaming" } }Vector retrieval boost
In single and multi-vector retrieval, each vector field can independently specify the
boostweight:Single-vector retrieval:
{ "knn": { "field": "vector", "k": 5, "query_vector": [1, 2, 3], "boost": 1.5 // Weight of this vector field } }Multi-vector retrieval:
{ "knn": [{ "field": "c2", "k": 5, "query_vector": [1, 2, 3], "boost": 2.0 // Weight of the c2 vector field }, { "field": "c4", "k": 5, "query_vector": [1, 2, 3], "boost": 1.0 // Weight of the c4 vector field }] }
Examples
Scalar retrieval
The products table has the following schema:
CREATE TABLE products (
`product_id` varchar(50) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`description` text DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`category` varchar(100) DEFAULT NULL,
`tags` varchar(255) DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`stock_quantity` int(11) DEFAULT NULL,
`release_date` datetime DEFAULT NULL,
`is_on_sale` tinyint(1) DEFAULT NULL,
`vec` VECTOR(4) DEFAULT NULL
);
Insert data.
INSERT INTO products VALUES
('prod-004', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),
('prod-009', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]');
Set the retrieval parameters.
SET @parm = '{
"query": {
"bool": {
"must": [
{"term": {"brand": "GamerZone"}}
]
}
},
"_source": ["product_id", "product_name", "description", "brand", "category", "tags", "price", "stock_quantity", "release_date", "is_on_sale"]
}';
Retrieve all records where brand is "GamerZone".
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));
The result is as follows:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-004",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-009",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Full-text and vector mixed retrieval
Create an example table with a vector column, create a vector index for the vector column, and create full-text indexes for two varchar columns.
CREATE TABLE doc_table(c1 INT, vector VECTOR(3), query VARCHAR(255), content VARCHAR(255), VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag), FULLTEXT INDEX idx2(query), FULLTEXT INDEX idx3(content));
Write data.
INSERT INTO doc_table VALUES(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
Set the retrieval parameters.
SET @parm = '{
"query": {
"bool": {
"should": [
{"match": {"query": "hi hello"}},
{"match": { "content": "oceanbase mysql" }}
]
}
},
"knn" : {
"field": "vector",
"k": 5,
"query_vector": [1,2,3]
},
"rank": {
"rrf": {
"rank_window_size": 10,
"rank_constant": 60
}
},
"_source" : ["query", "content"]
}';
Execute the query and return the query result.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"query": "hello world",
"_score": 0.0325,
"content": "oceanbase Elasticsearch database"
},
{
"query": "hello world, what is your name",
"_score": 0.0323,
"content": "oceanbase mysql database"
},
{
"query": "hello world, how are you",
"_score": 0.0315,
"content": "oceanbase oracle database"
},
{
"query": "real world, how old are you",
"_score": 0.0161,
"content": "redis oracle database"
},
{
"query": "hello world, where are you from",
"_score": 0.0159,
"content": "starrocks oceanbase database"
},
{
"query": "real world, where are you from",
"_score": 0.0156,
"content": "postgres oracle database"
}
] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
