The SEARCH function returns search results in JSON format, sorted by relevance.
Syntax
FUNCTION SEARCH (IN table_name VARCHAR(65535),
IN search_params LONGTEXT)
RETURN JSON;
Parameters
| Parameter | Description | Required |
|---|---|---|
| table_name | The name of the table. The table can be a heap table or a table without a primary key. | Yes |
| search_params | The search parameters, in JSON string format. | Yes |
search_params syntax
search_params is a JSON string. The syntax of search_params is described in this section. Please refer to the parameters and examples below for a better understanding.
Syntax
This section describes the BNF syntax symbols and their usage rules:
Optional parameters
- In BNF,
[ ]indicates that multiple elements are optional. For example,param_list = param [, param]*indicates thatparam_listcan contain one or moreparamelements. - In
rank_featureandrank_expression,[ ]also indicates that subparameters are optional. [, "boost" : boost_value]indicates that the boost subparameter is optional. All boost subparameters are optional.
- In BNF,
Arrays
- In a JSON structure,
[ ]indicates an array. For example,[condition_list]indicates an array.
- In a JSON structure,
Selection
|indicates a selection. For example,param = "query" | "knn"indicates thatparamcan be "query" or "knn".
Repeated elements
*indicates zero or more repetitions. For example,param_list = param [, param]*indicates thatparam_listcan contain one or moreparamelements.
JSON format requirements
- All JSON field names and string values must be enclosed in double quotation marks.
- Numerical values do not need to be enclosed in double quotation marks.
Syntax
This section describes the syntax of search_params. For more information about the parameters, see the parameter description table below.
Top-level parameter structure
-- Top-level keyword parameters
search_params = '{param_list}'
param_list = param [, param]*
-- The query and knn parameters are used for full-text/scalar and vector search, respectively. At least one of them must be specified. For mixed search, both are used.
param = "query" : {query_expression}
| "knn" : {knn_expression}
| "rank" : {rank_expression}
| "_source" : [field_list]
| "from" : number
| "size" : number
| "es_mode" : boolean
Notice
When mixed search is used, we strongly recommend that you specify the _source parameter to specify the columns to be returned. If you do not specify this parameter, all columns in the table will be returned by default. If the table contains vector columns, this will significantly increase the response time (RT).
Query expression structure
query_expression = bool_query | single_term
-- bool_query syntax
bool_query = "bool" : {bool_condition_list}
bool_condition_list = bool_condition [, bool_condition]*
-- The boost subparameter is optional.
bool_condition = "must" : [condition_list]
| "should" : [condition_list]
| "must_not" : [condition_list]
| "filter" : [condition_list]
| "boost" : boost_value
-- Nested bool_query is supported.
condition_list = query_expression [, query_expression]*
Single-term query structure
single_term = range_query | match_query | term_query | query_string | multi_match | rank_feature
-- range_query syntax
range_query = "range" : {"field_name" : {range_condition_list}}
range_condition_list = range_condition [, range_condition]*
-- boost subparameter is optional
range_condition = "gte" : number
| "gt" : number
| "lte" : number
| "lt" : number
| "boost" : boost_value
-- match_query syntax
match_query = "match" : {"field_name" : {match_condition}}
-- boost subparameter is optional
match_condition = "query" : "string_value" [, "boost" : boost_value]
-- term_query syntax
term_query = "term" : {term_condition_list}
term_condition_list = term_condition [, term_condition]*
term_condition = "field_name" : scalar_value
| "field_name" : term_value_object
-- boost subparameter is optional
term_value_object = "value" : scalar_value [, "boost" : boost_value]
-- Full-text search expression structure, which supports query_string and multi_match
query_string = "query_string" : {query_string_condition}
-- fields and query subparameters are required
query_string_condition = "fields" : [field_weight_list]
| "query" : "string_value"
| "boost" : boost_value
| "type" : ("best_fields" | "cross_fields" | "most_fields" | "phrase")
| "default_operator" : ("AND" | "OR")
| "minimum_should_match" : number
field_weight_list = field_weight [, field_weight]*
-- No spaces are allowed between field_name, ^, and number
field_weight = "field_name[^number]"
-- multi_match syntax, supported starting from V4.4.1 BP0 HotFix1
multi_match = "multi_match" : {multi_match_condition}
-- fields and query subparameters are required
multi_match_condition = "fields" : [field_weight_list]
| "query" : "string_value"
| "boost" : boost_value
| "type" : ("best_fields" | "cross_fields" | "most_fields" | "phrase")
| "operator" : ("AND" | "OR")
| "minimum_should_match" : number
field_weight_list = field_weight [, field_weight]*
-- No spaces are allowed between field_name, ^, and number
field_weight = "field_name[^number]"
-- Feature ranking expression structure
rank_feature = "rank_feature" : {rank_feature_condition_list}
rank_feature_condition_list = "field" : "field_name",
rank_algorithm
rank_algorithm = "saturation" : {saturation_params}
| "sigmoid" : {sigmoid_params}
| "log" : {log_params}
| "linear" : {linear_params}
saturation_params = "pivot" : number [, "positive_score_impact" : boolean]
sigmoid_params = "pivot" : number, "exponent" : number [, "positive_score_impact" : boolean]
log_params = "scaling_factor" : number [, "positive_score_impact" : boolean]
linear_params = ["positive_score_impact" : boolean]
Vector search expression structure
-- knn_expression syntax
knn_expression = "knn" : {knn_condition_list}
knn_condition_list = knn_condition [, knn_condition]*
-- The field, k, and query_vector subparameters are required.
knn_condition = "field" : "field_name"
| "k" : number
| "query_vector" : [vector_values]
| "filter" : [condition_list]
| "similarity" : number
| "boost" : boost_value
vector_values = float [, float]*
condition_list = {condition [, condition]*}
condition = single_term
-- rank_expression syntax. RRF is supported starting from V4.4.1 BP0 HotFix1.
rank_expression = "rank" : {rank_strategy}
rank_strategy = "rrf" : {rrf_params}
rrf_params = "rank_window_size" : number [, "rank_constant" : number]
Basic types
-- Basic types
field_name = "string_value"
field_list = field_name [, field_name]*
number = integer | decimal
boost_value = integer | float -- The boost parameter must be >= 0.
boolean = true | false
scalar_value = "string_value" | number | boolean
Parameter details
The following table describes the parameters of search_params:
| Expression type | Parameter name | Description |
|---|---|---|
| Top-level keyword parameters | ||
| query | Can be used alone for full-text search or in combination with the knn parameter for mixed retrieval. |
|
| knn | Can be used alone for vector search or in combination with the query parameter for mixed retrieval. |
|
| rank (optional) | Specifies the sorting strategy for mixed retrieval, supporting the RRF (Reciprocal Rank Fusion) algorithm.
NoticeThis parameter is supported starting from V4.4.1 BP0 HotFix1. |
|
| _source (optional) | Specifies the columns to return in the search result. If not specified, all user-defined columns in the table are returned. | |
| from (optional) | Specifies the row number from which to start returning results. If not specified, the default is the first row. Must be used with the size parameter. |
|
| size (optional) | Specifies the maximum number of results to return. If not specified, the default is 10. |
|
| es_mode | Specifies whether to convert full-text search to the new ESQL syntax. Default is false. |
|
| bool | must | Must be satisfied and scored. When using boolean logic internally, nest bool expressions. Conditions within a bool expression are combined using AND logic by default. |
| should | Should be satisfied and scored. When using boolean logic internally, nest bool expressions. Conditions within a bool expression are combined using AND logic by default. | |
| must_not | Must not be satisfied. Conditions within a must_not clause are combined using AND logic. When using boolean logic internally, nest bool expressions. Conditions within a bool expression are combined using AND logic by default. | |
| filter | Must be satisfied and not scored. Conditions within a filter clause are combined using AND logic. When using boolean logic internally, nest bool expressions. Conditions within a bool expression are combined using AND logic by default. | |
| boost (optional) | Query weight. For details, see the boost parameter description below. | |
| rank_feature (parameters for calculating relevance scores) | pivot | A required parameter for the saturation and sigmoid scoring algorithms. The default value is the geometric mean of the data in this list. |
| positive_score_impact (optional) | Specifies whether the field's impact on the final relevance is positive or negative. | |
| scaling_factor | A required parameter for the log scoring algorithm. | |
| exponent | A required parameter for the sigmoid scoring algorithm. | |
| rank_feature (scoring algorithms) | saturation | The default relevance scoring algorithm
|
| sigmoid |
|
|
| log |
|
|
| linear |
|
|
| single term (single-term search) | ||
| range | Range search. Use with gte, gt, lte, lt, and boost. fieldname is required. |
|
| match | Fuzzy match. Converted to the SQL 'match' expression. Use with boost. | |
| term | Exact match. Supports scalar values such as strings, numbers, and booleans. Converted to the SQL '=' expression. Use with boost. | |
| query_string | Full-text match. Converted to a combination of multiple 'match' expressions in SQL. | |
| multi_match | Full-text match. Converted to a combination of multiple 'match' expressions in SQL. Similar to query_string, but single keywords do not support weights.
NoticeThis parameter is supported starting from V4.4.1 BP0 HotFix1. |
|
| fields | List of text search fields. Weights can be configured for each field. | |
| query | Retrieves a list of keywords, each of which can be assigned a weight. | |
| minimum_should_match (optional) | Specifies the number of conditions that must be met in the should or query_string clauses. If this parameter is not specified, the default value is 1. Note: If the bool expression contains must/filter clauses and this parameter is not specified, the default value is 0, meaning that the should conditions can be unmet. |
|
| boost (optional) | The query weight. For more information, see the detailed description of the boost parameter below. | |
| type (optional) | Specifies the matching mode. Supported values include best_fields, cross_fields, most_fields, and phrase. The default value is best_fields if this parameter is not specified. |
|
| default_operator (optional) | A subfield of query_string that specifies the logical combination of multiple keywords. | |
| operator (optional) | A subfield of multi_match that specifies the logical combination of multiple keywords. | |
| knn (vector search) | ||
| field | The vector search field. | |
| k | The number of rows to return for vector search. | |
| query_vector | The search vector. | |
| filter (optional) | The filter condition. | |
| similarity (optional) | The filter condition for vector distance. | |
| boost (optional) | The query weight. For more information, see the detailed description of the boost parameter below. | |
| rank (RRF sorting strategy) | rrf | The RRF (Reciprocal Rank Fusion) sorting strategy, which is used to integrate and sort multiple query results during hybrid search.
NoticeThis parameter is supported starting from V4.4.1 BP0 HotFix1. |
| rank_window_size (optional) | This parameter specifies the size of the result set returned for each query. A larger value indicates higher relevance but may result in higher performance overhead. The final sorted result set will be trimmed to the size specified by the size parameter in the search request.The rank_window_size parameter must meet the following conditions:
size parameter. |
|
| rank_constant (optional) | This parameter controls the impact of each document in the result set on the final sorting result. A larger value indicates a greater impact from documents ranked lower in the result set. The default value is 60. |
boost parameter
The boost parameter specifies the weight of query conditions in the final relevance calculation. The value must be ≥ 0, and the default value is 1 when not specified. In the above syntax, the bool, single_term (excluding rank_feature), and knn parameters support the boost parameter.
Query-level boost
Specify a weight for the entire query condition, for example:
{ "bool": { "must": [{"term": {"category": "Gaming"}}], "boost": 2.0 // Weight for the entire bool query } }Field-level boost
Specify a weight for a specific field:
{ "query_string": { "fields": ["product_name", "description"], "query": "gaming keyboard", "boost": 1.5 // Weight for the entire query_string query } }Match value-level boost
Specify a weight for specific match values (only supported for match and term queries):
{ "match" : { "product_name": { "query" : "gaming keyboard", "boost" : 1.5 } } }Field weight syntax
In query_string and multi_match, you can use the
field_name^weightsyntax:{ "query_string": { "fields": ["product_name^2.0", "description^1.0"], "query": "gaming" } }
Examples
Scalar retrieval
The products table has the following schema:
CREATE TABLE products (
`product_id` varchar(50) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`description` text DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`category` varchar(100) DEFAULT NULL,
`tags` varchar(255) DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`stock_quantity` int(11) DEFAULT NULL,
`release_date` datetime DEFAULT NULL,
`is_on_sale` tinyint(1) DEFAULT NULL,
`vec` VECTOR(4) DEFAULT NULL
);
Insert data.
INSERT INTO products VALUES
('prod-004', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),
('prod-009', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]');
Set the retrieval parameters.
SET @parm = '{
"query": {
"bool": {
"must": [
{"term": {"brand": "GamerZone"}}
]
}
},
"_source": ["product_id", "product_name", "description", "brand", "category", "tags", "price", "stock_quantity", "release_date", "is_on_sale"]
}';
Retrieve all records where brand is "GamerZone".
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));
The result is as follows:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-004",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-009",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Hybrid retrieval with full-text and vector search
Create a sample table with a vector column, create a vector index, and create full-text indexes for two varchar columns.
CREATE TABLE doc_table(c1 INT, vector VECTOR(3), query VARCHAR(255), content VARCHAR(255), VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag), FULLTEXT INDEX idx2(query), FULLTEXT INDEX idx3(content));
Write data.
INSERT INTO doc_table VALUES(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
Set the retrieval parameters.
SET @parm = '{
"query": {
"bool": {
"should": [
{"match": {"query": "hi hello"}},
{"match": { "content": "oceanbase mysql" }}
]
}
},
"knn" : {
"field": "vector",
"k": 5,
"query_vector": [1,2,3]
},
"rank": {
"rrf": {
"rank_window_size": 10,
"rank_constant": 60
}
},
"_source" : ["query", "content"]
}';
Execute the query and return the query result.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"query": "hello world",
"_score": 0.0325,
"content": "oceanbase Elasticsearch database"
},
{
"query": "hello world, what is your name",
"_score": 0.0323,
"content": "oceanbase mysql database"
},
{
"query": "hello world, how are you",
"_score": 0.0315,
"content": "oceanbase oracle database"
},
{
"query": "real world, how old are you",
"_score": 0.0161,
"content": "redis oracle database"
},
{
"query": "hello world, where are you from",
"_score": 0.0159,
"content": "starrocks oceanbase database"
},
{
"query": "real world, where are you from",
"_score": 0.0156,
"content": "postgres oracle database"
}
] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set