This topic describes the hybrid search SQL interface of OceanBase Database. You can use this interface to combine full-text search, vector search, and filter conditions in a single SELECT statement by using the HYBRID_SEARCH keyword. OceanBase Database returns the search results sorted by relevance.
Hybrid search combines vector-based semantic search and full-text search. It provides more accurate and comprehensive search results by using a comprehensive ranking algorithm. Vector search is good at semantic approximate matching, but not good at exact matching of keywords, numbers, and proper nouns. Full-text search can effectively make up for this shortcoming. Therefore, hybrid search has become one of the key features of vector databases and is widely used in various products.
Limitations
HYBRID_SEARCHcan only be used on heap tables.- Vector search requires that an index is created on the target vector column. Currently, multi-vector search only supports dense vector columns.
- Full-text search requires that an index is created on the target text column. If the full-text column has a composite index, hybrid search is not supported.
- Scalar and JSON/ARRAY filters can be executed without an index, but it is recommended to create an index to achieve better performance.
- Only the HNSW index type is supported in the current version.
- Only row-based tables are supported.
- Generated columns are not supported.
This section only lists the limitations, which should be combined with the syntax restrictions of HYBRID_SEARCH at the end of the related document for a comprehensive understanding.
Syntax
SELECT select_list
FROM HYBRID_SEARCH(TABLE table_name, dsl_string);
Parameter description:
table_name: the name of the target table, which must be a heap table (ORGANIZATION = HEAP); supports partitioned and non-partitioned tables.dsl_string: a JSON string used to describe the query semantics, such as full-text search, vector search, filtering, and sorting.
For more information about the syntax, parameters, and limitations, see the related document at the end of the section HYBRID_SEARCH.
Create an example table and insert data
The same example tables are used in the following sections:
doc_table: an example table used for hybrid search that combines full-text search, vector search, and scalar filtering.
CREATE TABLE doc_table(
c1 INT,
vector VECTOR(3),
query VARCHAR(255),
content VARCHAR(255),
VECTOR INDEX idx_vec(vector) WITH (distance=l2, type=hnsw, lib=vsag),
FULLTEXT INDEX idx_query(query),
FULLTEXT INDEX idx_content(content)
) ORGANIZATION HEAP;
INSERT INTO doc_table VALUES
(1, '[1,2,3]', 'hello world', 'oceanbase Elasticsearch database'),
(2, '[1,2,1]', 'hello world, what is your name', 'oceanbase mysql database'),
(3, '[1,1,1]', 'hello world, how are you', 'oceanbase oracle database'),
(4, '[1,3,1]', 'real world, where are you from', 'postgres oracle database'),
(5, '[1,3,2]', 'real world, how old are you', 'redis oracle database'),
(6, '[2,1,1]', 'hello world, where are you from', 'starrocks oceanbase database');
products_multi_vector: an example table used for multi-vector search.
CREATE TABLE products_multi_vector (
product_id VARCHAR(50),
product_name VARCHAR(255),
description TEXT,
vec1 VECTOR(4),
vec2 VECTOR(4),
vec3 VECTOR(4),
VECTOR INDEX idx_vec1(vec1) WITH (distance=l2, type=hnsw, lib=vsag),
VECTOR INDEX idx_vec2(vec2) WITH (distance=l2, type=hnsw, lib=vsag),
VECTOR INDEX idx_vec3(vec3) WITH (distance=l2, type=hnsw, lib=vsag)
) ORGANIZATION HEAP;
INSERT INTO products_multi_vector VALUES
('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard', '[0.5,0.1,0.6,0.9]', '[0.2,0.3,0.4,0.5]', '[0.1,0.2,0.3,0.4]'),
('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset', '[0.1,0.9,0.2,0]', '[0.3,0.4,0.5,0.6]', '[0.2,0.3,0.4,0.5]'),
('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat', '[0.1,0.9,0.3,0]', '[0.4,0.5,0.6,0.7]', '[0.3,0.4,0.5,0.6]');
doc_json_array: an example table used for hybrid search with JSON and ARRAY filters.
CREATE TABLE doc_json_array (
id INT,
created_date date,
title varchar(255),
doc_json JSON,
tags_array1 ARRAY(VARCHAR(255)),
tags_array2 ARRAY(VARCHAR(255)),
INDEX idx_multivalue_tags((CAST(doc_json->'$.tags' AS CHAR(255) ARRAY))),
INDEX idx1(id),
INDEX idx2(title),
INDEX idx_created_date(created_date)
) ORGANIZATION HEAP;
INSERT INTO doc_json_array VALUES
(1, '2023-01-01', 'doc1', '{"name":"doc1","tags":["database","oceanbase"],"metadata":{"type":"test","score":40}}', ['database','oceanbase'], ['database','mysql']),
(2, '2023-01-02', 'doc2', '{"name":"doc2","tags":["database","mysql"],"metadata":{"type":"production","score":57}}', ['database','mysql'], ['database','mysql']),
(3, '2023-01-03', 'doc3', '{"name":"doc3","tags":["database","oracle"],"metadata":{"type":"test","score":19}}', ['database','oracle'], ['database','oracle']),
(4, '2023-01-04', 'doc4', '{"name":"doc4","tags":["database","postgres"],"metadata":{"type":"production","score":14}}', ['database','postgres'], ['database','postgres']),
(6, '2023-01-06', 'doc6', '{"name":"doc6","tags":["database","starrocks"],"metadata":{"type":"production","score":25}}', ['database','starrocks'], ['database','starrocks']),
(10, '2023-01-10', 'doc10', '{"name":"doc10","tags":["mobile","ios"],"metadata":{"type":"app","score":90}}', ['mobile','ios'], ['mobile','ios']);
-- Recommended: create search indexes to achieve optimal search performance
CREATE SEARCH INDEX idx_json ON doc_json_array(doc_json);
CREATE SEARCH INDEX idx_tags_array1 ON doc_json_array(tags_array1);
CREATE SEARCH INDEX idx_tags_array2 ON doc_json_array(tags_array2);
Here are some examples for different use cases, including a quick start and an extended example:
Quick start examples
This section provides six simple core examples covering common hybrid search scenarios, including vector search, full-text search, vector search + full-text search (RRF fusion), multi-vector search, filter condition merging, and score threshold filtering.
Vector search
This example searches for the 3 most similar records in the doc_table table to the vector [1,2,3] and returns the c1 column.
SELECT c1 FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"knn": {
"field": "vector",
"k": 3,
"query_vector": "[1,2,3]"
}
}'
);
Expected return result:
+------+
| c1 |
+------+
| 1 |
| 5 |
| 2 |
+------+
3 rows in set
Full-text search
This example searches for 4 records in the doc_table table where the content column contains oceanbase mysql and returns all columns.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": "oceanbase mysql"}
}
}'
);
Expected return result:
+------+---------+---------------------------------+----------------------------------+--------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+--------------------+
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 2.170969786679347 |
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.3503184713375797 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.3503184713375797 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.3503184713375797 |
+------+---------+---------------------------------+----------------------------------+--------------------+
4 rows in set
Full-text and vector RRF hybrid search
This example statement performs both full-text search (matching the keyword "oceanbase mysql") and vector search (searching for the 5 most similar records to the vector [1,2,3]), then merges the two sets of results using the RRF fusion algorithm. By default, it returns the 10 most relevant documents, with a total of 6 records meeting the criteria.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": "oceanbase mysql"}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]"
},
"rank": {
"rrf": {
"rank_constant": 60,
"rank_window_size": 10
}
}
}'
);
Expected return result:
+------+---------+---------------------------------+----------------------------------+----------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+----------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.03252247488101534 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.032266458495966696 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.031754032258064516 |
| 5 | [1,3,2] | real world, how old are you | redis oracle database | 0.016129032258064516 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.016129032258064516 |
| 4 | [1,3,1] | real world, where are you from | postgres oracle database | 0.015625 |
+------+---------+---------------------------------+----------------------------------+----------------------+
6 rows in set
Multi-vector search
This example statement independently performs vector searches on three vector fields (vec1, vec2, and vec3), each returning the 5 most similar results, then merges the results using the default weighted fusion algorithm, returning the document with the highest overall relevance.
SELECT * FROM HYBRID_SEARCH(
TABLE products_multi_vector,
'{
"knn": [
{"field":"vec1","k":5,"query_vector":"[0.5,0.1,0.6,0.9]"},
{"field":"vec2","k":5,"query_vector":"[0.2,0.3,0.4,0.5]"},
{"field":"vec3","k":5,"query_vector":"[0.1,0.2,0.3,0.4]"}
]
}'
);
Expected return result:
+------------+-------------------------------+----------------------------------+-------------------+-------------------+-------------------+--------------------+
| product_id | product_name | description | vec1 | vec2 | vec3 | __score |
+------------+-------------------------------+----------------------------------+-------------------+-------------------+-------------------+--------------------+
| prod-002 | Gamer-Pro Headset | High-fidelity gaming headset | [0.1,0.9,0.2,0] | [0.3,0.4,0.5,0.6] | [0.2,0.3,0.4,0.5] | 2.7134181710480463 |
| prod-003 | Eco-Friendly Yoga Mat | A non-slip yoga mat | [0.1,0.9,0.3,0] | [0.4,0.5,0.6,0.7] | [0.3,0.4,0.5,0.6] | 2.6366155630441215 |
| prod-001 | Gamer-Pro Mechanical Keyboard | A responsive mechanical keyboard | [0.5,0.1,0.6,0.9] | [0.2,0.3,0.4,0.5] | [0.1,0.2,0.3,0.4] | 2.6237901221768167 |
+------------+-------------------------------+----------------------------------+-------------------+-------------------+-------------------+--------------------+
3 rows in set
Filter condition merging
This example searches for 6 records in the doc_table table where the content column contains oceanbase mysql, with filter conditions that c1 is greater than or equal to 3 and less than or equal to 5, and returns the c1 column.
SELECT c1 FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"bool": {
"must": [{"match": {"content": "oceanbase mysql"}}],
"filter": [{"range": {"c1": {"gte": 3}}}]
}
},
"knn": {
"field": "vector",
"k": 10,
"query_vector": "[1,2,3]",
"filter": [
{"range": {"c1": {"lte": 5}}}
]
}
}'
);
Expected return result:
+------+
| c1 |
+------+
| 1 |
| 3 |
| 6 |
| 5 |
| 2 |
| 4 |
+------+
6 rows in set
Score threshold filtering (min_score)
Notice
The min_score parameter is supported only in the SQL interface and not in the PL interface.
This example searches for 2 records in the doc_table table where the content column contains oceanbase mysql, with a score threshold of 0.5, and returns all columns.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {"match": {"content":{"query": "oceanbase mysql", "boost": 0.3}}},
"knn": {"field":"vector","k":5,"query_vector":"[1,2,3]", "boost": 0.7},
"min_score": 0.5
}'
);
Expected return result:
+------+---------+--------------------------------+----------------------------------+--------------------+
| c1 | vector | query | content | __score |
+------+---------+--------------------------------+----------------------------------+--------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.8050955414012738 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.7912909360038041 |
+------+---------+--------------------------------+----------------------------------+--------------------+
2 rows in set
Advanced examples
This section provides more complex examples covering advanced hybrid search scenarios, including filtering with complex types (JSON/ARRAY), weighted hybrid search of vectors and full-text, WRRF hybrid search, normalization, and score queries in hybrid search.
Scalar + JSON/ARRAY filtering
These examples require the use of search indexes. In hybrid search scenarios, full-text tokenization and keyword retrieval are handled by the full-text index, while the search index is used for structured, semi-structured, or simple scalar conditions within complex types. These two types of indexes can coexist in the same table and work together with the vector index to improve query performance.
In analytical wide tables, the search index can also leverage the optimizer's multi-index join (Index Merge) capability to automatically identify and combine available full-text and scalar indexes, selecting the optimal scan path to further enhance query performance. For more information about search indexes, including usage, syntax, and complete examples, please refer to Search Index (Search Index).
Other hybrid search scenarios
Weighted hybrid search of vectors and full-text
Weighted hybrid search is the default hybrid search method in OceanBase AI, where the weight for full-text search is 0.3 and the weight for vector search is 0.7.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": {"query": "oceanbase mysql", "boost": 0.3}}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]",
"boost": 0.7
}
}'
);
Expected return result:
+------+---------+---------------------------------+----------------------------------+---------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+---------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.8050955414012738 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.7912909360038041 |
| 5 | [1,3,2] | real world, how old are you | redis oracle database | 0.2333333333333333 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.22176220806794056 |
| 4 | [1,3,1] | real world, where are you from | postgres oracle database | 0.11666666666666665 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.1050955414012739 |
+------+---------+---------------------------------+----------------------------------+---------------------+
6 rows in set
WRRF hybrid search of vectors and full-text
On the basis of RRF, boost weights can be set. In this example, the boost parameters in query and knn represent the weights for WRRF fusion.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": {"query": "oceanbase mysql", "boost": 0.3}}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]",
"boost": 0.7
},
"rank": {
"rrf": {
"rank_constant": 60,
"rank_window_size": 10
}
}
}'
);
Expected return result:
+------+---------+---------------------------------+----------------------------------+----------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+----------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.03252247488101534 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.032266458495966696 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0.031754032258064516 |
| 5 | [1,3,2] | real world, how old are you | redis oracle database | 0.016129032258064516 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0.016129032258064516 |
| 4 | [1,3,1] | real world, where are you from | postgres oracle database | 0.015625 |
+------+---------+---------------------------------+----------------------------------+----------------------+
6 rows in set
Normalization of vectors and full-text search
The normalizer parameter can be set to normalize the query results. In this example, normalizer is set to "minmax", indicating the use of Min-Max normalization.
SELECT * FROM HYBRID_SEARCH(
TABLE doc_table,
'{
"query": {
"match": {"content": {"query": "oceanbase mysql", "boost": 0.3}}
},
"knn": {
"field": "vector",
"k": 5,
"query_vector": "[1,2,3]",
"boost": 0.7
},
"rank": {
"weighted_sum": {
"normalizer": "minmax",
"rank_window_size": 10
}
}
}'
);
Expected return result:
+------+---------+---------------------------------+----------------------------------+---------------------+
| c1 | vector | query | content | __score |
+------+---------+---------------------------------+----------------------------------+---------------------+
| 1 | [1,2,3] | hello world | oceanbase Elasticsearch database | 0.7 |
| 2 | [1,2,1] | hello world, what is your name | oceanbase mysql database | 0.328 |
| 5 | [1,3,2] | real world, how old are you | redis oracle database | 0.13999999999999999 |
| 3 | [1,1,1] | hello world, how are you | oceanbase oracle database | 0 |
| 4 | [1,3,1] | real world, where are you from | postgres oracle database | 0 |
| 6 | [2,1,1] | hello world, where are you from | starrocks oceanbase database | 0 |
+------+---------+---------------------------------+----------------------------------+---------------------+
6 rows in set
View the execution plan
You can use EXPLAIN to view the execution plan of the hybrid search, including the subquery nodes, hybrid fusion method, and whether indexes are used:
EXPLAIN SELECT c1 FROM HYBRID_SEARCH(table doc_table, '{"query": { "bool": {
"must" : [{"match" : {"content": "oceanbase mysql"}}],
"filter": [
{"range": {"c1": {"gte" : 3}}},
{"range": {"c1": {"lte" : 10}}}
]
}},
"knn":
{
"field": "vector",
"k": 10,
"query_vector": "[1, 2, 3]"
}}');
Expected return result: The result shows the subquery nodes, the WRRF fusion method, and the use of the idx_vector vector index and the idx_content full-text index in the hybrid search plan.
+------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------+
| ================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------------ |
| |0 |INDEX MERGE SCAN|doc_table(idx_vector)|1 |3 | |
| ================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([doc_table.c1]), filter(nil), rowset=16 |
| access([doc_table.__pk_increment], [doc_table.c1]), partitions(p0) |
| is_index_back=true, is_global_index=false, use_index_merge=true, |
| fusion node: method=WEIGHT_SUM, limit(10), window_size(10) |
| vector node: index name=idx_vector |
| boolean node: |
| must: |
| match node: index name=idx_content |
| filter: |
| scalar node: filter([doc_table.c1 >= 3], [doc_table.c1 <= 10]) |
+------------------------------------------------------------------------------+
17 rows in set
References
- For more information about the syntax, parameters, and limitations, see HYBRID_SEARCH.
- Search index
- Full-text index
- Vector index
