OceanBase supports vector search using SQL. This topic walks you through the basics.
Prerequisites
- An OceanBase cluster is deployed and a MySQL-compatible tenant is created. See Deploy OceanBase.
- You are connected to OceanBase. See Connection methods.
Quick start
Enable vector search
OceanBase vector search uses ob_vector_memory_limit_percentage to control vector index memory:
Before V4.4.1, you must set
ob_vector_memory_limit_percentagebefore using HNSW, HNSW_SQ, or HNSW_BQ indexes. We recommend30for best search performance. If you leave the default, no memory is allocated for vector indexes and index creation fails. IVF and IVF_PQ indexes do not require resident memory, so you can ignore this parameter for them. Example:ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;From V4.4.1, vector search is on by default. The default
0means adaptive mode: the system sets the memory ratio for vector index data in the tenant automatically, so you do not need to set it:- If tenant memory is 8 GB or less, the effective value is
40. - If tenant memory is greater than 8 GB, the effective value is
50.
- If tenant memory is 8 GB or less, the effective value is
Create a vector column and index
You can declare a column as a vector with VECTOR(dim) when creating a table. A vector index must be created on a vector column; you must specify at least type and distance.
This example creates an embedding column of dimension 3 and an HNSW index on it with L2 distance:
CREATE TABLE t1(
id INT PRIMARY KEY,
doc VARCHAR(200),
embedding VECTOR(3),
VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw)
);
For large tables, load data first and then create the vector index.
Write vector data
To demonstrate vector search, add some rows: each row has a description and its vector. Here, 'apple' maps to '[1.2,0.7,1.1]', 'carrot' to '[5.3,4.8,5.4]', and so on.
INSERT INTO t1
VALUES (1, 'apple', '[1.2,0.7,1.1]'),
(2, 'banana', '[0.6,1.2,0.8]'),
(3, 'orange','[1.1,1.1,0.9]'),
(4, 'carrot', '[5.3,4.8,5.4]'),
(5, 'spinach', '[4.9,5.3,4.8]'),
(6, 'tomato','[5.2,4.9,5.1]');
This example uses 3-dimensional, hand-picked vectors for clarity. In production you would use an embedding model to generate vectors from real text; dimensions are typically hundreds or thousands.
Check that the data was written:
SELECT * FROM t1;
Example result:
+----+-----------+---------------+
| id | doc | embedding |
+----+-----------+---------------+
| 1 | apple | [1.2,0.7,1.1] |
| 2 | banana | [0.6,1.2,0.8] |
| 3 | orange | [1.1,1.1,0.9] |
| 4 | carrot | [5.3,4.8,5.4] |
| 5 | spinach | [4.9,5.3,4.8] |
| 6 | tomato | [5.2,4.9,5.1] |
+----+-----------+---------------+
6 rows in set
Run a vector search
Provide a vector as the search condition. To find the three rows closest to "fruit" (represented by the vector [0.9, 1.0, 0.9]), run:
SELECT id, doc FROM t1
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE LIMIT 3;
Example result:
+----+--------+
| id | doc |
+----+--------+
| 3 | orange |
| 2 | banana |
| 1 | apple |
+----+--------+
3 rows in set
Related topics
- Vector data type: detailed description of vector data.
- Vector index overview: creating and using vector indexes.