OceanBase Database allows you to perform vector search by using SQL statements. This topic describes how to do so.
Prerequisites
You have deployed an OceanBase cluster and created a MySQL-compatible tenant. For more information, see Deploy an OceanBase cluster.
You have connected to OceanBase Database. For more information, see Connection methods.
Procedure
Enable the vector search feature
You can set the ob_vector_memory_limit_percentage parameter to enable the vector search feature:
Before V4.3.5 BP3, you must manually set the
ob_vector_memory_limit_percentageparameter to enable the vector search feature. We recommend that you set this parameter to30for the best search performance. If you do not set this parameter, no memory is allocated for vector indexes, and an error will be returned when you create an index. Example:ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;From V4.3.5 BP3, the vector search feature is enabled by default. The default value
0indicates adaptive mode, in which the system automatically adjusts the memory usage ratio of vector indexes in the tenant without manual adjustment:- If the actual memory of the tenant is 8 GB or less, the value is automatically adjusted to
40. - If the actual memory of the tenant is more than 8 GB, the value is automatically adjusted to
50.
- If the actual memory of the tenant is 8 GB or less, the value is automatically adjusted to
Create a vector column and an index
When you create a table, you can use the VECTOR(dim) data type to declare a column as a vector column and specify its dimension. You must create a vector index on a vector column and specify at least the type and distance parameters.
The following example creates a vector column named embedding with a dimension of 3 and creates an HNSW index on the embedding column. The distance parameter specifies the L2 distance algorithm.
CREATE TABLE t1(
id INT PRIMARY KEY,
doc VARCHAR(200),
embedding VECTOR(3),
VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw)
);
We recommend that you import data before you create a vector index for a large amount of data. For more information, see Vector search index.
Write vector data
To simulate a vector search scenario, construct some vector data. Each row of data includes a description and a corresponding vector. For example, the vector of 'apple' is '[1.2,0.7,1.1]', and the vector of 'carrot' is '[5.3,4.8,5.4]'.
INSERT INTO t1
VALUES (1, 'apple', '[1.2,0.7,1.1]'),
(2, 'banana', '[0.6,1.2,0.8]'),
(3, 'orange','[1.1,1.1,0.9]'),
(4, 'carrot', '[5.3,4.8,5.4]'),
(5, 'spinach', '[4.9,5.3,4.8]'),
(6, 'tomato','[5.2,4.9,5.1]');
For the sake of simplicity, this example uses 3-dimensional vectors, which are manually generated. In real applications, you need to generate vectors of hundreds or thousands of dimensions based on real text using an embedding model.
You can query the table to verify whether the data is written.
SELECT * FROM t1;
The expected result is as follows:
+----+-----------+---------------+
| id | doc | embedding |
+----+-----------+---------------+
| 1 | apple | [1.2,0.7,1.1] |
| 2 | banana | [0.6,1.2,0.8] |
| 3 | orange | [1.1,1.1,0.9] |
| 4 | carrot | [5.3,4.8,5.4] |
| 5 | spinach | [4.9,5.3,4.8] |
| 6 | tomato | [5.2,4.9,5.1] |
+----+-----------+---------------+
6 rows in set
Perform vector search
To perform vector search, provide a vector as the search condition. For example, if you want to find all 'fruits' whose corresponding vector is [0.9, 1.0, 0.9], you can execute the following SQL statement:
SELECT id, doc FROM t1
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE LIMIT 3;
The expected result is as follows:
+----+--------+
| id | doc |
+----+--------+
| 3 | orange |
| 2 | banana |
| 1 | apple |
+----+--------+
3 rows in set
References
For more information about vector data, see Vector data.
For more information about how to create and drop a vector index after you create a table, see Vector index.