OceanBase Database allows you to perform a vector search by using an SQL statement. This topic describes how to do so.
Prerequisites
You have deployed an OceanBase cluster and created a MySQL-compatible tenant. For more information, see Deploy an OceanBase cluster.
You have connected to OceanBase Database. For more information, see Overview of connection methods.
Quick start
Enable the vector search feature
You can set the ob_vector_memory_limit_percentage parameter to enable the vector search feature:
Before V4.4.1, you must manually set the
ob_vector_memory_limit_percentageparameter to enable the vector search feature. We recommend that you set the parameter to30to achieve optimal search performance. If you do not set the parameter, no memory is allocated for vector indexes, and an error is returned when you create an index. The IVF/IVF_SQ/IVF_PQ indexes do not need to be kept in memory, so you do not need to consider this parameter. Example:ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;Starting from V4.4.1, the vector search feature is enabled by default. The default value of
0indicates adaptive mode, where the system automatically adjusts the memory usage ratio of vector indexes in the tenant without manual intervention:- If the tenant memory is 8 GB or less, the value is automatically adjusted to
40. - If the tenant memory is more than 8 GB, the value is automatically adjusted to
50.
- If the tenant memory is 8 GB or less, the value is automatically adjusted to
Create a vector column and an index
When you create a table, you can use the VECTOR(dim) data type to declare a column as a vector column and specify its dimension. You must create a vector index on the vector column and specify at least the type and distance parameters.
In the following example, a vector column named embedding is created with a dimension of 3, and an HNSW index is created on the embedding column. The L2 algorithm is specified as the distance algorithm.
CREATE TABLE t1(
id INT PRIMARY KEY,
doc VARCHAR(200),
embedding VECTOR(3),
VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw)
);
We recommend that you import data before you create a vector index. For more information, see Vector search indexes.
Write vector data
To simulate a vector search scenario, construct some vector data. Each row of data includes a description and the corresponding vector. In the example, the vector for 'apple' is assumed to be '[1.2,0.7,1.1]', and the vector for 'carrot' is assumed to be '[5.3,4.8,5.4]'.
INSERT INTO t1
VALUES (1, 'apple', '[1.2,0.7,1.1]'),
(2, 'banana', '[0.6,1.2,0.8]'),
(3, 'orange','[1.1,1.1,0.9]'),
(4, 'carrot', '[5.3,4.8,5.4]'),
(5, 'spinach', '[4.9,5.3,4.8]'),
(6, 'tomato','[5.2,4.9,5.1]');
For the sake of simplicity, this example uses only 3-dimensional vectors, which are manually generated. In real applications, you need to use an embedding model to generate vectors for real text, and the dimensions can be hundreds or even thousands.
You can query the table to verify whether the data is written successfully.
SELECT * FROM t1;
The expected output is as follows:
+----+-----------+---------------+
| id | doc | embedding |
+----+-----------+---------------+
| 1 | apple | [1.2,0.7,1.1] |
| 2 | banana | [0.6,1.2,0.8] |
| 3 | orange | [1.1,1.1,0.9] |
| 4 | carrot | [5.3,4.8,5.4] |
| 5 | spinach | [4.9,5.3,4.8] |
| 6 | tomato | [5.2,4.9,5.1] |
+----+-----------+---------------+
6 rows in set
Perform vector search
To perform vector search, provide a vector as the search condition. For example, to find all 'fruits' whose vector is [0.9, 1.0, 0.9], the corresponding SQL statement is as follows:
SELECT id, doc FROM t1
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE LIMIT 3;
The expected output is as follows:
+----+--------+
| id | doc |
+----+--------+
| 3 | orange |
| 2 | banana |
| 1 | apple |
+----+--------+
3 rows in set
References
For more information about vector data, see Vector data.
For information about how to create or drop a vector index after you create a table, see Dense vector indexes.
