OceanBase Database supports vector search using SQL. This topic provides a quick start guide.
Prerequisites
You have deployed an OceanBase cluster and created a MySQL tenant.
You have connected to OceanBase Database. For more information, see Overview of connection methods.
Get started
Enable vector search
Before using vector indexes, you must estimate the memory resources required for index data in the tenant and configure the relevant parameters. The following example shows how to configure the memory resources for vector indexes to be 30% of the memory resources available to the tenant:
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
The default value of ob_vector_memory_limit_percentage is 0, which means that no memory resources are allocated to vector indexes. In this case, an error will be returned when you create an index.
Create a vector column and an index
You can declare a column as a vector column and specify the column dimension by using the VECTOR(dim) data type when you create a table. A vector index must be created on a vector column, and you must specify the type and distance parameters.
The following example creates a vector column named embedding with a dimension of 3, and creates an HNSW index on the embedding column. The L2 distance algorithm is used.
CREATE TABLE t1(
id INT PRIMARY KEY,
doc VARCHAR(200),
embedding VECTOR(3),
VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw)
);
We recommend that you create vector indexes after you import all data, especially when a large amount of data is involved. For more information, see Create vector indexes.
Write vector data
To simulate a vector search scenario, you need to construct some vector data. Each data record contains a description of the data and the corresponding vector. In the following example, the vector corresponding to 'apple' is '[1.2,0.7,1.1]', and the vector corresponding to 'carrot' is '[5.3,4.8,5.4]'.
INSERT INTO t1
VALUES (1, 'apple', '[1.2,0.7,1.1]'),
(2, 'banana', '[0.6,1.2,0.8]'),
(3, 'orange','[1.1,1.1,0.9]'),
(4, 'carrot', '[5.3,4.8,5.4]'),
(5, 'spinach', '[4.9,5.3,4.8]'),
(6, 'tomato','[5.2,4.9,5.1]');
In this example, the vectors are 3-dimensional vectors that are manually generated for simplicity. In practice, you need to use an embedding model to generate vectors for the text. The vectors can have hundreds or even thousands of dimensions.
You can query data in the table to check whether the data is written successfully.
SELECT * FROM t1;
The expected result is as follows:
+----+-----------+---------------+
| id | doc | embedding |
+----+-----------+---------------+
| 1 | apple | [1.2,0.7,1.1] |
| 2 | banana | [0.6,1.2,0.8] |
| 3 | orange | [1.1,1.1,0.9] |
| 4 | carrot | [5.3,4.8,5.4] |
| 5 | spinach | [4.9,5.3,4.8] |
| 6 | tomato | [5.2,4.9,5.1] |
+----+-----------+---------------+
6 rows in set
Perform vector search
To perform vector search, you must provide a vector as the search condition. In this example, you want to find all 'fruits'. The vector corresponding to 'fruits' is [0.9, 1.0, 0.9]. The corresponding SQL statement is as follows:
SELECT id, doc FROM t1
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE LIMIT 3;
The expected result is as follows:
+----+--------+
| id | doc |
+----+--------+
| 3 | orange |
| 2 | banana |
| 1 | apple |
+----+--------+
3 rows in set
References
For more information about vector data, see Vector data.
For information about how to create a vector index and drop an index after a table is created, see Create a vector index.