Create a vector index|V4.3.3| docs|Distributed Database

Create a vector index

Last Updated：2025-11-27 06:23:27 Updated

This topic describes how to create, query, maintain, and drop a vector index in OceanBase Database.

Specifics of OceanBase vector indexes are as follows:

HNSW indexes are supported. The maximum dimension of an indexed column is 2000. HNSW indexes are memory-based indexes. They must be fully loaded into memory. They support DML operations and real-time queries.
L2 and inner product (IP) are supported as the index distance algorithm.

Prepare

Before you use vector indexes, you must enable the vector feature by setting the ob_vector_memory_limit_percentage parameter.

ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;

Overview

Vector index queries are approximate nearest neighbor queries, and they cannot guarantee 100% accuracy. The query accuracy is measured by recall. For example, if a query can return 9 correct results out of 10 nearest neighbors, the recall is 90%. The following table describes recall.

Recall is affected by the construction parameters and query parameters.
Query parameters are specified when you create an index and cannot be modified. However, you can specify the session variable ob_hnsw_ef_search to use a custom value.

The syntax for vector index queries is as follows:

SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num [OFFSET $num];

The following table describes the syntax.

Query syntax requirements:
- You must specify the APPROXIMATE or APPROX keyword to choose the vector index path.
- You must include the ORDER BY and LIMIT clauses.
- The ORDER BY clause supports only a single vector condition.
- The value of LIMIT + OFFSET must be greater than 0 and no more than 16,384.
Distance function usage rules:
- If you do not specify the APPROXIMATE or APPROX keyword, the query plan will not choose the vector index path when you call inner_product, negative_inner_product, or L2. However, the system does not return an error in these cases.
- If you specify the APPROXIMATE or APPROX keyword, the query plan will not choose the vector index path when the distance function does not match the vector index algorithm. However, the system does not return an error in this case.
- If you specify the APPROXIMATE or APPROX keyword and the expression is inner_product, the query plan will not choose the vector index path.
- If you specify the APPROXIMATE or APPROX keyword and the expression is negative_inner_product or L2, the distance function must match the vector index algorithm. In this case, the query plan will choose the vector index path.
Other descriptions:
- The WHERE condition will be used as a filter condition after the vector index query.
- If you do not specify the LIMIT clause, the system returns an error.

Examples

Considerations

OceanBase supports creating vector indexes when you create a table and after you create a table. When you create a vector index, take note of the following considerations:

You cannot modify a vector index after it is created.
You must specify the VECTOR keyword when you create a vector index.
The parameters and descriptions of a post-created index are the same as those when you create an index.
If you have a large amount of data, we recommend that you write the data to the database first and then create an index to ensure optimal query performance.

Create an index when you create a table

Create a test table.

CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag),  VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));

Write test data.

INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

Perform an approximate nearest neighbor query.

SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;

The return result is as follows:

+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0   | c2                                                                                        | c3                                                                                         |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
|  3 |    3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set

The following table describes the parameters of the index.

Parameter	Default value	Value range	Required	Description	Remarks
distance		l2/inner_product	Yes	The type of the vector distance function.	l2 indicates the Euclidean distance, and inner_product indicates the inner product distance.
type		hnsw	Yes	The type of the index.	At present, only HNSW indexes are supported.
lib	vsag	vsag	No	The type of the vector index library.	At present, only the VSAG vector library is supported.
m	16	[5,64]	No	The maximum number of neighbors for each node in HNSW.	The value indicates the tradeoff between index construction time and query performance. A larger value indicates slower index construction but better query performance.
ef_construction	200	[5,1000]	No	The candidate set size during HNSW index construction.	The value indicates the tradeoff between index construction time and index quality. A larger value indicates slower index construction but higher index quality. ef_construction must be greater than m.
ef_search	64	[1,1000]	No	The candidate set size during HNSW query.	The value indicates the tradeoff between query speed and recall. A larger value indicates slower queries but higher recall.

Create an index after you create a table

Create a test table.

CREATE TABLE vec_table (id int, c2 vector(10));

Create an index.

CREATE VECTOR INDEX vec_idx1 ON vec_table(c2) WITH (distance=l2, type=hnsw);

View the schema of the test table.

SHOW CREATE TABLE vec_table;
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table | CREATE TABLE `vec_table` (
  `id` int(11) DEFAULT NULL,
  `c2` VECTOR(10) DEFAULT NULL,
  VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

View the index.

SHOW INDEX FROM vec_table;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment   | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table |          1 | vec_idx1 |            1 | c2          | A         |        NULL | NULL     | NULL   | YES  | VECTOR     | available |               | YES     | NULL       |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set

Drop an index

Drop an index.

DROP INDEX vec_idx1 ON vec_table;

View the index.

SHOW INDEX FROM vec_table;

The return result is as follows:

Empty set

Index maintenance

When too much incremental data is generated, the query performance decreases. To reduce the volume of incremental data tables, OceanBase Database introduces the DBMS_VECTOR package for vector index maintenance.

Incremental refresh

If a large amount of data is written after the index is created, we recommend that you use incremental refresh.

The syntax is as follows:

  PROCEDURE refresh_index(
    IN     idx_name               VARCHAR(65535),                 ---- The index name.
    IN     table_name             VARCHAR(65535),                 ---- The table name.
    IN     idx_vector_col         VARCHAR(65535) DEFAULT NULL,    ---- The vector column name.
    IN     refresh_threshold      INT DEFAULT 10000,              ---- The threshold of incremental data.
    IN     refresh_type           VARCHAR(65535) DEFAULT NULL);   ---- The refresh type. Valid value: FAST.

Incremental refresh is performed every 15 minutes. If more than 10,000 incremental data items are generated, incremental refresh is automatically triggered.

Full refresh

If a large amount of data is updated or deleted after the index is created, we recommend that you use full refresh.

The syntax is as follows:

  PROCEDURE rebuild_index(
    IN     idx_name                VARCHAR(65535),                    ---- The index name.
    IN     table_name              VARCHAR(65535),                    ---- The table name.
    IN     idx_vector_col          VARCHAR(65535) DEFAULT NULL,       ---- The vector column name.
    IN     delta_rate_threshold    FLOAT DEFAULT 0.2,                 ---- The threshold of incremental data.
    IN     idx_organization        VARCHAR(65535) DEFAULT NULL,       ---- The index type.
    IN     idx_distance_metrics    VARCHAR(65535) DEFAULT 'EUCLIDEAN',---- The distance type.
    IN     idx_parameters          LONGTEXT DEFAULT NULL,             ---- The index parameters.
    IN     idx_parallel_creation   INT DEFAULT 1);                    ---- The degree of parallelism for parallel index creation. Reserved. Supported only in syntax.

Full refresh is performed every 24 hours. If the amount of new data exceeds 20% of the original data amount, full refresh is automatically triggered. Full refresh is performed asynchronously in the background. It creates a new index and then replaces the old index. During the rebuild, the old index remains available, but the overall process is relatively slow.

Examples of index maintenance

Create a test table.

CREATE TABLE vector_index_test(c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag));

Write test data.

INSERT INTO vector_index_test VALUES(1, '[0.203846,0.205289,0.880265]');
INSERT INTO vector_index_test VALUES(2, '[0.484526,0.669954,0.986755]');
INSERT INTO vector_index_test VALUES(3, '[0.327936,0.048756,0.084670]');
INSERT INTO vector_index_test VALUES(4, '[0.148869,0.878546,0.028024]');
INSERT INTO vector_index_test VALUES(5, '[0.334970,0.857377,0.886132]');
INSERT INTO vector_index_test VALUES(6, '[0.117582,0.302352,0.471198]');
INSERT INTO vector_index_test VALUES(7, '[0.551185,0.231134,0.075354]');
INSERT INTO vector_index_test VALUES(8, '[0.185221,0.315131,0.558301]');
INSERT INTO vector_index_test VALUES(9, '[0.928764,0.254038,0.272721]');

Trigger an update.

-- Trigger incremental update.
call dbms_vector.refresh_index('idx1', 'vector_index_test', 'c2', 1, 'FAST');
-- Trigger full update.
call dbms_vector.rebuild_index('idx1','t1','c2');

References

Vector functions