Create a vector index|V4.3.5| docs|Distributed Database

Create a vector index

Last Updated：2025-12-09 06:29:15 Updated

This topic describes how to create, search, maintain, and delete vector indexes in OceanBase Database.

Notice

IVF/IVF_PQ indexes are supported starting from OceanBase Database V4.3.5 BP1.

HNSW_BQ indexes are supported starting from OceanBase Database V4.3.5 BP2.

Index types

OceanBase Database supports the following vector index types:

Index type	Description	Scenario
HNSW	The maximum dimension of an index column is 4096. An HNSW index is an in-memory index that needs to be fully loaded into memory.
HNSW_SQ	An HNSW_SQ index provides similar index construction speed, search performance, and recall rate as an HNSW index, but reduces the total memory usage to 1/2 to 1/3 of that of an HNSW index.	Scenarios with high requirements on performance and recall rate.
HNSW_BQ	An HNSW_BQ index has a slightly lower recall rate than an HNSW index, but significantly reduces memory usage. The Rabitq algorithm can compress vectors to 1/32 of their original size. As the dimension of a vector increases, the memory optimization effect of an HNSW_BQ index becomes more significant.
IVF	An IVF index based on a database table that does not occupy resident memory.	Scenarios with low performance requirements, large data volumes, and cost sensitivity.
IVF_PQ	An IVF_PQ index based on a database table that does not occupy resident memory. The PQ quantization algorithm is applied to the IVF index. The recall rate of an IVF_PQ index is slightly lower than that of an IVF index, but its performance is higher. In general scenarios, the PQ quantization algorithm can compress vectors to 1/16 to 1/32 of their original size.	Scenarios with low performance requirements, large data volumes, and cost sensitivity.

Other notes:

Dense vector indexes support the L2, inner product (IP), and cosine distance algorithms.
You can call some distance functions during vector index search. For more information, see Use SQL functions.
You can specify filter conditions for vector search. The filter conditions can be scalar conditions or spatial relationships such as ST_Intersects. Multi-value indexes, full-text indexes, and global indexes cannot be used as pre-filterers.
You can create a vector index and a full-text index in the same table.
For information about whether vector indexes support offline DDL operations, see Offline DDL.
Weak read (Weak Read) is supported for vector index search starting from V4.3.5 BP4. Weak read allows you to read data from a replica in the local IDC instead of accessing the primary database across regions.

Limitations:

In OceanBase Database V4.3.5 BP3, you cannot create a columnstore vector index.

Configure vector index memory

OceanBase Database allows you to configure the memory for vector indexes by setting the ob_vector_memory_limit_percentage parameter:

Before you use the HNSW/HNSW_SQ/HNSW_BQ vector index in versions earlier than V4.3.5 BP3, you must manually set the ob_vector_memory_limit_percentage parameter to enable the vector search feature. We recommend that you set the parameter to 30 for optimal search performance. If you do not set the parameter to a value other than the default value, no memory is allocated for vector indexes, and an error is returned when you create an index. The IVF/IVF_PQ vector indexes do not need to be resident in memory, so you do not need to consider this parameter. Example:
```
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
```
Starting from V4.3.5 BP3, the vector search feature is enabled by default. The default value of 0 indicates adaptive mode, where the system automatically adjusts the memory usage ratio for vector indexes in the tenant without manual intervention:
- If the actual memory of the tenant is 8 GB or less, the value is automatically adjusted to 40.
- If the actual memory of the tenant exceeds 8 GB, the value is automatically adjusted to 50.

Query the estimated and actual memory usage of indexes

You can use the DBMS_VECTOR package to estimate the memory usage of indexes:

Before you create a table, you can call the INDEX_VECTOR_MEMORY_ADVISOR procedure to estimate the memory usage of indexes.
After you create a table and insert data into the table, you can call the INDEX_VECTOR_MEMORY_ESTIMATE procedure to analyze the memory usage of indexes.

The memory usage of vector indexes includes the minimum memory configuration required to create vector indexes and the actual memory usage after the HNSW_SQ and IVF indexes are created.

You can set the load_vector_index_on_follower parameter to specify whether to automatically load the memory vector index on a follower node. For more information, see load_vector_index_on_follower. If weak-consistency reads are not required, you can disable this parameter to reduce the memory usage of vector indexes.

You can query the GV$OB_VECTOR_MEMORY or V$OB_VECTOR_MEMORY view to obtain the detailed information about the actual memory usage.

Syntax and description

OceanBase supports two ways to create vector indexes: during table creation and after table creation. Please keep the following in mind:

The VECTOR keyword is required when creating a vector index.
The parameters and instructions for creating an index after table creation are the same as those for creating one during table creation.
If you have a large amount of data, it’s recommended to insert all the data first and then create the index to achieve optimal search performance.
For HNSW_SQ, IVF, and IVF_PQ indexes, it’s best to create the index after data insertion and to rebuild the index after adding a significant amount of new data. Detailed instructions for creating each type of index can be found in the examples below.

HNSW/HNSW_SQ/HNSW_BQ

IVF

Syntax for creating an index when you create a table:

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 VECTOR(dimension), -- A vector column must be of the VECTOR type. The dimension parameter specifies the vector dimension.
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);

Syntax for creating an index later:

-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism cannot exceed twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

The following table describes the param parameters.

Parameter	Default value	Value range	Required	Description	Remarks
distance		l2/inner_product/cosine	Yes	The type of vector distance algorithm.	l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
type		`hnsw` / `hnsw_sq`/ `hnsw_bq`	Yes	The type of the index.
lib	vsag	vsag	No	The type of vector index library.	Only the VSAG vector library is supported.
m	16	[5,128]	No	The maximum number of neighbors per node.	The larger the value, the slower the index is built, but the better the search performance.
ef_construction	200	[5,1000]	No	The size of the candidate set during index building.	A larger value indicates a slower index building process but a better index quality. `ef_construction` must be greater than `m`.
ef_search	64	[1,1000]	No	The size of the candidate set during search.	The higher the value, the slower the search, and the higher the recall.
extra_info_max_size	0	[0,16384]	No	Sets the maximum size (in bytes) of each key's information. The primary keys of a table are stored in indexes to improve search efficiency.	`0`: The key information is not stored. `1`: The key information is forcibly stored and the size limit is ignored. In this case, the primary key of a table must be supported. `Greater than 1`: The maximum size of key information (in bytes) is set. In this case, the following conditions must be met: The key information size (calculated as described below) must be smaller than the set limit. The primary key of the table must be supported.
refine_k	4.0	[1.0,1000.0]	No	Notice This parameter is supported in V4.3.5 BP3 and later. This parameter is applicable only when you create an HNSW_BQ index. This float parameter specifies the percentage of the reordering for the index of a quantized vector.	You can set this parameter when you create an index. You can also set this parameter when you perform a search. When you perform a search, if you do not set this parameter, the value set when you create an index is used. When you perform a search, if you set this parameter, the value set when you perform a search is used.
refine_type	sq8 Notice If the cluster is upgraded from an earlier version to V4.3.5 BP3, the default value of this parameter is fp32.	sq8/fp32	No	Notice This parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an HNSW_BQ index. The precision of the vector index.	This parameter improves the efficiency by reducing the memory overhead and index building time during index building, but may affect the recall rate.
bq_bits_query	32	0/4/32	No	Notice This parameter is supported in V4.3.5 BP3 and later versions. It can be set only when an HNSW_BQ index is created. Specifies the search precision of the quantization vector index, in bits.	This value reduces the memory overhead and index-building time during index building, but may affect the recall rate.
bq_use_fht	true Notice If the cluster is upgraded from a previous version to V4.3.5 BP3, the default value is false.	true/false	No	Notice This parameter is supported in V4.3.5 BP3 and later. It can be specified only when an HNSW_BQ index is created. Indicates whether to use fast Hadamard transform (FHT) for a search. FHT is an algorithm that is used to accelerate the computation of vector inner products.

The following primary key types are supported for the extra_info_max_size parameter:

Numeric types: integer, float, and Bit_value types.
Date and time types
Character types: VARCHAR type.

The size of the primary key information is calculated as follows:

SET @table_name = 'test'; -- Replace table_name with the name of the table to be queried.

SELECT
    CASE
        WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
        ELSE COALESCE(SUM(result_value), 'not support')
    END AS extra_info_size
FROM (
    SELECT
        CASE
            WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8
            WHEN oc.data_type = 22 THEN oc.data_length -- For varchar types, add data_length to extra_info_size.
            ELSE NULL -- unsupported types
        END AS result_value
    FROM
        oceanbase.__all_column oc
    JOIN
        oceanbase.__all_virtual_data_type vdt
    ON
        oc.data_type = vdt.data_type
    WHERE
        oc.rowkey_position != 0
        AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;

-- Result is 8 bytes.

The syntax for creating an index at the table creation stage is as follows:

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 data_type2,
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);

The syntax for creating an index on an existing table is as follows:

-- If you specify a parallelism value for a composite index after you create the index, the system boosts the performance of index construction. The maximum parallelism value is 2 times the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

Description of the param parameter:

Parameter	Default value	Value range	Whether required	Description	Remarks
distance		l2/inner_product/cosine	Yes	Specifies the type of vector distance algorithm.	l2 specifies the Euclidean distance, inner_product specifies the inner product distance, and cosine specifies the cosine distance.
type		ivf_flat/ivf_pq	Yes	The IVF index type.
lib	ob	ob	No	Specifies the vector index library type.
nlist	128	[1,65536]	No	The number of cluster centers.	In V4.3.5 BP5: If the data volume is less than 1 million, the recommended value is `data_volume/1000`. If the data volume is greater than 1 million, the recommended value is `sqrt(data_volume)`. For multi-partition tables, estimate `data_volume` based on the data volume of the largest partition.
sample_per_nlist	256	[1, int64_max]	Yes	Number of samples taken for each cluster center. This parameter is used when you build an index.	In V4.3.5 BP5, it is recommended to keep the default value and not modify it.
nbits	8	[1,24]	No	Specifies the quantization bit length. Notice This parameter is supported in V4.3.5 BP3 and later. It can be specified only when you create an IVF_PQ index.	We recommend that you set the value to 8 and set the value to a value between 8 and 10. A larger value provides higher quantization accuracy and higher search accuracy, but affects the search performance.
m	No default value. You must specify a value.	[1,65536]	Yes	The dimension of the quantized vectors. Notice This parameter is supported in V4.3.5 BP3 and later. It can be specified only when you create an IVF_PQ index.	The larger the value, the slower the index is built but the higher the search accuracy. However, the search performance is affected.

For IVF/IVF_PQ indexes (only applicable to V4.3.5 BP5), the following additional notes apply:

Index limitations:
- IVF/IVF_PQ indexes cannot be created when heap tables and partitioned tables are used together.
Build recommendations:
- It is strongly recommended to create IVF/IVF_PQ indexes after data import is complete. It is not recommended to create IVF indexes synchronously when creating a table. If there is no data when creating the index, the index cannot perform clustering and will be unusable. In this case, you need to manually execute REBUILD INDEX or DROP and then CREATE INDEX again.
- Before creating an index, it is recommended to refer to the Query the estimated and actual memory usage of indexes section of this topic to estimate and query memory usage to avoid memory shortage during creation.
- Before creating an index, it is recommended to execute a major compaction ALTER SYSTEM major freeze and wait for it to complete.
- In multi-node cluster mode, it is recommended to set the SYS tenant memory to greater than 1 GB.
- Before creating an index, it is recommended to set the parallel_servers_target parameter value to max_cpu * 10 of the tenant.
- For scenarios where the data volume of a single partition exceeds 10 million and the tenant CPU resources are higher than 10 cores, it is recommended to modify the sampling ratio by ALTER SYSTEM SET _px_object_sampling = 5000 before creating an index to improve index construction efficiency.

Search syntax and description

Vector index search is an approximate nearest neighbor search and does not guarantee 100% accuracy. The accuracy of vector search is measured by the recall rate. For example, if 9 out of 10 nearest neighbors can be returned stably, the recall rate is 90%. The recall rate is described as follows:

The recall rate is affected by the build parameters and search parameters.
Index search parameters are specified when creating the index and cannot be modified afterward. However, you can set them using session variables: ob_hnsw_ef_search for HNSW/HNSW_SQ/HNSW_BQ indexes and ob_ivf_nprobes for IVF indexes. If session variables are set, their values will be prioritized. For more information, see ob_hnsw_ef_search and ob_ivf_nprobes.

The search syntax for dense vector indexes is as follows:

SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num (OFFSET $num) [PARAMETERS ($param1=$value1, ...)];

The search usage is described as follows:

Search syntax requirements:
- You must specify the APPROXIMATE/APPROX keyword to use vector indexes instead of full table scans.
- The ORDER BY and LIMIT clauses must be included.
- The ORDER BY clause only supports a single vector condition.
- The value of LIMIT + OFFSET must be in the range (0, 16384].
- The PARAMETERS clause is used to specify search parameters. For IVF/IVF_PQ indexes, the nprobes parameter is supported, for example, PARAMETERS(nprobes=200). This parameter is supported starting from V4.3.5 BP5 and is used to specify the number of cluster centers during search.
Distance function usage rules:
- If you specify APPROXIMATE/APPROX, the current version supports the distance function, and the distance function matches the vector index algorithm, the search will use the vector index.
- If you specify APPROXIMATE/APPROX, the current version supports the distance function, but the distance function does not match the vector index algorithm, the search will not use the vector index, but no error will be returned.
- If you specify APPROXIMATE/APPROX, the current version does not support the distance function, the search will not use the vector index, and an error will be returned.
- If you do not specify APPROXIMATE/APPROX, the current version supports the distance function, the search will not use the vector index, but no error will be returned.

Notice

Starting from V4.3.5 BP5, you can use the PARAMETERS(similarity=$value) clause to specify a similarity threshold for search.

Similarity threshold parameter (similarity):
- Syntax: Add PARAMETERS (similarity=$value) after the LIMIT clause, where $value is the similarity threshold (range: [0, 1]). Generally, a larger similarity value indicates a closer match, which corresponds to a smaller distance. This is an exception for the inner_product distance function, where a larger distance indicates a closer match.
- Function: After specifying the similarity threshold, only results with a similarity value greater than or equal to the threshold are returned.
- Supported index types: HNSW, HNSW_SQ, HNSW_BQ, IVF, and IVF_PQ.
- Supported distance types:
  - During queries, you can specify the similarity threshold for indexes of the cosine_distance and l2_distance types. For l2_distance, it is recommended to perform L2 normalization on the vectors to improve the accuracy of similarity search. For more information and examples, see Vector normalization.
  - You cannot specify the similarity threshold for indexes of the inner_product type during queries. This will return an error not support.
- Correspondence between similarity and distance:
  - cosine_distance=2 - 2 * similarity
  - l2_distance=sqrt(1 / similarity -1)
Other considerations:
- The WHERE condition will be used as a filter after vector index search.
- If the LIMIT clause is not specified, an error will be returned.
IVF/IVF_PQ index search optimization recommendations (only applicable to V4.3.5 BP5):
- By setting ROW_FORMAT=COMPRESSED when creating the main table to enable compressed row format, you can improve IVF vector index search performance by 20~30%, for example:
```
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 VECTOR(4)) ROW_FORMAT=COMPRESSED;
```
- It is not recommended to enable parallel search, as it may affect search performance or result in incorrect results.

Create, search, and delete examples

Create indexes during table creation

Dense vector index examples

HNSW example

Notice

When you create an HNSW index, the index name must be no longer than 25 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the index_name limit. Future versions will support longer index names.

Create a test table.

CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag),  VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));

Write test data.

INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

Perform approximate nearest neighbor search.

SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;

The return result is as follows:

+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0   | c2                                                                                        | c3                                                                                         |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
|  3 |    3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set

HNSW_SQ example

CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));

HNSW_BQ example

CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));

The distance parameter of an HNSW_BQ index only supports l2.

IVF example

Notice

When you create an IVF index, the index name must be no longer than 33 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the index_name limit. Future versions will support longer index names.

CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));

Create indexes after table creation

Notice

Currently, you can only create dense vector indexes after a table is created.

HNSW example

Create a test table.

CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));

Create an HNSW index.

CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);

View the created table.

SHOW CREATE TABLE vec_table_hnsw;

The return result is as follows:

+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
  `id` int(11) DEFAULT NULL,
  `c2` VECTOR(10) DEFAULT NULL,
  VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

OceanBase(root@oceanbase)>SHOW INDEX FROM vec_table_hnsw;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment   | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table |          1 | vec_idx1 |            1 | c2          | A         |        NULL | NULL     | NULL   | YES  | VECTOR     | available |               | YES     | NULL       |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set

HNSW_SQ example

Create a test table.

CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an HNSW_SQ index.

CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);

HNSW_BQ example

CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);

The distance parameter of an HNSW_BQ index supports l2 and cosine. Cosine is supported starting from V4.3.5 BP4.

IVF example

Create a test table.

CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an IVF index.

CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);

Drop indexes

The syntax for dropping a sparse vector index is as follows:

DROP INDEX vec_idx1 ON vec_table;

View the dropped index.

SHOW INDEX FROM vec_table;

The return result is as follows:

Empty set

Maintenance

The search performance decreases when the amount of incremental data is too large. To reduce the amount of data in the incremental data table, OceanBase Database introduces the DBMS_VECTOR package for maintaining vector indexes.

Incremental refresh

Notice

IVF/IVF_PQ indexes do not support incremental refresh.

If a large amount of data is written after the index is created, we recommend that you use the REFRESH_INDEX procedure to perform an incremental refresh. For more information, see REFRESH_INDEX.

An incremental refresh is performed every 15 minutes. If the number of incremental data entries exceeds 10,000, an incremental refresh is automatically performed.

Full refresh (rebuild)

Manual full table rebuild

If you update or delete a large amount of data after you create an index, we recommend that you use the REBUILD_INDEX procedure to perform a full refresh. For more information, see REBUILD_INDEX.

The system checks for a full refresh every 24 hours. If the new data exceeds 20% of the original data, a full refresh is automatically performed. A full refresh is performed in the background and asynchronously. First, a new index is created, and then the old index is replaced. During the rebuild, the old index remains available, but the overall process is relatively slow.

We also provide the vector_index_memory_saving_mode parameter to control the memory usage during index rebuild. Enabling this mode can reduce the memory consumption during the rebuild of vector indexes in partitioned tables. Typically, a vector index rebuild requires twice the memory of the index. When this mode is enabled, the system temporarily deletes the memory index of a partition after the index is built in that partition, thereby effectively reducing the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.

Consider the following notes:

When you execute an offline DDL operation (such as ALTER TABLE to modify the table structure or primary key), the index table is rebuilt. Since you cannot specify the parallelism during index rebuild, the system uses a single thread by default. Therefore, when the data volume is large, the rebuild process is relatively slow, which affects the overall efficiency of the offline DDL operation.
If you need to modify the index parameters during index rebuild, you must specify both type and distance in the parameter list. Additionally, the values of type and distance must be consistent with the original index type. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify both type=hnsw and distance=l2 during the rebuild.
During index rebuild, the following operations are supported:
- Modifying the values of m, ef_search, and ef_construction.
- For IVF/IVF_PQ indexes in V4.3.5 BP5, modifying the values of nlist, sample_per_nlist, distance, nbits, and m, where m and nbits are unique parameters for IVF_PQ indexes.
- Online rebuild of the ef_search parameter.
- Rebuilding the index type between hnsw and hnsw_sq.
- Rebuilding the index type between ivf_flat and ivf_flat, and ivf_pq and ivf_pq.
- Specifying the parallelism during the rebuild. For more information, see REBUILD_INDEX.
During index rebuild, the following operations are not supported:
- Modifying the type and distance parameters.
- Rebuilding the index type between hnsw and ivf.
- Rebuilding the index type between hnsw and hnsw_bq.
- Rebuilding the index type between ivf_flat and ivf_pq.
IVF/IVF_PQ rebuild index recommendations (only applicable to V4.3.5 BP5):
- Before executing REBUILD INDEX, it is recommended to set the ob_trx_timeout parameter value to be greater than the time required for index rebuild to avoid timeout issues.

Automatic partition rebuild (recommended)

Notice

This feature is supported starting from V4.3.5 BP3. If you upgrade your vector cluster from an earlier version to V4.3.5 BP3, you need to manually perform a full table rebuild for all vector indexes after the upgrade. Otherwise, the automatic partition rebuild task may not execute after the upgrade.

Currently, only HNSW, HNSW_SQ, and HNSW_BQ index types are supported.

An automatic partition rebuilding task is triggered in the following two scenarios:

When you execute a vector index search statement.
During a scheduled check, which you can manually configure to run at a specified interval.

Configure the execution cycle

In the oceanbase database, configure the execution interval by using the vector_index_optimize_duty_time parameter. Example:
```
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
```
After the preceding configuration, the partition rebuild task is executed only between 23:00:00 and 24:00:00. The task is not initiated during other time periods. For more information about the parameters, see the corresponding parameter documentation.
View the task progress and history

You can query the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view to view the task status and history.

You can view the current task status based on the status field in the view:
- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task is paused.
- 3 (FINISHED): The task has been completed. The tasks that have completed execution, regardless of whether they were successful, are stored in the history table. For more information, see the corresponding view documentation.
Cancel the task

To cancel a task, you can obtain the trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view and execute the following command:
```
ALTER SYSTEM CANCEL TASK <trace_id>;
```
For an example, see the following code:
```
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
```

Performance optimization

Notice

This feature is available starting from V4.3.5 BP3 and only supports IVF indexes.

OceanBase Database provides an automatic performance optimization mechanism for IVF indexes. This mechanism improves search performance by managing cache and performing regular maintenance.

Optimization mechanism

The performance optimization of IVF indexes includes two automatic tasks:

Cache warming task: The system periodically checks all IVF indexes. If it finds that the cache corresponding to an index does not exist, it automatically triggers cache warming to load the index data into memory. Additionally, when an IVF index is created, the cache is automatically warmed.
Cache cleanup task: The system periodically checks all IVF caches. If it finds that the index corresponding to a cache has been deleted, it automatically cleans up the invalid cache to release memory resources. Additionally, when an IVF index is deleted, the cache is automatically cleaned up.

Configure the execution cycle of optimization tasks

The system allows you to customize the time window for performance optimization tasks to avoid executing them during peak business hours, which could affect performance.

In the oceanbase database, you can set the execution cycle by configuring the vector_index_optimize_duty_time parameter:

ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';

The configuration details are as follows:

The time format is [start time, end time].
The above configuration indicates that optimization tasks will only be executed from 23:00:00 to 24:00:00.
Optimization tasks will not be initiated during other time periods to avoid affecting normal business operations.

Monitor the optimization tasks

View the current task status

You can query the DBA_OB_VECTOR_INDEX_TASKS view to view tasks that are currently running or waiting to be executed:

-- Query in a MySQL tenant
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;

Here is a sample result:

+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | USER         | FINISHED |         2 | 1750671687770026 |        0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

Task status description:

STATUS = 0: PREPARE, the task is waiting to be executed.
STATUS = 1: RUNNING, the task is being executed.
STATUS = 3: FINISHED, the task has been completed.

Task type description:

TASK_TYPE = 2: IVF cache warming task.
TASK_TYPE = 3: IVF cache cleanup task.

View historical task records

Completed tasks (STATUS = 3) are automatically transferred to the history table every 10 seconds, regardless of whether they were successful. You can query the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view to view the historical records:

-- Query the historical records of a specified task ID
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS_HISTORY WHERE TASK_ID=2002281;

Here is a sample result:

+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | AUTO         | FINISHED |         2 | 1750671687770026 |        0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

Cancel an optimization task

You can run the following command to cancel the specified task.

-- trace_id is obtained from the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view.
ALTER SYSTEM CANCEL TASK <trace_id>;

Notice

You can cancel a task only in the retry stage of the task, when you run the ALTER SYSTEM CANCEL TASK statement. If a background task is stuck in an execution stage, you cannot cancel it by using this statement.

Here is an example:

-- Log in to the SYS tenant and obtain the trace_id of the specified task.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2037736;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER         | PREPARED |         2 | 1750682301145225 |       -1 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

-- Cancel the task.
ALTER SYSTEM CANCEL TASK "YAFF00B9E4D97-00063839E6BDDEE0-0-1";

After the task is canceled, the task status changes to CANCELLED.

-- Log in to the user tenant and query the task status.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER         | FINISHED |         2 | 1750682301145225 |    -4072 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

References

Use SQL functions