This topic describes how to create, search, maintain, and delete vector indexes in OceanBase Database.
Notice
Index types
OceanBase Database supports the following vector index types:
| Index type | Description | Scenario |
|---|---|---|
| HNSW | The maximum dimension of an index column is 4096. An HNSW index is an in-memory index that needs to be fully loaded into memory. | |
| HNSW_SQ | An HNSW_SQ index provides similar index construction speed, search performance, and recall rate as an HNSW index, but reduces the total memory usage to 1/2 to 1/3 of that of an HNSW index. | Scenarios with high requirements on performance and recall rate. |
| HNSW_BQ | An HNSW_BQ index has a slightly lower recall rate than an HNSW index, but significantly reduces memory usage. The Rabitq algorithm can compress vectors to 1/32 of their original size. As the dimension of a vector increases, the memory optimization effect of an HNSW_BQ index becomes more significant. | |
| IVF | An IVF index based on a database table that does not occupy resident memory. | Scenarios with low performance requirements, large data volumes, and cost sensitivity. |
| IVF_PQ | An IVF_PQ index based on a database table that does not occupy resident memory. The PQ quantization algorithm is applied to the IVF index. The recall rate of an IVF_PQ index is slightly lower than that of an IVF index, but its performance is higher. In general scenarios, the PQ quantization algorithm can compress vectors to 1/16 to 1/32 of their original size. | Scenarios with low performance requirements, large data volumes, and cost sensitivity. |
Other notes:
- Dense vector indexes support the L2, inner product (IP), and cosine distance algorithms.
- You can call some distance functions during vector index search. For more information, see Use SQL functions.
- You can specify filter conditions for vector search. The filter conditions can be scalar conditions or spatial relationships such as ST_Intersects. Multi-value indexes, full-text indexes, and global indexes cannot be used as pre-filterers.
- You can create a vector index and a full-text index in the same table.
- For information about whether vector indexes support offline DDL operations, see Offline DDL.
- Weak read (Weak Read) is supported for vector index search starting from V4.3.5 BP4. Weak read allows you to read data from a replica in the local IDC instead of accessing the primary database across regions.
Limitations:
- In OceanBase Database V4.3.5 BP3, you cannot create a columnstore vector index.
Configure vector index memory
OceanBase Database allows you to configure the memory for vector indexes by setting the ob_vector_memory_limit_percentage parameter:
Before you use the HNSW/HNSW_SQ/HNSW_BQ vector index in versions earlier than V4.3.5 BP3, you must manually set the
ob_vector_memory_limit_percentageparameter to enable the vector search feature. We recommend that you set the parameter to30for optimal search performance. If you do not set the parameter to a value other than the default value, no memory is allocated for vector indexes, and an error is returned when you create an index. The IVF/IVF_PQ vector indexes do not need to be resident in memory, so you do not need to consider this parameter. Example:ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;Starting from V4.3.5 BP3, the vector search feature is enabled by default. The default value of
0indicates adaptive mode, where the system automatically adjusts the memory usage ratio for vector indexes in the tenant without manual intervention:- If the actual memory of the tenant is 8 GB or less, the value is automatically adjusted to
40. - If the actual memory of the tenant exceeds 8 GB, the value is automatically adjusted to
50.
- If the actual memory of the tenant is 8 GB or less, the value is automatically adjusted to
Query the estimated and actual memory usage of indexes
You can use the DBMS_VECTOR package to estimate the memory usage of indexes:
- Before you create a table, you can call the INDEX_VECTOR_MEMORY_ADVISOR procedure to estimate the memory usage of indexes.
- After you create a table and insert data into the table, you can call the INDEX_VECTOR_MEMORY_ESTIMATE procedure to analyze the memory usage of indexes.
The memory usage of vector indexes includes the minimum memory configuration required to create vector indexes and the actual memory usage after the HNSW_SQ and IVF indexes are created.
You can set the load_vector_index_on_follower parameter to specify whether to automatically load the memory vector index on a follower node. For more information, see load_vector_index_on_follower. If weak-consistency reads are not required, you can disable this parameter to reduce the memory usage of vector indexes.
You can query the GV$OB_VECTOR_MEMORY or V$OB_VECTOR_MEMORY view to obtain the detailed information about the actual memory usage.
Syntax and description
OceanBase supports two ways to create vector indexes: during table creation and after table creation. Please keep the following in mind:
- The
VECTORkeyword is required when creating a vector index. - The parameters and instructions for creating an index after table creation are the same as those for creating one during table creation.
- If you have a large amount of data, it’s recommended to insert all the data first and then create the index to achieve optimal search performance.
- For HNSW_SQ, IVF, and IVF_PQ indexes, it’s best to create the index after data insertion and to rebuild the index after adding a significant amount of new data. Detailed instructions for creating each type of index can be found in the examples below.
Syntax for creating an index when you create a table:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dimension), -- A vector column must be of the VECTOR type. The dimension parameter specifies the vector dimension.
...,
VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
Syntax for creating an index later:
-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism cannot exceed twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
The following table describes the param parameters.
| Parameter | Default value | Value range | Required | Description | Remarks |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The type of vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | hnsw / hnsw_sq/ hnsw_bq |
Yes | The type of the index. | ||
| lib | vsag | vsag | No | The type of vector index library. | Only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors per node. | The larger the value, the slower the index is built, but the better the search performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index building. | A larger value indicates a slower index building process but a better index quality. ef_construction must be greater than m. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during search. | The higher the value, the slower the search, and the higher the recall. |
| extra_info_max_size | 0 | [0,16384] | No | Sets the maximum size (in bytes) of each key's information. The primary keys of a table are stored in indexes to improve search efficiency. | 0: The key information is not stored.1: The key information is forcibly stored and the size limit is ignored. In this case, the primary key of a table must be supported.Greater than 1: The maximum size of key information (in bytes) is set. In this case, the following conditions must be met:
|
| refine_k | 4.0 | [1.0,1000.0] | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. This parameter is applicable only when you create an HNSW_BQ index. |
You can set this parameter when you create an index. You can also set this parameter when you perform a search.
|
| refine_type | sq8
NoticeIf the cluster is upgraded from an earlier version to V4.3.5 BP3, the default value of this parameter is fp32. |
sq8/fp32 | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an HNSW_BQ index. |
This parameter improves the efficiency by reducing the memory overhead and index building time during index building, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later versions. It can be set only when an HNSW_BQ index is created. |
This value reduces the memory overhead and index-building time during index building, but may affect the recall rate. |
| bq_use_fht | true
NoticeIf the cluster is upgraded from a previous version to V4.3.5 BP3, the default value is false. |
true/false | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. It can be specified only when an HNSW_BQ index is created. |
The following primary key types are supported for the extra_info_max_size parameter:
- Numeric types: integer, float, and Bit_value types.
- Date and time types
- Character types: VARCHAR type.
The size of the primary key information is calculated as follows:
SET @table_name = 'test'; -- Replace table_name with the name of the table to be queried.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8
WHEN oc.data_type = 22 THEN oc.data_length -- For varchar types, add data_length to extra_info_size.
ELSE NULL -- unsupported types
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- Result is 8 bytes.
The syntax for creating an index at the table creation stage is as follows:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 data_type2,
...,
VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
The syntax for creating an index on an existing table is as follows:
-- If you specify a parallelism value for a composite index after you create the index, the system boosts the performance of index construction. The maximum parallelism value is 2 times the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
Description of the param parameter:
| Parameter | Default value | Value range | Whether required | Description | Remarks |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | Specifies the type of vector distance algorithm. | l2 specifies the Euclidean distance, inner_product specifies the inner product distance, and cosine specifies the cosine distance. | |
| type | ivf_flat/ivf_pq | Yes | The IVF index type. | ||
| lib | ob | ob | No | Specifies the vector index library type. | |
| nlist | 128 | [1,65536] | No | The number of cluster centers. | In V4.3.5 BP5: If the data volume is less than 1 million, the recommended value is data_volume/1000. If the data volume is greater than 1 million, the recommended value is sqrt(data_volume). For multi-partition tables, estimate data_volume based on the data volume of the largest partition. |
| sample_per_nlist | 256 | [1, int64_max] | Yes | Number of samples taken for each cluster center. This parameter is used when you build an index. | In V4.3.5 BP5, it is recommended to keep the default value and not modify it. |
| nbits | 8 | [1,24] | No | Specifies the quantization bit length.
NoticeThis parameter is supported in V4.3.5 BP3 and later. It can be specified only when you create an IVF_PQ index. |
We recommend that you set the value to 8 and set the value to a value between 8 and 10. A larger value provides higher quantization accuracy and higher search accuracy, but affects the search performance. |
| m | No default value. You must specify a value. | [1,65536] | Yes | The dimension of the quantized vectors.
NoticeThis parameter is supported in V4.3.5 BP3 and later. It can be specified only when you create an IVF_PQ index. |
The larger the value, the slower the index is built but the higher the search accuracy. However, the search performance is affected. |
For IVF/IVF_PQ indexes (only applicable to V4.3.5 BP5), the following additional notes apply:
- Index limitations:
- IVF/IVF_PQ indexes cannot be created when heap tables and partitioned tables are used together.
- Build recommendations:
- It is strongly recommended to create IVF/IVF_PQ indexes after data import is complete. It is not recommended to create IVF indexes synchronously when creating a table. If there is no data when creating the index, the index cannot perform clustering and will be unusable. In this case, you need to manually execute
REBUILD INDEXorDROPand thenCREATE INDEXagain. - Before creating an index, it is recommended to refer to the Query the estimated and actual memory usage of indexes section of this topic to estimate and query memory usage to avoid memory shortage during creation.
- Before creating an index, it is recommended to execute a major compaction
ALTER SYSTEM major freezeand wait for it to complete. - In multi-node cluster mode, it is recommended to set the SYS tenant memory to greater than 1 GB.
- Before creating an index, it is recommended to set the parallel_servers_target parameter value to
max_cpu * 10of the tenant. - For scenarios where the data volume of a single partition exceeds 10 million and the tenant CPU resources are higher than 10 cores, it is recommended to modify the sampling ratio by
ALTER SYSTEM SET _px_object_sampling = 5000before creating an index to improve index construction efficiency.
- It is strongly recommended to create IVF/IVF_PQ indexes after data import is complete. It is not recommended to create IVF indexes synchronously when creating a table. If there is no data when creating the index, the index cannot perform clustering and will be unusable. In this case, you need to manually execute
Search syntax and description
Vector index search is an approximate nearest neighbor search and does not guarantee 100% accuracy. The accuracy of vector search is measured by the recall rate. For example, if 9 out of 10 nearest neighbors can be returned stably, the recall rate is 90%. The recall rate is described as follows:
- The recall rate is affected by the build parameters and search parameters.
- Index search parameters are specified when creating the index and cannot be modified afterward. However, you can set them using session variables:
ob_hnsw_ef_searchfor HNSW/HNSW_SQ/HNSW_BQ indexes andob_ivf_nprobesfor IVF indexes. If session variables are set, their values will be prioritized. For more information, see ob_hnsw_ef_search and ob_ivf_nprobes.
The search syntax for dense vector indexes is as follows:
SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num (OFFSET $num) [PARAMETERS ($param1=$value1, ...)];
The search usage is described as follows:
Search syntax requirements:
- You must specify the
APPROXIMATE/APPROXkeyword to use vector indexes instead of full table scans. - The
ORDER BYandLIMITclauses must be included. - The
ORDER BYclause only supports a single vector condition. - The value of
LIMIT + OFFSETmust be in the range(0, 16384]. - The
PARAMETERSclause is used to specify search parameters. For IVF/IVF_PQ indexes, thenprobesparameter is supported, for example,PARAMETERS(nprobes=200). This parameter is supported starting from V4.3.5 BP5 and is used to specify the number of cluster centers during search.
- You must specify the
Distance function usage rules:
- If you specify
APPROXIMATE/APPROX, the current version supports the distance function, and the distance function matches the vector index algorithm, the search will use the vector index. - If you specify
APPROXIMATE/APPROX, the current version supports the distance function, but the distance function does not match the vector index algorithm, the search will not use the vector index, but no error will be returned. - If you specify
APPROXIMATE/APPROX, the current version does not support the distance function, the search will not use the vector index, and an error will be returned. - If you do not specify
APPROXIMATE/APPROX, the current version supports the distance function, the search will not use the vector index, but no error will be returned.
- If you specify
Notice
Starting from V4.3.5 BP5, you can use the PARAMETERS(similarity=$value) clause to specify a similarity threshold for search.
Similarity threshold parameter (
similarity):- Syntax: Add
PARAMETERS (similarity=$value)after theLIMITclause, where$valueis the similarity threshold (range:[0, 1]). Generally, a larger similarity value indicates a closer match, which corresponds to a smaller distance. This is an exception for theinner_productdistance function, where a larger distance indicates a closer match. - Function: After specifying the similarity threshold, only results with a similarity value greater than or equal to the threshold are returned.
- Supported index types: HNSW, HNSW_SQ, HNSW_BQ, IVF, and IVF_PQ.
- Supported distance types:
- During queries, you can specify the similarity threshold for indexes of the
cosine_distanceandl2_distancetypes. Forl2_distance, it is recommended to perform L2 normalization on the vectors to improve the accuracy of similarity search. For more information and examples, see Vector normalization. - You cannot specify the similarity threshold for indexes of the
inner_producttype during queries. This will return an errornot support.
- During queries, you can specify the similarity threshold for indexes of the
- Correspondence between similarity and distance:
cosine_distance=2 - 2 * similarityl2_distance=sqrt(1 / similarity -1)
- Syntax: Add
Other considerations:
- The
WHEREcondition will be used as a filter after vector index search. - If the
LIMITclause is not specified, an error will be returned.
- The
IVF/IVF_PQ index search optimization recommendations (only applicable to V4.3.5 BP5):
- By setting
ROW_FORMAT=COMPRESSEDwhen creating the main table to enable compressed row format, you can improve IVF vector index search performance by 20~30%, for example:CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 VECTOR(4)) ROW_FORMAT=COMPRESSED; - It is not recommended to enable parallel search, as it may affect search performance or result in incorrect results.
- By setting
Create, search, and delete examples
Create indexes during table creation
Dense vector index examples
HNSW example
Notice
When you create an HNSW index, the index name must be no longer than 25 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the index_name limit. Future versions will support longer index names.
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Perform approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The return result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter of an HNSW_BQ index only supports l2.
IVF example
Notice
When you create an IVF index, the index name must be no longer than 33 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the index_name limit. Future versions will support longer index names.
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
Create indexes after table creation
Notice
Currently, you can only create dense vector indexes after a table is created.
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
View the created table.
SHOW CREATE TABLE vec_table_hnsw;
The return result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
OceanBase(root@oceanbase)>SHOW INDEX FROM vec_table_hnsw;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter of an HNSW_BQ index supports l2 and cosine. Cosine is supported starting from V4.3.5 BP4.
IVF example
Create a test table.
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an IVF index.
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Drop indexes
The syntax for dropping a sparse vector index is as follows:
DROP INDEX vec_idx1 ON vec_table;
View the dropped index.
SHOW INDEX FROM vec_table;
The return result is as follows:
Empty set
Maintenance
The search performance decreases when the amount of incremental data is too large. To reduce the amount of data in the incremental data table, OceanBase Database introduces the DBMS_VECTOR package for maintaining vector indexes.
Incremental refresh
Notice
IVF/IVF_PQ indexes do not support incremental refresh.
If a large amount of data is written after the index is created, we recommend that you use the REFRESH_INDEX procedure to perform an incremental refresh. For more information, see REFRESH_INDEX.
An incremental refresh is performed every 15 minutes. If the number of incremental data entries exceeds 10,000, an incremental refresh is automatically performed.
Full refresh (rebuild)
Manual full table rebuild
If you update or delete a large amount of data after you create an index, we recommend that you use the REBUILD_INDEX procedure to perform a full refresh. For more information, see REBUILD_INDEX.
The system checks for a full refresh every 24 hours. If the new data exceeds 20% of the original data, a full refresh is automatically performed. A full refresh is performed in the background and asynchronously. First, a new index is created, and then the old index is replaced. During the rebuild, the old index remains available, but the overall process is relatively slow.
We also provide the vector_index_memory_saving_mode parameter to control the memory usage during index rebuild. Enabling this mode can reduce the memory consumption during the rebuild of vector indexes in partitioned tables. Typically, a vector index rebuild requires twice the memory of the index. When this mode is enabled, the system temporarily deletes the memory index of a partition after the index is built in that partition, thereby effectively reducing the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
Consider the following notes:
- When you execute an offline DDL operation (such as
ALTER TABLEto modify the table structure or primary key), the index table is rebuilt. Since you cannot specify the parallelism during index rebuild, the system uses a single thread by default. Therefore, when the data volume is large, the rebuild process is relatively slow, which affects the overall efficiency of the offline DDL operation. - If you need to modify the index parameters during index rebuild, you must specify both
typeanddistancein the parameter list. Additionally, the values oftypeanddistancemust be consistent with the original index type. For example, if the original index type ishnswand the distance algorithm isl2, you must specify bothtype=hnswanddistance=l2during the rebuild. - During index rebuild, the following operations are supported:
- Modifying the values of
m,ef_search, andef_construction. - For IVF/IVF_PQ indexes in V4.3.5 BP5, modifying the values of
nlist,sample_per_nlist,distance,nbits, andm, wheremandnbitsare unique parameters for IVF_PQ indexes. - Online rebuild of the
ef_searchparameter. - Rebuilding the index type between
hnswandhnsw_sq. - Rebuilding the index type between
ivf_flatandivf_flat, andivf_pqandivf_pq. - Specifying the parallelism during the rebuild. For more information, see REBUILD_INDEX.
- Modifying the values of
- During index rebuild, the following operations are not supported:
- Modifying the
typeanddistanceparameters. - Rebuilding the index type between
hnswandivf. - Rebuilding the index type between
hnswandhnsw_bq. - Rebuilding the index type between
ivf_flatandivf_pq.
- Modifying the
- IVF/IVF_PQ rebuild index recommendations (only applicable to V4.3.5 BP5):
- Before executing
REBUILD INDEX, it is recommended to set theob_trx_timeoutparameter value to be greater than the time required for index rebuild to avoid timeout issues.
- Before executing
Automatic partition rebuild (recommended)
Notice
An automatic partition rebuilding task is triggered in the following two scenarios:
- When you execute a vector index search statement.
- During a scheduled check, which you can manually configure to run at a specified interval.
Configure the execution cycle
In the
oceanbasedatabase, configure the execution interval by using the vector_index_optimize_duty_time parameter. Example:ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';After the preceding configuration, the partition rebuild task is executed only between 23:00:00 and 24:00:00. The task is not initiated during other time periods. For more information about the parameters, see the corresponding parameter documentation.
View the task progress and history
You can query the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view to view the task status and history.
You can view the current task status based on the
statusfield in the view:- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task is paused.
- 3 (FINISHED): The task has been completed. The tasks that have completed execution, regardless of whether they were successful, are stored in the history table. For more information, see the corresponding view documentation.
Cancel the task
To cancel a task, you can obtain the trace_id from the
DBA_OB_VECTOR_INDEX_TASKSorCDB_OB_VECTOR_INDEX_TASKSview and execute the following command:ALTER SYSTEM CANCEL TASK <trace_id>;For an example, see the following code:
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
Performance optimization
Notice
This feature is available starting from V4.3.5 BP3 and only supports IVF indexes.
OceanBase Database provides an automatic performance optimization mechanism for IVF indexes. This mechanism improves search performance by managing cache and performing regular maintenance.
Optimization mechanism
The performance optimization of IVF indexes includes two automatic tasks:
- Cache warming task: The system periodically checks all IVF indexes. If it finds that the cache corresponding to an index does not exist, it automatically triggers cache warming to load the index data into memory. Additionally, when an IVF index is created, the cache is automatically warmed.
- Cache cleanup task: The system periodically checks all IVF caches. If it finds that the index corresponding to a cache has been deleted, it automatically cleans up the invalid cache to release memory resources. Additionally, when an IVF index is deleted, the cache is automatically cleaned up.
Configure the execution cycle of optimization tasks
The system allows you to customize the time window for performance optimization tasks to avoid executing them during peak business hours, which could affect performance.
In the oceanbase database, you can set the execution cycle by configuring the vector_index_optimize_duty_time parameter:
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
The configuration details are as follows:
- The time format is
[start time, end time]. - The above configuration indicates that optimization tasks will only be executed from 23:00:00 to 24:00:00.
- Optimization tasks will not be initiated during other time periods to avoid affecting normal business operations.
Monitor the optimization tasks
View the current task status
You can query the
DBA_OB_VECTOR_INDEX_TASKSview to view tasks that are currently running or waiting to be executed:-- Query in a MySQL tenant SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;Here is a sample result:
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | USER | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ 1 row in setTask status description:
STATUS = 0: PREPARE, the task is waiting to be executed.STATUS = 1: RUNNING, the task is being executed.STATUS = 3: FINISHED, the task has been completed.
Task type description:
TASK_TYPE = 2: IVF cache warming task.TASK_TYPE = 3: IVF cache cleanup task.
View historical task records
Completed tasks (
STATUS = 3) are automatically transferred to the history table every 10 seconds, regardless of whether they were successful. You can query theDBA_OB_VECTOR_INDEX_TASKS_HISTORYview to view the historical records:-- Query the historical records of a specified task ID SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS_HISTORY WHERE TASK_ID=2002281;Here is a sample result:
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | AUTO | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ 1 row in set
Cancel an optimization task
You can run the following command to cancel the specified task.
-- trace_id is obtained from the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view.
ALTER SYSTEM CANCEL TASK <trace_id>;
Notice
You can cancel a task only in the retry stage of the task, when you run the ALTER SYSTEM CANCEL TASK statement. If a background task is stuck in an execution stage, you cannot cancel it by using this statement.
Here is an example:
-- Log in to the SYS tenant and obtain the trace_id of the specified task.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2037736;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | PREPARED | 2 | 1750682301145225 | -1 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set
-- Cancel the task.
ALTER SYSTEM CANCEL TASK "YAFF00B9E4D97-00063839E6BDDEE0-0-1";
After the task is canceled, the task status changes to CANCELLED.
-- Log in to the user tenant and query the task status.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | FINISHED | 2 | 1750682301145225 | -4072 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set