This topic describes how to create, search, maintain, and drop a vector index in OceanBase Database.
Notice
Index types
The following table describes the vector index types supported by OceanBase Database and their features.
| Index type | Description | Applicable scenarios |
|---|---|---|
| HNSW | The maximum dimension of the index column is 4096. The HNSW index is a memory-based index that must be fully loaded into memory. It supports DML operations and real-time search. | |
| HNSW_SQ | The HNSW_SQ index provides similar construction speed, search performance, and recall rate as the HNSW index, but the total memory usage is reduced to 1/2 to 1/3 of that of the HNSW index. | Scenarios that require high performance and recall rate. |
| HNSW_BQ | The HNSW_BQ index has a slightly lower recall rate than the HNSW index but significantly reduces memory usage. The BQ quantization compression algorithm (Rabitq) can compress vectors to 1/32 of their original size. As the dimension of the vector increases, the memory optimization effect of the HNSW_BQ index becomes more pronounced. | |
| IVF | The IVF index implemented based on the database table does not occupy resident memory. | Scenarios that do not require high performance but involve a large amount of data and are sensitive to costs. |
| IVF_PQ | The IVF_PQ index implemented based on the database table does not occupy resident memory. It applies the PQ quantization technique to the IVF index, resulting in a slightly lower recall rate than the IVF index but higher performance. The PQ quantization compression algorithm can compress vectors to 1/16 to 1/32 of their original size in most scenarios. | Scenarios that do not require high performance but involve a large amount of data and are sensitive to costs. |
| IVF_SQ (Experimental feature) | The IVF_SQ index implemented based on the database table does not occupy resident memory. It applies the SQ quantization technique to the IVF index, resulting in a slightly lower recall rate than the IVF index but higher performance. The SQ quantization compression algorithm can compress vectors to 1/3 to 1/4 of their original size in most scenarios. | Scenarios that do not require high performance but involve a large amount of data and are sensitive to costs. |
Some other notes:
- For dense vector indexes, you can use the L2, inner product (IP), or cosine distance as the index distance algorithm. For sparse vector indexes, you can use the IP as the index distance algorithm.
- You can call only some distance functions during vector index search. For more information, see Use SQL functions.
- You can specify a filter condition for vector search. The filter condition can be a scalar condition or a spatial relationship, such as ST_Intersects. Currently, you cannot use multi-value indexes, full-text indexes, or global indexes as pre-filterers.
- You can create vector indexes and full-text indexes on the same table. Vector indexes include dense and sparse indexes.
- For information about the support of vector indexes for Offline DDL operations, see Offline DDL.
- Weak read (Weak Read) is supported for vector index search starting from V4.3.5 BP4. Weak read allows you to read data from a replica in the local IDC instead of accessing the primary database across regions.
The following limitations apply:
- In V4.3.5 BP3, you cannot create a columnstore vector index.
Configure the memory for vector indexes
OceanBase Database uses the ob_vector_memory_limit_percentage parameter to configure the memory for vector indexes:
Before you use the HNSW/HNSW_SQ/HNSW_BQ vector indexes in versions earlier than V4.3.5 BP3, you must manually set the
ob_vector_memory_limit_percentageparameter to enable the vector feature. We recommend that you set the parameter to30for optimal search performance. If you do not set the parameter, the system does not allocate memory for vector indexes and an error is returned when you create an index. The IVF/IVF_SQ/IVF_PQ indexes do not need to be kept in memory, so you do not need to set this parameter. Example:ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;Starting from V4.3.5 BP3, the vector search feature is enabled by default. The default value of
0indicates the adaptive mode, where the system automatically adjusts the memory usage ratio of vector index data in the tenant. You do not need to manually adjust this parameter:- If the actual memory size of the tenant is 8 GB or less, the value is automatically adjusted to
40. - If the actual memory size of the tenant is more than 8 GB, the value is automatically adjusted to
50.
- If the actual memory size of the tenant is 8 GB or less, the value is automatically adjusted to
Query index memory estimation and actual usage
You can use the DBMS_VECTOR system package to estimate index memory:
- Before a table is created, call the INDEX_VECTOR_MEMORY_ADVISOR procedure to estimate index memory.
- After a table is created and data is inserted, call the INDEX_VECTOR_MEMORY_ESTIMATE procedure to analyze index memory.
The vector index memory estimation returns two pieces of information: the minimum memory configuration required to create a vector index and the actual memory usage after the HNSW_SQ and IVF indexes are created.
We also provide the load_vector_index_on_follower parameter to control whether the follower role automatically loads memory vector indexes. For syntax and examples, see load_vector_index_on_follower. If weak reads are not required, you can disable this parameter to reduce the memory used by vector indexes.
Syntax and description for vector index creation
OceanBase vector indexes can be created during table creation and after table creation. Please note the following when creating vector indexes:
- The
VECTORkeyword is required when creating a vector index. - The parameters and instructions for post-created indexes are consistent with those for indexes created during table creation.
- If the data volume is large, it is recommended to insert all data first and then create the index to achieve optimal search performance.
- For HNSW_SQ/IVF/IVF_SQ/IVF_PQ indexes, it is recommended to create the index after data insertion and to rebuild the index after inserting a large amount of incremental data. Specific creation instructions for each index type are provided in the examples below.
- Sparse vector indexes currently only support creation together with the table during table creation, and do not support post-creation.
Syntax for creating an index during table creation:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dimension), -- Vector column, must specify VECTOR type; dimension is the vector dimensionality
...,
VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
Syntax for creating an index after table creation:
-- Post-created indexes support setting parallelism to improve index build performance. The maximum parallelism is limited to CPU cores * 2.
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
The following table describes the parameters.
| Parameter | Default value | Value range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The type of vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | hnsw/hnsw_sq/hnsw_bq |
Yes | The index type. | ||
| lib | vsag | vsag | No | The type of vector index library. | Only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors per node. | The larger the value, the slower the index is built, but the better the search performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index building. | A larger value indicates a slower index building process but a better index quality. ef_construction must be greater than m. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during search. | The higher the value, the slower the search, and the higher the recall. |
| extra_info_max_size | 0 | [0,16384] | No | Sets the maximum size (in bytes) of each key's information. The primary keys of a table are stored in indexes to improve search efficiency. | 0: The key information is not stored.1: The key information is forcibly stored and the size limit is ignored. In this case, the primary key of a table must be supported.Greater than 1: The maximum size of key information (in bytes) is set. In this case, the following conditions must be met:
|
| refine_k | 4.0 | [1.0,1000.0] | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. This parameter is applicable only when you create an HNSW_BQ index. |
You can set this parameter when you create an index. You can also set this parameter when you perform a search.
|
| refine_type | sq8
NoticeIf the cluster is upgraded from an earlier version to V4.3.5 BP3, the default value of this parameter is fp32. |
sq8/fp32 | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an HNSW_BQ index. |
This parameter improves the efficiency by reducing the memory overhead and index building time during index building, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later versions. It can be set only when an HNSW_BQ index is created. |
This value reduces the memory overhead and index-building time during index building, but may affect the recall rate. |
| bq_use_fht | true
NoticeIf the cluster is upgraded from a previous version to V4.3.5 BP3, the default value is false. |
true/false | No |
NoticeThis parameter is supported in V4.3.5 BP3 and later. It can be specified only when an HNSW_BQ index is created. |
The supported primary key types for extra_info_max_size are as follows:
- Numeric types: integer types, floating-point types, and Bit_value.
- Date and time types
- Character types: VARCHAR.
The size of the primary key information is calculated as follows:
SET @table_name = 'test'; -- Replace with the name of the table to be queried.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- The size of the primary key information of numeric types is 8 bytes.
WHEN oc.data_type = 22 THEN oc.data_length -- The size of the primary key information of VARCHAR types is the data length.
ELSE NULL -- The primary key information of other types is not supported.
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- The calculation result is 8 bytes.
Syntax for creating a vector index during table creation:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 data_type2,
...,
VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
Syntax for creating a vector index after table creation:
-- You can specify the parallelism to improve the index building performance. The maximum value of the parallelism is twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
The following table describes the parameters.
| Parameter | Default value | Value range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The type of vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | ivf_flat/ivf_sq8/ivf_pq | Yes | The IVF index type. | ||
| lib | ob | ob | No | The type of vector index library. | |
| nlist | 128 | [1,65536] | No | The number of clusters. | |
| sample_per_nlist | 256 | [1,int64_max] | Yes | The number of data samples for each cluster. This parameter is used for later index creation. | |
| nbits | 8 | [1,24] | No | The number of bits for quantization.
NoticeThis parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an IVF_PQ index. |
We recommend that you set this parameter to 8 and specify a value in the range of [8,10]. A larger value indicates a higher quantization accuracy and search accuracy, but a lower search performance. |
| m | No default value. Must be specified. | [1,65536] | Yes | The dimension of the quantized vector.
NoticeThis parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an IVF_PQ index. |
A larger value indicates a slower index building process and a higher search accuracy, but a lower search performance. |
Search syntax and description
Vector index search is approximate nearest neighbor search, which does not guarantee 100% accuracy. The recall rate is used to measure the accuracy of vector search. For example, if 9 out of 10 nearest neighbors can be stably returned, the recall rate is 90%. The recall rate is described as follows:
- The recall rate is affected by the build parameters and search parameters.
- Index search parameters are specified when creating the index and cannot be modified afterward. However, you can set them using session variables:
ob_hnsw_ef_searchfor HNSW/HNSW_SQ/HNSW_BQ indexes andob_ivf_nprobesfor IVF indexes. If session variables are set, their values will be prioritized. For more information, see ob_hnsw_ef_search and ob_ivf_nprobes.
The search syntax for dense vector indexes and sparse vector indexes is the same, as shown in the following example:
SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num (OFFSET $num);
The search usage is described as follows:
Search syntax requirements:
- You must specify the
APPROXIMATE/APPROXkeyword to use vector indexes instead of full table scans. - The
ORDER BYandLIMITclauses must be included. - The
ORDER BYclause supports only a single vector condition. - The value of
LIMIT + OFFSETmust be in the range(0, 16384].
- You must specify the
Distance function usage rules:
- If you specify
APPROXIMATE/APPROX, the current version supports the distance function, and the distance function matches the vector index algorithm, the search will use the vector index. - If you specify
APPROXIMATE/APPROX, the current version supports the distance function, but the distance function does not match the vector index algorithm, the search will not use the vector index, but no error will be returned. - If you specify
APPROXIMATE/APPROX, the current version does not support the distance function, the search will not use the vector index, and an error will be returned. - If you do not specify
APPROXIMATE/APPROX, the current version supports the distance function, the search will not use the vector index, but no error will be returned.
- If you specify
Other considerations:
- The
WHEREcondition will be used as a filter after vector index search. - If the
LIMITclause is not specified, an error will be returned.
- The
Create, search, and delete examples
Create during table creation
Dense vector index examples
HNSW example
Notice
When you create an HNSW index, the index name must be no longer than 25 characters. Otherwise, an error may occur because the name of the index auxiliary table exceeds the index_name limit. Future versions will support longer index names.
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Perform approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The returned result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter of an HNSW_BQ index only supports l2.
IVF example
Notice
When you create an IVF index, the index name must be no longer than 33 characters. Otherwise, an error may occur because the name of the index auxiliary table exceeds the index_name limit. Future versions will support longer index names.
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
Sparse vector index example
CREATE TABLE t1_sprarse_vec (
c1 INT PRIMARY KEY,
c2 SPARSEVECTOR,
VECTOR INDEX idx_sparse_vec(c2) WITH (distance=inner_product)
);
A sparse vector index only supports the distance parameter, which can be set to inner_product.
Create after table creation
Notice
Currently, you can create only dense vector indexes after table creation.
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
Query the created table.
SHOW CREATE TABLE vec_table_hnsw;
The returned result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
OceanBase(root@oceanbase)>SHOW INDEX FROM vec_table_hnsw;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter of an HNSW_BQ index only supports l2.
IVF example
Create a test table.
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an IVF index.
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Delete
The syntax for deleting a sparse vector index and a dense vector index is the same.
DROP INDEX vec_idx1 ON vec_table;
Query the deleted index.
SHOW INDEX FROM vec_table;
The returned result is as follows:
Empty set
Maintenance
The search performance decreases when the amount of incremental data is too large. To reduce the amount of data in the incremental data table, OceanBase Database introduces the DBMS_VECTOR package for maintaining vector indexes.
Incremental refresh
Notice
IVF/IVF_SQ/IVF_PQ indexes do not support incremental refresh.
If you write a large amount of data after you create an index, we recommend that you use the REFRESH_INDEX procedure to perform an incremental refresh. For more information, see REFRESH_INDEX.
The system checks for incremental data every 15 minutes. If the number of incremental data exceeds 10,000, the system automatically performs an incremental refresh.
Full refresh (rebuild)
Manually rebuild an index
If a large amount of data is updated or deleted after the index is created, it is recommended to use the REBUILD_INDEX procedure for a full refresh. For details and examples, see REBUILD_INDEX.
A full refresh is checked every 24 hours. If the amount of new data exceeds 20% of the original data, a full refresh is automatically performed. The full refresh is executed asynchronously in the background: a new index is created first, then the old index is replaced. During the rebuilding process, the old index remains available, but the overall process is relatively slow.
We also provide the vector_index_memory_saving_mode parameter to control memory usage during index rebuilding. Enabling this mode can reduce memory consumption during the rebuild process of vector indexes on partitioned tables. Typically, rebuilding a vector index requires memory equivalent to twice the size of the index. After enabling memory-saving mode, the system releases memory by temporarily deleting the in-memory index of a partition once its rebuild is complete, effectively reducing the total memory required for the operation. For syntax and examples, see vector_index_memory_saving_mode.
Considerations:
- Performing offline DDL operations (such as
ALTER TABLEto modify table structure or primary key) will trigger a rebuild of the index table. Since index rebuilding cannot specify parallelism, the system defaults to single-threaded execution. Therefore, with large amounts of data, the rebuilding process can be slow and may affect the overall efficiency of offline DDL execution. - If you need to modify index parameters during rebuilding, you must specify both
typeanddistancein the parameter list, and their values must match the original index type. For example, if the original index type ishnswand the distance algorithm isl2, you must specify bothtype=hnswanddistance=l2during rebuilding. - Index rebuilding supports:
- Modifying the values of
m,ef_search, andef_construction. - Online rebuilding of the
ef_searchparameter. - Rebuilding between
hnswandhnsw_sqindex types. - Rebuilding between
ivf_flat<->ivf_flat,ivf_sq8<->ivf_sq8, andivf_pq<->ivf_pqindex types. - Setting parallelism during rebuilding. For examples, see REBUILD_INDEX.
- Modifying the values of
- Index rebuilding does not support:
- Modifying the
typeanddistancetypes. - Rebuilding between
hnswandivfindex types. - Rebuilding between
hnswandhnsw_bqindex types. - Cross-rebuilding among
ivf_flat,ivf_pq, andivf_sq8index types.
- Modifying the
Automatic partition rebuild (Recommended)
Notice
The automatic partition rebuild task is triggered in the following two scenarios:
- When you execute a vector index search statement.
- During a scheduled check, which you can manually configure to run at a specified interval.
Configure the execution interval
In the
oceanbasedatabase, set the vector_index_optimize_duty_time parameter to configure the execution interval. Example:ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';After the preceding configuration, the partition rebuild task is executed only between 23:00:00 and 24:00:00. The task is not initiated during other time periods. For more information about the parameters, see the corresponding parameter documentation.
View task progress and history
You can query the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view to view the task progress and history.
You can use the
statusfield to determine the current status of the task:- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task is paused.
- 3 (FINISHED): The task has been completed. The task with the
status=FINISHEDis stored in the history table regardless of whether it was executed successfully. For more information, see the corresponding view documentation.
Cancel a task
To cancel a task, obtain the trace_id from the
DBA_OB_VECTOR_INDEX_TASKSorCDB_OB_VECTOR_INDEX_TASKSview and execute the following command:ALTER SYSTEM CANCEL TASK <trace_id>;Example:
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
Performance optimization
Notice
This feature is supported in V4.3.5 BP3 and later. Currently, only IVF indexes are supported.
OceanBase Database provides an automatic performance optimization mechanism for IVF indexes. This mechanism improves search performance by managing cache and performing regular maintenance.
Optimization mechanism
IVF index performance optimization includes two automatic tasks:
- Cache preheating task: The system periodically checks all IVF indexes. If the cache corresponding to an index does not exist, the cache preheating task is automatically triggered to load the index data into memory. Additionally, when an IVF index is created, the cache is automatically preheated.
- Cache cleanup task: The system periodically checks all IVF caches. If the index corresponding to a cache has been deleted, the cache cleanup task is automatically triggered to clean up the invalid cache and release memory resources. Additionally, when an IVF index is deleted, the cache is automatically cleaned up.
Configure the execution period of optimization tasks
You can customize the execution period of performance optimization tasks to avoid affecting performance during business peak hours.
In the oceanbase database, set the vector_index_optimize_duty_time parameter to configure the execution period:
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
The configuration is described as follows:
- The time format is
[start time, end time]. - The above configuration indicates that optimization tasks are executed only between 23:00:00 and 24:00:00.
- Optimization tasks are not initiated during other time periods to avoid affecting normal business.
Monitor optimization tasks
View the status of current tasks
You can query the
DBA_OB_VECTOR_INDEX_TASKSview to view the status of tasks that are being executed or waiting to be executed:-- Query in a MySQL tenant. SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;Example return result:
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | USER | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ 1 row in setThe task status is described as follows:
STATUS = 0: PREPARE, the task is waiting to be executed.STATUS = 1: RUNNING, the task is being executed.STATUS = 3: FINISHED, the task has been completed.
The task type is described as follows:
TASK_TYPE = 2: IVF cache preheating task.TASK_TYPE = 3: IVF cache cleanup task.
View the history of completed tasks
Completed tasks (with
STATUS = 3) are automatically stored in the history table every 10 seconds, regardless of whether they are successful. You can query theDBA_OB_VECTOR_INDEX_TASKS_HISTORYview to view the history:-- Query the history of a specified task. SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS_HISTORY WHERE TASK_ID=2002281;Example return result:
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ | 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | AUTO | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 | +----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+ 1 row in set
Cancel an optimization task
You can run the following command to cancel a specified task.
-- trace_id is obtained from the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view.
ALTER SYSTEM CANCEL TASK <trace_id>;
Notice
You can cancel a task only in the retry phase of the task execution, when you run the ALTER SYSTEM CANCEL TASK command. If a background task is stuck in a specific phase, you cannot cancel the background task by using this command.
Here is an example:
-- Log in to the SYS tenant and obtain the trace_id of the specified task.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2037736;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | PREPARED | 2 | 1750682301145225 | -1 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set
-- Cancel the task.
ALTER SYSTEM CANCEL TASK "YAFF00B9E4D97-00063839E6BDDEE0-0-1";
After the task is canceled, the task status changes to CANCELLED.
-- Log in to the user tenant and query the task status.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | FINISHED | 2 | 1750682301145225 | -4072 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set