This topic describes how to monitor and maintain vector indexes in OceanBase Database.
Monitoring
You can view the basic information and real-time status of vector indexes through the following system views.
- You can query the [G]V$OB_HNSW_INDEX_INFO view to obtain the basic information and real-time status of HNSW vector indexes.
- You can query the [G]V$OB_IVF_INDEX_INFO view to obtain the basic information and real-time status of IVF vector indexes.
Maintenance
When the amount of incremental data is large, the search performance will decrease. To reduce the amount of data in the incremental data table, OceanBase Database introduces the DBMS_VECTOR package for maintaining vector indexes.
Incremental refresh
Notice
IVF/IVF_PQ indexes do not support incremental refresh.
If you write a large amount of data after creating an index, we recommend that you use the REFRESH_INDEX procedure to perform an incremental refresh. For more information, see REFRESH_INDEX.
An incremental refresh is performed every 15 minutes. If the number of incremental data entries exceeds 10,000, an incremental refresh is automatically performed.
Full refresh (rebuild)
Manually Rebuild the Index
If a large amount of data is updated or deleted after the index is created, we recommend that you use the REBUILD_INDEX procedure to perform a full refresh. For more information, see REBUILD_INDEX.
A full refresh is performed every 24 hours. If the amount of new data exceeds 20% of the original data, a full refresh is automatically performed. The full refresh is performed in the background and asynchronously. First, a new index is created, and then the old index is replaced. During the rebuild, the old index remains available, but the overall process is relatively slow.
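The two automatic triggers quoted in this topic (more than 10,000 incremental entries for an incremental refresh, and new data exceeding 20% of the original data for a full refresh) can be sketched as simple checks. This is only an illustration of the thresholds, not how OceanBase Database implements them; the function names and the handling of an empty original data set are assumptions.

```python
# Thresholds quoted in this topic; the names below are hypothetical.
INCREMENTAL_ROW_THRESHOLD = 10_000  # incremental data entries
FULL_REFRESH_PERCENT = 20           # new data as a percentage of original data


def needs_incremental_refresh(incremental_rows: int) -> bool:
    """Return True when the incremental entry count exceeds 10,000."""
    return incremental_rows > INCREMENTAL_ROW_THRESHOLD


def needs_full_refresh(new_rows: int, original_rows: int) -> bool:
    """Return True when new data exceeds 20% of the original data."""
    if original_rows <= 0:
        # Assumption: with no original data, any new data exceeds the ratio.
        return new_rows > 0
    # Integer arithmetic avoids floating-point edge cases at exactly 20%.
    return new_rows * 100 > original_rows * FULL_REFRESH_PERCENT
```

A check like this can help decide when to call REFRESH_INDEX or REBUILD_INDEX manually instead of waiting for the scheduled run.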
We also provide the vector_index_memory_saving_mode parameter to control the memory usage during index rebuilds. Enabling this mode can reduce the memory consumption during the rebuild of vector indexes in partitioned tables. Typically, vector index rebuilds consume twice the memory of the index. When this mode is enabled, the system temporarily deletes the memory index of a partition after it is built to release memory, effectively reducing the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
Consider the following notes:
- When you perform an offline DDL operation (such as ALTER TABLE to modify the table structure or primary key), the index table will be rebuilt. Since parallelism cannot be specified during index rebuilds, the system defaults to using a single thread. Therefore, when the amount of data is large, the rebuild process can be slow, which affects the overall efficiency of the offline DDL operation.
- If you need to modify the index parameters during the rebuild, you must specify both type and distance in the parameter list, and they must match the original index type. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify both type=hnsw and distance=l2 during the rebuild.
- During the rebuild, you can:
  - Modify the values of m, ef_search, and ef_construction.
  - Online rebuild the ef_search parameter.
  - Rebuild the index type between hnsw and hnsw_sq.
  - Rebuild the index type between ivf_flat and ivf_flat, or between ivf_pq and ivf_pq.
  - Set the parallelism during the rebuild. For an example, see REBUILD_INDEX.
- During the rebuild, you cannot:
  - Modify the type and distance parameters.
  - Rebuild the index type between hnsw and ivf.
  - Rebuild the index type between hnsw and hnsw_bq.
  - Rebuild the index type between ivf_flat and ivf_pq.
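The rebuild rules above can be checked on the client side before calling REBUILD_INDEX. The following is a minimal sketch of one reading of those notes: type and distance must both be given, distance must stay the same, and the only type change allowed is between hnsw and hnsw_sq. The function name and the dictionary layout are hypothetical, and the sketch assumes that rebuilding an index as its own type is always acceptable.

```python
# Type changes the notes list as allowed; any other pair must keep its type.
ALLOWED_TYPE_CHANGES = {frozenset({"hnsw", "hnsw_sq"})}


def check_rebuild_params(original: dict, requested: dict) -> list:
    """Return a list of problems; an empty list means the request looks acceptable."""
    # Both type and distance must appear in the rebuild parameter list.
    problems = [f"{key} must be specified"
                for key in ("type", "distance") if key not in requested]
    if problems:
        return problems

    if requested["distance"] != original["distance"]:
        problems.append("distance cannot be changed")

    old_type, new_type = original["type"], requested["type"]
    if old_type != new_type and frozenset({old_type, new_type}) not in ALLOWED_TYPE_CHANGES:
        problems.append(f"cannot rebuild {old_type} as {new_type}")
    return problems
```

For example, `check_rebuild_params({"type": "hnsw", "distance": "l2"}, {"type": "hnsw_bq", "distance": "l2"})` reports that hnsw cannot be rebuilt as hnsw_bq, while changing only m or ef_construction passes.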
Automatic partition rebuild (Recommended)
Notice
Automatic partition rebuild tasks are triggered in the following two scenarios:
- When a vector index search statement is executed.
- During scheduled checks, which can be manually configured to run at specified intervals.
Configure the execution interval
In the oceanbase database, set the vector_index_optimize_duty_time parameter to define the execution interval. Here is an example:
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
After this configuration, partition rebuild tasks will only run between 23:00:00 and 24:00:00. For more details, refer to the parameter description in the corresponding documentation.
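To reason about a configured window from a script, the value format shown in the example can be parsed as follows. This is a sketch based only on the '[HH:MM:SS, HH:MM:SS]' form above; the helper names are hypothetical, and the sketch assumes that the window does not wrap past midnight and that the end time is exclusive. The server's actual boundary handling is described in the parameter documentation.

```python
import re

_DUTY_TIME = re.compile(
    r"\[\s*(\d{1,2}):(\d{2}):(\d{2})\s*,\s*(\d{1,2}):(\d{2}):(\d{2})\s*\]"
)


def parse_duty_time(value: str) -> tuple:
    """Parse '[HH:MM:SS, HH:MM:SS]' into (start, end) seconds since midnight."""
    match = _DUTY_TIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected duty time format: {value!r}")
    h1, m1, s1, h2, m2, s2 = map(int, match.groups())
    # Plain seconds are used because 24:00:00 is not a valid datetime.time.
    return h1 * 3600 + m1 * 60 + s1, h2 * 3600 + m2 * 60 + s2


def in_duty_window(value: str, now: str) -> bool:
    """Return True when 'HH:MM:SS' falls inside the window (end exclusive)."""
    start, end = parse_duty_time(value)
    h, m, s = map(int, now.split(":"))
    return start <= h * 3600 + m * 60 + s < end
```

With the example value, `in_duty_window('[23:00:00, 24:00:00]', '23:30:00')` is true and `'22:59:59'` falls outside the window.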
View task progress and history
You can check the progress and history of tasks by querying the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY views.
Use the status field to determine the current status of a task:
- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task has been paused.
- 3 (FINISHED): The task has completed.
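When the status values are read by a monitoring script, the codes listed above can be mapped to their names with a small enum. The class and function names here are hypothetical; only the four code/name pairs come from this topic.

```python
from enum import IntEnum


class VectorIndexTaskStatus(IntEnum):
    """Status codes listed for the vector index task views."""
    PREPARE = 0   # waiting to be executed
    RUNNING = 1   # being executed
    PENDING = 2   # paused
    FINISHED = 3  # completed


def describe_status(code: int) -> str:
    """Return the status name, or a placeholder for codes not listed here."""
    try:
        return VectorIndexTaskStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```

Codes outside the documented range are reported as unknown rather than raising, so a script keeps working if it encounters a value this topic does not list.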
Tasks that have completed execution (status FINISHED) are stored in the history table, regardless of whether they were successful. For more details, refer to the corresponding view documentation.
Cancel a task
To cancel a task, first obtain the trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view, and then execute the following command:
ALTER SYSTEM CANCEL TASK <trace_id>;
Here is an example:
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
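If the cancel command is generated by a script, the statement can be assembled from a trace_id read from the task view. The sketch below only builds the statement text in the form shown above; the function name is hypothetical, and the validation pattern assumes that trace IDs contain only letters, digits, and hyphens, as in the example.

```python
import re

# Assumption: trace IDs look like the example (letters, digits, hyphens).
_TRACE_ID = re.compile(r"[A-Za-z0-9-]+")


def build_cancel_statement(trace_id: str) -> str:
    """Build the ALTER SYSTEM CANCEL TASK statement for one trace_id."""
    if not _TRACE_ID.fullmatch(trace_id):
        # Reject anything unexpected instead of embedding it in SQL text.
        raise ValueError(f"unexpected trace_id format: {trace_id!r}")
    return f'ALTER SYSTEM CANCEL TASK "{trace_id}";'
```

Rejecting unexpected characters keeps quotes or semicolons from a malformed value out of the generated statement.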