This topic describes how to monitor and maintain vector indexes in OceanBase Database.
Monitoring
You can view the basic information and real-time status of vector indexes in system views.
| View | Description |
|---|---|
| [G]V$OB_HNSW_INDEX_INFO | View the basic information and real-time status of HNSW indexes. |
| [G]V$OB_HNSW_INDEX_SEGMENT_INFO | View the basic information and real-time status of each data segment (Segment) in a partition of an HNSW index table. |
| [G]V$OB_IVF_INDEX_INFO | View the basic information and real-time status of IVF indexes. |
| [G]V$OB_SINDI_INDEX_INFO | View the basic information and real-time status of in-memory sparse indexes. |
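As a minimal sketch, the views listed above can be queried like any other system view. The example assumes a MySQL-compatible tenant in which the views are exposed in the oceanbase database, and it selects all columns because the column lists are not covered in this topic. The [G]V$ notation stands for two views: the GV$ form is expected to report on all nodes and the V$ form on the node you are connected to.

```sql
-- Basic information and real-time status of HNSW indexes across all nodes.
SELECT * FROM oceanbase.GV$OB_HNSW_INDEX_INFO;

-- The same information for the node of the current connection only.
SELECT * FROM oceanbase.V$OB_HNSW_INDEX_INFO;
```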
Maintenance
Search performance degrades when too much incremental data accumulates. To reduce the amount of data in the incremental data table, OceanBase Database provides the DBMS_VECTOR system package for maintaining vector indexes.
Incremental refresh
Notice
This feature supports HNSW, semantic, and in-memory sparse indexes, but not IVF indexes.
Notice
If you use an asynchronous embedding mode for a semantic index, incremental refresh will trigger additional data embedding to ensure that the incremental data is correctly converted into vectors and added to the index.
If you write a large amount of data after you create an index, we recommend that you use the REFRESH_INDEX procedure to perform an incremental refresh. For more information, see REFRESH_INDEX.
The system checks for incremental data every 15 minutes. If the number of incremental records exceeds 10,000, an incremental refresh is performed automatically.
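A manual incremental refresh with the REFRESH_INDEX procedure might look like the following sketch. The index name idx1, table name t1, and vector column c2 are hypothetical, and the argument order (index name, table name, vector column) is an assumption. See REFRESH_INDEX for the exact signature and the optional parameters.

```sql
-- Hypothetical objects: vector index idx1 on column c2 of table t1.
-- Assumed argument order: index name, table name, vector column.
CALL DBMS_VECTOR.REFRESH_INDEX('idx1', 't1', 'c2');
```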
Full refresh (rebuild)
Manual full rebuild
If a large amount of data is updated or deleted after the index is created, we recommend that you use the REBUILD_INDEX procedure to perform a full refresh. For more information, see REBUILD_INDEX.
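A manual full rebuild with the REBUILD_INDEX procedure might look like the following sketch. As before, idx1, t1, and c2 are hypothetical names and the argument order is an assumption. See REBUILD_INDEX for the exact signature and for optional arguments such as index parameters and parallelism.

```sql
-- Hypothetical objects: vector index idx1 on column c2 of table t1.
-- Assumed argument order: index name, table name, vector column.
CALL DBMS_VECTOR.REBUILD_INDEX('idx1', 't1', 'c2');
```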
The system checks every 24 hours whether a full refresh is needed. If the new data exceeds 20% of the original data, a full refresh is performed automatically. The full refresh runs asynchronously in the background: a new index is built first and then replaces the old index. The old index remains available during the rebuild, but the overall process is relatively slow.
OceanBase Database also provides the vector_index_memory_saving_mode configuration item to control memory usage during index rebuild. Enabling this mode reduces the memory consumed when vector indexes on partitioned tables are rebuilt. Typically, a vector index rebuild requires twice the memory of the index. When the memory-saving mode is enabled, the system temporarily deletes the in-memory index of each partition after that partition is built, which releases memory and reduces the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
Consider the following notes:
- When you perform an offline DDL operation (such as ALTER TABLE to modify the table structure or primary key), the index table is rebuilt. Since parallelism cannot be specified during index rebuild, the system uses a single thread by default. Therefore, when the data volume is large, the rebuild process is relatively slow, which affects the overall efficiency of the offline DDL operation.
- When you need to modify the index parameters during index rebuild, you must specify both type and distance in the parameter list, and they must be consistent with the original index type. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify both type=hnsw and distance=l2 during the rebuild.
- During index rebuild, the following operations are supported:
  - Modify the values of m, ef_search, and ef_construction.
  - Online rebuild the ef_search parameter.
  - Rebuild the index type between hnsw and hnsw_sq.
  - Rebuild the index type between ivf_flat and ivf_flat, and between ivf_pq and ivf_pq.
  - Set the parallelism during the rebuild. For more information, see REBUILD_INDEX.
- During index rebuild, the following operations are not supported:
  - Modify the type and distance parameters.
  - Rebuild the index type between hnsw and ivf.
  - Rebuild the index type between hnsw and hnsw_bq.
  - Rebuild the index type between ivf_flat and ivf_pq.
Automatic partition rebuild (recommended)
Notice
During automatic partition rebuilds, incremental data and snapshot tasks are processed simultaneously to ensure the consistency and integrity of the index data.
The current version triggers automatic partition rebuild tasks in the following two scenarios:
- When you execute vector index search statements.
- When a scheduled check is performed. You can manually configure the execution cycle.
In the oceanbase database, configure the execution cycle by using the vector_index_optimize_duty_time configuration item. Here is an example:
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
After the preceding configuration is completed, the partition rebuild task is executed only between 23:00:00 and 24:00:00. The task is not initiated during other periods. For more information about the parameters, see the corresponding configuration item documentation.
View the task progress and history
To check task progress and history, query the CDB/DBA_OB_VECTOR_INDEX_TASKS and CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY views.
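The following sketch shows one way to inspect both views from a user tenant. It assumes the DBA_ views are exposed in the oceanbase database of a MySQL-compatible tenant and selects all columns rather than guessing the column list. The CDB_ variants are normally queried from the sys tenant.

```sql
-- Tasks of the current tenant that are waiting, running, or paused.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;

-- Tasks of the current tenant that have already finished.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
```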
Notice
The preceding views cover partition rebuild tasks, segment merge tasks, and segment merge dump tasks. You can distinguish the task types by using the TASK_TYPE field in the views.
You can view the current status of a task by using the status field:
- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task is paused.
- 3 (FINISHED): The task is completed.
The tasks with the status=FINISHED value are stored in the history table regardless of whether they are successful. For more information, see the corresponding view documentation.
Cancel a task
To cancel a task, obtain its trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view and execute the following command:
ALTER SYSTEM CANCEL TASK <trace_id>;
Here is an example:
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
Manually force a dump or major compaction (alternative)
The dump and major compaction of incremental segments are usually triggered automatically by background modules of the database system when specific conditions are met. OceanBase Database also allows you to manually force a dump or major compaction by using the FLUSH_INDEX and COMPACT_INDEX procedures in the DBMS_VECTOR system package, so that you can perform maintenance tasks as needed.
For syntax and examples, see FLUSH_INDEX and COMPACT_INDEX.
After you trigger a dump or major compaction, query the DBA_OB_VECTOR_INDEX_TASKS view to check the task. In this view, task_type=5 indicates a manually triggered dump task, and task_type=6 indicates a manually triggered major compaction task. These tasks may finish very quickly, so if the view returns no records, check the historical task view DBA_OB_VECTOR_INDEX_TASK_HISTORY.
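The check described above can be sketched as two queries. The example assumes the views are exposed in the oceanbase database of a MySQL-compatible tenant and that task_type is stored as the numeric values given in this section.

```sql
-- Manually triggered dump (5) and major compaction (6) tasks still in progress.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS
WHERE task_type IN (5, 6);

-- If nothing is returned, the tasks may already have finished; check the history.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY
WHERE task_type IN (5, 6);
```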
References
- Vector index performance optimization
- The auto-discard and major compaction of an incremental segment can be controlled by setting the threshold. For more information, see Vector index memory management.
