If the amount of incremental data is too large, search performance may be affected. To reduce the size of incremental data tables, OceanBase Database provides the DBMS_VECTOR package for maintaining vector indexes. This topic describes how to maintain vector indexes in OceanBase Database.
Incremental refresh
Notice
IVF/IVF_PQ indexes do not support incremental refresh.
If a large amount of data is written after an index is created, we recommend using the REFRESH_INDEX procedure to perform an incremental refresh. For details, see REFRESH_INDEX.
The system checks for incremental data every 15 minutes. If the number of new entries exceeds 10,000, an incremental refresh is triggered automatically.
Full refresh (rebuild)
Manual index rebuild
If a large amount of data is updated or deleted after an index is created, we recommend using the REBUILD_INDEX procedure for a full refresh. For details, see REBUILD_INDEX.
The system checks for a full refresh every 24 hours. If new data exceeds 20% of the original data, a full refresh is triggered automatically. The full refresh is performed asynchronously in the background: a new index is created and then replaces the old index. During the rebuild, the old index remains available, but the process can be relatively slow.
The vector_index_memory_saving_mode parameter can be used to control memory usage during index rebuild. Enabling this mode reduces memory consumption when rebuilding vector indexes on partitioned tables. Typically, rebuilding a vector index requires twice the index’s memory. When this mode is enabled, the system temporarily deletes the memory index of each partition after it is built, releasing memory and effectively reducing the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
Considerations:
- Performing an offline DDL operation (such as
ALTER TABLEto modify table structure or primary key) will trigger an index rebuild. Parallelism cannot be specified during index rebuilds, so the system uses a single thread by default. For large datasets, this may slow the rebuild and affect offline DDL efficiency. - If you need to modify index parameters during a rebuild, you must specify both
typeanddistance, and their values must match the original index. For example, if the original index type ishnswand the distance algorithm isl2, you must specify bothtype=hnswanddistance=l2during the rebuild. - Supported during index rebuild:
- Modify
m,ef_search, andef_construction. - In V4.3.5 BP5, modify
nlist,sample_per_nlist,distance,nbits, andmfor IVF/IVF_PQ indexes (mandnbitsare unique to IVF_PQ). - Online modification of the
ef_searchparameter. - Rebuild between
hnswandhnsw_sqindex types. - Rebuild between
ivf_flat<->ivf_flat, andivf_pq<->ivf_pqindex types. - Set parallelism during rebuild. For more information, see REBUILD_INDEX.
- Modify
- Not supported during index rebuild:
- Modifying
typeordistance. - Rebuilds between
hnswandivf. - Rebuilds between
hnswandhnsw_bq. - Rebuilds between
ivf_flatandivf_pq.
- Modifying
- Recommendation for V4.3.5 BP5:
Before executingREBUILD INDEX, set theob_trx_timeoutparameter to a value greater than the expected rebuild time to avoid timeout issues.
Automatic partition rebuild (recommended)
Notice
Automatic partition rebuild tasks are triggered in the following scenarios:
- When you execute a vector index search statement.
- Periodically, based on a configurable schedule.
Configure the execution schedule
In the
oceanbasedatabase, use the vector_index_optimize_duty_time parameter to set the schedule. Example:ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';With this configuration, partition rebuild tasks are executed only between 23:00:00 and 24:00:00. For details, see the parameter documentation.
Check task progress and history
You can use the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY views to check the task progress and history.
The
statusfield indicates the current state of a task:- 0 (prepare): Waiting to execute.
- 1 (running): Executing.
- 2 (pending): Paused.
- 3 (finished): Completed.
Tasks with
status=finishedare stored in the history table. For more information, see the view documentation.Cancel a task
To cancel a task, obtain the
trace_idfrom theDBA_OB_VECTOR_INDEX_TASKSorCDB_OB_VECTOR_INDEX_TASKSview and execute:ALTER SYSTEM CANCEL TASK <trace_id>;Example:
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";