This topic describes how to create, search, and drop the IVF series dense vector index in OceanBase Database, with syntax and examples.
Index syntax and description
Create
The IVF series includes two index types: IVF (ivf_flat) and IVF_PQ. You can create them with the table or after the table exists. When creating an index, keep the following in mind:
- You must use the
VECTORkeyword when creating a vector index. - Parameters for creating an index after the table exists are the same as when creating the index with the table.
- For large datasets, write your data first and then create the index for best search performance.
- For both IVF and IVF_PQ, create the index after loading data and rebuild the index after loading a large amount of incremental data. See the examples below for each index type.
- Index names must be 33 characters or fewer. Longer names can cause errors when the auxiliary table name exceeds the
index_namelimit. Support for longer names is planned for a future release.
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax:
-- You can set parallelism to speed up index build. Maximum parallelism is 2 × CPU core count.
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
Parameter reference
| Parameter | Default | Range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2 / inner_product / cosine | Yes | Vector distance algorithm. | l2: Euclidean. inner_product: inner product. cosine: cosine distance. |
|
| type | ivf_flat / ivf_pq | Yes | IVF index type. | ||
| lib | ob | ob | No | Vector index library. | |
| nlist | 128 | [1, 65536] | No | Number of cluster centers. | |
| sample_per_nlist | 256 | [1, int64_max] | Yes | Number of samples per cluster center. Used when creating the index after the table exists. | |
| nbits | 8 | [1, 24] | No | Quantization bits.
NoteSupported from V4.4.1. IVF_PQ only. |
Recommended: 8; range [8, 10]. Higher values improve quantization and recall but can reduce search performance. |
| m | — (must specify) | [1, 65536] | Yes | Dimension of the quantized vector.
NoteSupported from V4.4.1. IVF_PQ only. |
Larger values slow index build and improve recall but can reduce search performance. |
Search
Search with the IVF series index is approximate nearest neighbor (ANN) search; it does not guarantee exact results. Accuracy is measured by recall. For example, if a query asks for 10 nearest neighbors and 9 of the returned rows are correct, recall is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Search syntax
- You must specify
APPROXIMATEorAPPROXfor the query to use the vector index instead of a full table scan. - You must include
ORDER BYandLIMIT. ORDER BYsupports only one vector expression.LIMIT + OFFSETmust be in the range (0, 16384].- Omitting
LIMITcauses an error.
Distance functions
- With
APPROXIMATE/APPROX: if the distance function is supported and matches the index algorithm, the vector index is used. - With
APPROXIMATE/APPROX: if the distance function does not match the index algorithm, the vector index is not used and no error is returned. - With
APPROXIMATE/APPROX: if the distance function is not supported, the vector index is not used and an error occurs. - Without
APPROXIMATE/APPROX: the vector index is not used and no error is returned.
Other notes
WHEREconditions are applied as filters in addition to the vector index search.- Recall depends on both build and search parameters.
- Search parameters are set when creating the index and cannot be changed later. You can, however, set the number of cluster centers used at search time with the session variable
ob_ivf_nprobes; if set, it takes precedence. For details, see ob_ivf_nprobes.
Drop
To drop a vector index:
DROP INDEX index_name ON table_name;
Create, search, and drop examples
Create with table
IVF
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
Create after table
IVF
Create a test table:
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create the IVF index:
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Drop
DROP INDEX vec_idx1 ON vec_table;
Verify the index was dropped:
SHOW INDEX FROM vec_table;
Example result:
Empty set