This topic describes the syntax and provides examples for creating, searching, and deleting IVF indexes in OceanBase Database.
Syntax and description
Create
The IVF series of indexes includes IVF and IVF_PQ indexes. You can create an IVF or IVF_PQ index when you create a table or after the table is created. When you create an index, note the following:
- The
VECTORkeyword must be specified. - The parameters and descriptions for creating an index after the table is created are the same as those for creating an index when the table is created.
- If the data volume is large, we recommend that you write all data to the table and then create the index to achieve the best search performance.
- We recommend that you create an IVF or IVF_PQ index after you write data to the table and rebuild the index after you write a large amount of incremental data. For more information about how to create an index, see the examples in the following sections.
- When you create an IVF index, the index name must be no longer than 33 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the
index_namelimit. In later versions, the index name can be longer than 33 characters.
Syntax for creating an index when you create a table:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax for creating an index after the table is created:
-- You can specify the parallelism to improve the index construction performance. The maximum value of the parallelism is twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the param parameters.
Parameter |
Default Value |
Value Range |
Required |
Description |
Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | ivf_flat/ivf_pq | Yes | The IVF index type. | ||
| lib | ob | ob | No | The vector index library type. | |
| nlist | 128 | [1,65536] | No | The number of cluster centers. | |
| sample_per_nlist | 256 | [1,int64_max] | Yes | The number of samples for each cluster center. This parameter is used for creating an index after the table is created. | |
| nbits | 8 | [1,24] | No | The number of bits for quantization.
NoticeThis parameter is supported starting from V4.4.1. You can specify this parameter only when you create an IVF_PQ index. |
We recommend that you set this parameter to 8 and specify a value in the range of [8, 10]. The larger the value, the higher the quantization accuracy and the higher the search accuracy. However, the search performance will be affected. |
| m | No default value. Must be specified. | [1,65536] | Yes | The dimension of the quantized vector.
NoticeThis parameter is supported starting from V4.4.1. You can specify this parameter only when you create an IVF_PQ index. |
The larger the value, the slower the index construction and the higher the search accuracy. However, the search performance will be affected. |
Search
The search of an IVF series of indexes is an approximate nearest neighbor (ANN) search. The results are not guaranteed to be accurate. The recall rate is used to measure the accuracy. For example, if you query for the 10 nearest neighbors and 9 of them are returned, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num) [PARAMETERS ($param1=$value1, ...)];
Syntax requirements:
- You must specify the
APPROXIMATE/APPROXkeyword. Otherwise, a full table scan is performed instead of using the vector index. - You must specify the
ORDER BYandLIMITclauses. - The
ORDER BYclause can contain only one vector condition. - The value of
LIMIT + OFFSETmust be in the range of(0, 16384]. - If you do not specify the
LIMITclause, an error is returned.
Rules for using distance functions:
- If you specify the
APPROXIMATE/APPROXkeyword, the search uses the vector index only if the distance function is supported in the current version and is compatible with the vector index algorithm. - If you specify the
APPROXIMATE/APPROXkeyword, the search does not use the vector index if the distance function is not compatible with the vector index algorithm. However, no error is returned. - If you specify the
APPROXIMATE/APPROXkeyword, the search does not use the vector index and an error is returned if the distance function is not supported in the current version. - If you do not specify the
APPROXIMATE/APPROXkeyword, the search does not use the vector index. However, no error is returned.
Notice
Starting from V4.4.2 BP1, you can use the PARAMETERS(similarity=$value) clause to specify a similarity threshold for search.
- Similarity threshold parameter (
similarity):- Syntax: Add
PARAMETERS (similarity=$value)after theLIMITclause, where$valueis the similarity threshold (in the range of[0, 1]). Generally, the larger the similarity value, the more similar the vectors are, and the smaller the distance. Forinner_product, the larger the distance, the more similar the vectors are. - Function: After you specify the similarity threshold, only the results with a similarity value greater than or equal to the threshold are returned.
- Supported index types: IVF and IVF_PQ.
- Supported distance types:
- You can specify the similarity for
cosine_distanceandl2_distanceindexes. Forl2_distance, we recommend that you normalize the vectors to improve the accuracy of similarity search. For more information about vector normalization, see Use SQL functions - Vector normalization. - You cannot specify the similarity for
inner_productindexes. Otherwise, an errornot supportis returned.
- You can specify the similarity for
- Correspondence between similarity and distance:
cosine_distance=2 - 2 * similarityl2_distance=sqrt(1 / similarity -1)
- Syntax: Add
Other considerations:
- The
WHEREclause is used as a filter for the vector index search. - The recall rate is affected by the index construction parameters and search parameters.
- The index search parameters are specified when you create an index and cannot be modified later. However, you can set the number of cluster centers for search by using the session variable
ob_ivf_nprobes. If you set this variable, its value takes precedence. For more information, see ob_ivf_nprobes.
Drop
Syntax for dropping a vector index:
DROP INDEX index_name ON table_name;
Create, search, and delete examples
Create during table creation
IVF example
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
Create after table creation
IVF example
Create a test table.
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an IVF index.
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Delete
DROP INDEX vec_idx1 ON vec_table;
View the deleted index.
SHOW INDEX FROM vec_table;
The returned result is as follows:
Empty set
