This topic describes the syntax and examples for creating, searching, and dropping IVF series indexes in OceanBase Database.
Syntax and description
Create
IVF series indexes include IVF and IVF_PQ indexes. The construction of the cluster centers of an IVF index depends on the data in the primary table. To ensure the normal use of the IVF index, you must create the primary table and import data before you create the IVF index. You can create an IVF index only by using the post-create method. When you create an IVF index, note the following:
- If the data volume is large, we recommend that you write data to the primary table and then create the index to achieve the best search performance.
- We recommend that you create an IVF or IVF_PQ index after you write data to the primary table, and rebuild the index after you write a large amount of incremental data. For more information about how to create an index, see the examples in the following sections.
- The length of the index name must be less than or equal to 33 characters. Otherwise, an exception may be returned because the name of the auxiliary table of the index exceeds the
index_namelimit. In a later version, the length of the index name will be supported. - When you process large-scale vector data, you can increase the
nlistparameter to improve the recall rate of the IVF index. Whennlistis greater than or equal to 5000, the storage structure of the cluster centers of the IVF index will be changed from an array to a graph structure to further optimize the index performance and space utilization.
Syntax for post-create indexes:
-- You can specify the parallelism to improve the index construction performance. The maximum value of the parallelism is twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the param parameters.
| Parameter | Default value | Value range | Required | Description | Remarks |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | ivf_flat/ivf_pq | Yes | The type of the IVF index. | ||
| lib | ob | ob | No | The vector index library. | |
| nlist | 128 | [1,65536] | No | The number of cluster centers. | |
| sample_per_nlist | 256 | [1,int64_max] | Yes | The number of samples for each cluster center. This parameter is used for post-create indexes. | |
| nbits | 8 | [1,24] | No | The number of bits for quantization.
NoticeThis parameter is applicable only to IVF_PQ indexes. |
We recommend that you set this parameter to 8. We recommend that you set this parameter to a value in the range of [8, 10]. A larger value indicates higher quantization accuracy and higher search accuracy, but also lower search performance. |
| m | No default value. Must be specified. | [1,65536] | Yes | The dimension of the quantized vector.
NoticeThis parameter is applicable only to IVF_PQ indexes. |
A larger value indicates slower index construction and higher search accuracy, but also lower search performance. |
Search
IVF series indexes support approximate nearest neighbor (ANN) search. The search results are not guaranteed to be 100% accurate. The recall rate is used to measure the search accuracy. For example, if 9 out of 10 nearest neighbors can be returned, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Syntax requirements:
- You must specify the
APPROXIMATEorAPPROXkeyword. Otherwise, the vector index will not be used for the search, and the full table scan will be performed. - You must specify the
ORDER BYandLIMITclauses. - The
ORDER BYclause can specify only one vector condition. - The value of
LIMIT + OFFSETmust be in the range of (0, 16384]. - If you do not specify the
LIMITclause, an error will be returned.
Rules for using distance functions:
- If you specify the
APPROXIMATEorAPPROXkeyword, the search will use the vector index only if the distance function is supported in the current version and is compatible with the vector index algorithm. - If you specify the
APPROXIMATEorAPPROXkeyword, the search will not use the vector index if the distance function is not compatible with the vector index algorithm. No error will be returned. - If you specify the
APPROXIMATEorAPPROXkeyword, the search will not use the vector index if the distance function is not supported in the current version. An error will be returned. - If you do not specify the
APPROXIMATEorAPPROXkeyword, the search will not use the vector index even if the distance function is supported in the current version. No error will be returned.
Other considerations:
- The
WHEREclause specifies the filter conditions for the vector index search. - The recall rate is affected by the construction parameters and search parameters.
- The index search parameters are specified when you create the index and cannot be modified later. However, you can set the number of cluster centers used for the search by using the session variable
ob_ivf_nprobes. If you set this session variable, it will be used for the search. For more information about how to set this session variable, see ob_ivf_nprobes.
Drop
Syntax for dropping a vector index:
DROP INDEX index_name ON table_name;
Examples
Post-create
Create a test table.
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an IVF index.
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Drop
DROP INDEX vec_idx1 ON vec_table;
Query the dropped index.
SHOW INDEX FROM vec_table;
The return result is as follows:
Empty set
References
- For information about how to query the estimated memory usage and actual memory usage of IVF series indexes, see Index memory management.
- For information about how to monitor and maintain IVF series indexes, see Index monitoring and maintenance.
- For information about how to optimize the performance of IVF series indexes, see Index performance optimization.
