IVF series indexes|V4.3.5| docs|Distributed Database

IVF series indexes

Last Updated：2026-01-20 12:13:24 Updated

This topic describes how to create, search, and delete IVF series dense indexes in OceanBase Database, including syntax details and practical examples.

Index syntax and descriptions

Create indexes

The IVF series includes two types of indexes: IVF and IVF_PQ. You can create these either when you create the table or after the table is created. Keep the following points in mind:

You must specify the VECTOR keyword when you create a vector index.
The parameters and descriptions of an index created after you create a table are the same as those of an index created when you create a table.
If the data volume is large, we recommend that you write all the data and then create an index to obtain the best search performance.
We recommend that you create an IVF or IVF_PQ index after you write data to the table and rebuild the index after you write a large amount of incremental data. For more information about how to create an index, see the specific examples in the following sections.
When you create an IVF index, the index name must be less than 33 characters in length. Otherwise, an error may occur because the index auxiliary table name exceeds the index_name limit. In later versions, the index name can be longer than 33 characters.

Create an index during table creation

Create an index after table creation

Syntax for creating an index during table creation:

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 VECTOR(dim),
    ...,
    VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);

Syntax for creating an index after table creation:

-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism is the number of CPU cores multiplied by 2.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);

param description:

Parameter	Default value	Value range	Required	Description	Remarks
distance		l2/inner_product/cosine	Yes	The vector distance algorithm.	l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
type		ivf_flat/ivf_pq	Yes	The type of the IVF index.
lib	ob	ob	No	The type of the vector index library.
nlist	128	[1,65536]	No	The number of cluster centers.	V4.3.5 BP5: If the business data volume is less than 1 million, we recommend that you set the value to `data_volume / 1000`. If the business data volume is more than 1 million, we recommend that you set the value to `sqrt(data_volume)`. For a partitioned table, `data_volume` is the data volume of the partition with the largest data volume.
sample_per_nlist	256	[1,int64_max]	Yes	The number of data samples for each cluster center. This parameter is used for an index created after you create a table.	V4.3.5 BP5: We recommend that you keep the parameter value as the default value and do not modify it.
nbits	8	[1,24]	No	The number of bits for quantization. Notice This parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an IVF_PQ index.	We recommend that you set the value to 8 and keep the value in the range of [8, 10]. A larger value indicates a higher quantization accuracy and a higher search accuracy, but may affect the search performance.
m	No default value. Must be specified.	[1,65536]	Yes	The dimension of the quantized vector. Notice This parameter is supported in V4.3.5 BP3 and later. You can specify this parameter only when you create an IVF_PQ index.	A larger value indicates a higher search accuracy, but may affect the search performance.

For IVF and IVF_PQ indexes (V4.3.5 BP5 only), the following additional notes apply:

Limitations:
- You cannot create an IVF or IVF_PQ index in a scenario where a heap table and a partitioned table are used together.
Recommendations:
- We strongly recommend that you create an IVF or IVF_PQ index after you import data. We do not recommend that you create an IVF index when you create a table. If no data is available when you create an index, the index cannot be clustered and cannot be used. In this case, you must manually execute the REBUILD INDEX or DROP statement and then execute the CREATE INDEX statement again.
- We recommend that you estimate and query the memory usage before you create an index. For more information, see Manage index memory. This way, you can avoid the memory shortage issue during index creation.
- We recommend that you execute the ALTER SYSTEM major freeze statement and wait for the major freeze to complete before you create an index.
- In a multi-node cluster, we recommend that you set the memory of the SYS tenant to more than 1 GB.
- We recommend that you set the parallel_servers_target parameter to max_cpu * 10 before you create an index.
- If the data volume of a single partition is more than 10 million and the CPU resources of the tenant exceed 10 cores, we recommend that you modify the sampling ratio by executing the ALTER SYSTEM SET _px_object_sampling = 5000 statement before you create an index. This way, the index construction efficiency can be improved.

Search for indexes

The IVF series indexes use approximate nearest neighbor (ANN) search, which means results are not guaranteed to be 100% accurate. The main metric for accuracy is recall. For example, if you request the 10 nearest neighbors and consistently get 9 correct, the recall rate is 90%.

SELECT ... FROM table_name 
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX] 
LIMIT num (OFFSET num)
[PARAMETERS (param1=value1, ...)];

Search syntax requirements:

You must include the APPROXIMATE or APPROX keyword for the query to use the vector index instead of scanning the entire table.
Both ORDER BY and LIMIT clauses are required.
Only one vector condition is supported in the ORDER BY clause.
The sum of LIMIT and OFFSET must be within the range (0, 16384].
Omitting the LIMIT clause will cause an error.
The PARAMETERS clause is used to specify search parameters. For an IVF index, the nprobes parameter is supported. For example, PARAMETERS(nprobes=200). This parameter is supported starting from V4.3.5 BP5 and specifies the number of cluster centers to be used during the search.

Rules for using distance functions:

If you specify APPROXIMATE/APPROX, use a distance function supported by the current version, and it matches the vector index algorithm, the search will use the vector index.
If you specify APPROXIMATE/APPROX but the distance function does not match the vector index algorithm, the search will not use the vector index, but no error will occur.
If you specify APPROXIMATE/APPROX with an unsupported distance function, the search will not use the vector index and you will get an error.
If you do not specify APPROXIMATE/APPROX, even supported distance functions will not use the vector index, but no error will occur.

Additional notes:

The WHERE clause acts as a filter during vector index searches.
Recall rate is affected by both index build parameters and search parameters.
Search parameters for the index are set when the index is created and cannot be changed later. However, you can use the session variable ob_ivf_nprobes to set the number of cluster centers for searching. If this session variable is set, its value takes precedence. For details, see ob_ivf_nprobes.
Tips for optimizing IVF/IVF_PQ index searches (V4.3.5 BP5 only):
- Using compressed row format (ROW_FORMAT=COMPRESSED) when you create the table can improve IVF search performance by 20–30%. For example:
```
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 VECTOR(4)) ROW_FORMAT=COMPRESSED;
```
- Parallel search is not recommended, as it may reduce performance or return incorrect results.

Notice

Starting with V4.3.5 BP5, you can use the PARAMETERS(similarity=$value) clause to specify a similarity threshold for searches.

Similarity threshold parameter (similarity):
- Syntax: Add PARAMETERS (similarity=$value) after the LIMIT clause, where $value is the similarity threshold (range: [0, 1]). In most cases, a higher similarity value means greater similarity and a smaller distance. The exception is inner_product, where a larger distance indicates greater similarity.
- Feature: When you specify a similarity threshold, only results with similarity greater than or equal to that threshold will be returned.
- Supported IVF index types: IVF, IVF_PQ
- Supported distance types:
  - For searches, you can specify the similarity threshold for indexes using cosine_distance and l2_distance. For l2_distance, it’s recommended to apply L2 normalization to vectors for more accurate similarity searches. For details and examples, see Using SQL functions - vector normalization.
  - Similarity thresholds are not supported for indexes using inner_product distance (an error will be returned).
- Relationship between similarity and distance:
  - cosine_distance = 2 - 2 * similarity
  - l2_distance = sqrt(1 / similarity - 1)

Delete an index

The syntax for deleting a vector index is as follows:

DROP INDEX index_name ON table_name;

Examples of creating, searching, and deleting indexes

Create an index during table creation

IVF example

CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));

Create an index after table creation

IVF example

Create a test table.

CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an IVF index.

CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);

Delete

DROP INDEX vec_idx1 ON vec_table;

Check the deleted index.

SHOW INDEX FROM vec_table;

The return result is as follows:

Empty set