This topic describes the syntax and examples of creating, searching, and dropping HNSW indexes in OceanBase Database.
Syntax and description of the index
Create
The HNSW series index includes three types of indexes: HNSW, HNSW_SQ, and HNSW_BQ. You can create an HNSW series index in two ways: create it when you create a table or create it later. When you create an HNSW series index, note the following:
- You must specify the
VECTORkeyword. - The parameters and descriptions for creating an HNSW series index later are the same as those for creating it when you create a table.
- If the amount of data is large, we recommend that you write the data and then create the index to achieve the best search performance.
- We recommend that you create an HNSW_SQ index after you write the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an HNSW series index, see the following examples.
- When you create an HNSW index, the index name must be no longer than 25 characters. Otherwise, an error may occur because the index auxiliary table name exceeds the
index_namelimit. In later versions, the index name can be longer. - We recommend that you create an HNSW series index on a heap table to achieve the best search performance.
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax:
-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism cannot exceed twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the param parameters.
| Parameter | Default value | Value range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | hnsw/hnsw_sq/hnsw_bq |
Yes | The index type. | ||
| lib | vsag | vsag | No | The vector index library type. | Only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors for each node. | A larger value indicates a slower index construction but better search performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index construction. | A larger value indicates a slower index construction but better index quality. ef_construction must be greater than m. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during search. | A larger value indicates a slower search but higher recall rate. |
| extra_info_max_size | 0 | [0,16384] | No | The maximum size of each primary key (unit: bytes). The primary keys are stored in the index to speed up the search. | 0: Do not store the primary key information.1: Forcefully store the primary key information and ignore the size limit. In this case, the primary key type must be a supported type (see below).Greater than 1: The maximum size of the primary key information (unit: bytes). In this case, the following conditions must be met:
|
| refine_k | 4.0 | [1.0,1000.0] | No |
NoticeThis parameter is applicable only to HNSW_BQ indexes. |
You can set this parameter when you create an index or specify it during a search:
|
| refine_type | sq8
NoticeIf the database is upgraded from a version earlier than V4.6.0 to V4.6.0, the default value of this parameter is fp32. |
sq8/fp32 | No |
NoticeThis parameter is applicable only to HNSW_BQ indexes. |
This value improves the efficiency by reducing the memory overhead and construction time during index construction, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No |
NoticeThis parameter is applicable only to HNSW_BQ indexes. |
This value improves the efficiency by reducing the memory overhead and construction time during index construction, but may affect the recall rate. |
| bq_use_fht | true
NoticeIf the database is upgraded from V4.3.5 BP2 to V4.6.0, the default value of this parameter is false. |
true/false | No |
NoticeThis parameter is applicable only to HNSW_BQ indexes. |
The following primary key types are supported for extra_info_max_size:
- Numeric types: Integer types, float types, and Bit_value types.
- Date and time types
- Character types: VARCHAR.
The size of the primary key information is calculated as follows:
SET @table_name = 'test'; -- Replace with the name of the table to be queried.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- The size of extra_info for numeric types is 8 bytes.
WHEN oc.data_type = 22 THEN oc.data_length -- The size of extra_info for varchar types is equal to the data_length.
ELSE NULL -- The size of extra_info for other types is not supported.
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- The result is 8 bytes.
Search
The search in the HNSW index series is an approximate nearest neighbor (ANN) search, and does not guarantee 100% accuracy in the search results. The recall rate is used to measure the accuracy of the search. For example, if 9 of the 10 nearest neighbors are returned stably, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Syntax requirements for search:
- You must specify the
APPROXIMATE/APPROXkeyword. Otherwise, the search will not use the vector index but will instead perform a full table scan. - The
ORDER BYandLIMITclauses must be specified. - The
ORDER BYclause must specify only one vector condition. - The value of
LIMIT + OFFSETmust be in the range of(0, 16384]. - If you do not specify the
LIMITclause, an error is returned.
Usage rules for distance functions:
- If you specify the
APPROXIMATE/APPROXkeyword, the search will use the vector index. In this case, the distance function must be supported by the current version and must match the vector index algorithm. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index. However, no error is returned if the distance function does not match the vector index algorithm. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index and an error is returned if the distance function is not supported by the current version. - If you do not specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index. However, no error is returned if the distance function is supported by the current version.
Other considerations:
- The
WHEREclause specifies the filter conditions for the vector index search. - The recall rate is affected by the build parameters and search parameters.
- You can specify the index search parameters when you create the index. After the index is created, you can modify the table-level attributes by using DBMS_VECTOR.REBUILD_INDEX or specify the query-level parameter
ef_searchin the query.
Drop
The syntax for dropping a vector index is as follows:
DROP INDEX index_name ON table_name;
Create, search, and delete examples
Create during table creation
HNSW example
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Perform approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter of the HNSW_BQ index supports l2 and cosine.
Create after table creation
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
View the created table.
SHOW CREATE TABLE vec_table_hnsw;
The result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
View the created index.
SHOW INDEX FROM vec_table_hnsw;
The result is as follows:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter of the HNSW_BQ index supports l2 and cosine.
Delete
DROP INDEX vec_idx1 ON vec_table;
View the deleted index.
SHOW INDEX FROM vec_table;
The result is as follows:
Empty set
References
- For information about memory estimation and actual usage of HNSW index series, see Index memory management.
- For information about monitoring and maintenance of HNSW index series, see Index monitoring and maintenance.
