This topic describes the syntax and examples of creating, searching, and dropping HNSW indexes in OceanBase Database.
Syntax and description
Create
The HNSW series index includes HNSW, HNSW_SQ, and HNSW_BQ indexes. You can create an HNSW series index when you create a table or after the table is created. When you create an HNSW series index, note the following:
- The
VECTORkeyword is required. - The parameters and descriptions of an HNSW series index created after the table is created are the same as those of an HNSW series index created when the table is created.
- If the amount of data is large, we recommend that you write all the data and then create the index to achieve the best search performance.
- We recommend that you create an HNSW_SQ index after you write all the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an HNSW series index, see the examples in the following sections.
- The name of an HNSW index cannot exceed 25 characters. Otherwise, an exception may be returned because the name of the index auxiliary table exceeds the
index_namelimit. In later versions, the index name can be longer. - We recommend that you create an HNSW series index on a heap table to achieve the best search performance.
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax:
-- You can set the parallelism to improve the index construction performance. The maximum parallelism cannot exceed twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the param parameters.
Parameter |
Default value |
Value range |
Required |
Description |
Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The vector distance algorithm. | l2: Euclidean distance. inner_product: inner product distance. cosine: cosine distance. | |
| type | hnsw, hnsw_sq, or hnsw_bq |
Yes | The index type. | ||
| lib | vsag | vsag | No | The vector index library type. | Only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors for each node. | A larger value indicates a slower index construction process but better search performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index construction. | A larger value indicates a slower index construction process but better index quality. ef_construction must be greater than m. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during search. | A larger value indicates a slower search process but higher recall rate. |
| extra_info_max_size | 0 | [0,16384] | No | The maximum size of the primary key information (unit: bytes). The primary key information is stored in the index to speed up the search. | 0: Do not store the primary key information.1: Force to store the primary key information and ignore the size limit. In this case, the primary key type must be a supported type. Greater than 1: The maximum size of the primary key information (unit: bytes). In this case, the following conditions must be met:
|
| refine_k | 4.0 | [1.0,1000.0] | No |
NoticeThis parameter is applicable only when you create an HNSW_BQ index. |
You can set this parameter when you create an index or specify it during search:
|
| refine_type | sq8
NoticeIf the database is upgraded from an earlier version to V4.4.2, the default value of this parameter is fp32. |
sq8/fp32 | No |
NoticeThis parameter is applicable only when you create an HNSW_BQ index. |
This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No |
NoticeThis parameter is applicable only when you create an HNSW_BQ index. |
This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate. |
| bq_use_fht | true
NoticeIf the database is upgraded from V4.3.5 BP2 to V4.4.2, the default value of this parameter is false. |
true/false | No |
NoticeThis parameter is applicable only when you create an HNSW_BQ index. |
The following primary key types are supported for the extra_info_max_size parameter:
- Numeric types: integer, float, and Bit_value types.
- Date and time types
- Character types: VARCHAR.
The size of the primary key information is calculated as follows:
SET @table_name = 'test'; -- Replace with the actual table name.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8.
WHEN oc.data_type = 22 THEN oc.data_length -- For varchar types, extra_info_size += data_length.
ELSE NULL -- Other types are not supported.
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- The calculation result is 8 bytes.
Search
The search in the HNSW series index is an approximate nearest neighbor (ANN) search, and it does not guarantee 100% accuracy in the results. The accuracy is measured by the recall rate. For example, if 10 nearest neighbors are queried and 9 correct results are consistently returned, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num) [PARAMETERS ($param1=$value1, ...)];
Syntax requirements for search:
- You must specify the
APPROXIMATE/APPROXkeyword. Otherwise, the search will not use the vector index and will instead perform a full table scan. - The
ORDER BYandLIMITclauses must be included. - The
ORDER BYclause can only specify a single vector condition. - The value of
LIMIT + OFFSETmust be in the range(0, 16384]. - If the
LIMITclause is not specified, an error will be returned.
Rules for using distance functions:
- If you specify the
APPROXIMATE/APPROXkeyword, the search will use the vector index if the distance function supported by the current version matches the vector index algorithm. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index if the distance function supported by the current version does not match the vector index algorithm. However, no error will be returned. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index and an error will be returned if the distance function is not supported by the current version. - If you do not specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index even if the distance function supported by the current version is specified. However, no error will be returned.
Notice
Starting from V4.4.2 BP1, you can use the PARAMETERS(similarity=$value) clause to specify the similarity threshold for search.
- Similarity threshold parameter (
similarity):- Syntax: Add
PARAMETERS (similarity=$value)after theLIMITclause, where$valuespecifies the similarity threshold in the range[0, 1]. Generally, the larger the similarity value, the more similar the vectors are, and the smaller the distance. This is an exception for theinner_productdistance function, where a larger distance indicates a higher similarity. - Function: After you specify the similarity threshold, only the results with similarity values greater than or equal to the threshold are returned.
- Supported index types: HNSW, HNSW_SQ, and HNSW_BQ.
- Supported distance types:
- During query, you can specify the similarity for the
cosine_distanceandl2_distancedistance types. For thel2_distancetype, we recommend that you normalize the vectors. This helps improve the accuracy of similarity search. For more information and examples, see Vector normalization. - During query, you cannot specify the similarity for the
inner_productdistance type. Otherwise, an errornot supportwill be returned.
- During query, you can specify the similarity for the
- Correspondence between similarity and distance:
cosine_distance=2 - 2 * similarityl2_distance=sqrt(1 / similarity -1)
- Syntax: Add
Other notes:
- The
WHEREclause specifies the filter conditions for vector index search. - The recall rate is affected by the build parameters and search parameters.
- Index search parameters can be set when the index is created. If you need to adjust the parameters after the index is created, you can modify the table-level attributes by using DBMS_VECTOR.REBUILD_INDEX or specify the query-level parameter
ef_searchduring query.
Delete
The syntax for deleting a vector index is as follows:
DROP INDEX index_name ON table_name;
Create, search, and delete examples
Create at table creation
HNSW example
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Use approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter of the HNSW_BQ index supports l2 and cosine.
Create after table creation
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
View the created table.
SHOW CREATE TABLE vec_table_hnsw;
The result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
View the created index.
SHOW INDEX FROM vec_table_hnsw;
The result is as follows:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter of the HNSW_BQ index supports l2 and cosine.
Delete
DROP INDEX vec_idx1 ON vec_table;
View the deleted index.
SHOW INDEX FROM vec_table;
The result is as follows:
Empty set
