This topic explains how to create, search, and delete HNSW series dense indexes in OceanBase Database, including syntax and examples.
Notice
The HNSW_BQ index is supported starting from OceanBase Database V4.3.5 BP2.
Index syntax and descriptions
Create an index
The HNSW series includes three types of indexes: HNSW, HNSW_SQ, and HNSW_BQ. You can create these indexes either when you create the table or after the table is created. Please keep in mind:
- You must use the
VECTORkeyword when creating a vector index. - The parameters for creating an index after table creation are the same as those used during table creation.
- For large datasets, it is recommended to load all data first, then create the index to achieve optimal search performance.
- For HNSW_SQ indexes, it is best to build the index after inserting data, and to rebuild the index after significant incremental data updates. See specific examples below for details.
- When creating an HNSW index, the index name cannot exceed 25 characters due to limitations on auxiliary table name length. Longer names will be supported in future versions.
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim), -- Vector column. The VECTOR type is required. The dimension field indicates the vector dimension.
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax:
-- You can set the parallelism to speed up index building; maximum is CPU cores × 2
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
param description:
| Parameter | Default value | Value range | Required | Description | Remarks |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | hnsw/hnsw_sq/hnsw_bq | Yes | Specifies the index type. | ||
| lib | vsag | vsag | No | Specifies the vector index library. | Currently, only VSAG is supported. |
| m | 16 | [5,128] | No | Maximum number of neighbors per node. | Higher values slow down index building but improve search performance. |
| ef_construction | 200 | [5,1000] | No | Candidate pool size for index construction. | Higher values slow down index building but improve index quality. Must be greater than m. |
| ef_search | 64 | [1,1000] | No | Candidate pool size for searching. | Higher values slow down searching but improve recall. |
| extra_info_max_size | 0 | [0,16384] | No | Maximum size (in bytes) for primary key info stored in the index. | 0: Do not store PK info. 1: Always store PK info, ignore size limit (PK type must be supported). >1: Set max PK info size; PK type must be supported and PK size must be within limit. |
| refine_k | 4.0 | [1.0,1000.0] | No |
NoticeThis parameter is supported starting from V4.3.5 BP3 and can only be specified when creating an HNSW_BQ index. |
This parameter can be set when creating the index or during search:
|
| refine_type | sq8
NoticeIf the cluster was upgraded from an earlier version to V4.3.5 BP3, the default value for this parameter is fp32. |
sq8/fp32 | No |
NoticeThis parameter is supported starting from V4.3.5 BP3 and can only be specified when creating an HNSW_BQ index. |
This value increases efficiency by reducing memory usage and build time during index construction, but may affect recall. |
| bq_bits_query | 32 | 0/4/32 | No |
NoticeThis parameter is supported starting from V4.3.5 BP3 and can only be specified when creating an HNSW_BQ index. |
This value increases efficiency by reducing memory usage and build time during index construction, but may affect recall. |
| bq_use_fht | true
NoticeIf the cluster was upgraded from an earlier version to V4.3.5 BP3, the default value for this parameter is false. |
true/false | No |
NoticeThis parameter is supported starting from V4.3.5 BP3 and can only be specified when creating an HNSW_BQ index. |
The primary key types supported by extra_info_max_size include:
- Numeric types: supports integer types, floating-point types, and Bit_value types.
- Date and time types
- Character types: supports VARCHAR type.
Method for calculating the size of primary key information:
SET @table_name = 'test'; -- Replace with the table_name you want to query
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8
WHEN oc.data_type = 22 THEN oc.data_length -- For VARCHAR type, extra_info_size += data_length
ELSE NULL -- Other types are not supported
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- Calculation result: 8 bytes
:::
Search for indexes
The HNSW series indexes use approximate nearest neighbor (ANN) search, which means results are not guaranteed to be 100% accurate. The main metric for accuracy is recall. For example, if you request the 10 nearest neighbors and consistently get 9 correct, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num)
[PARAMETERS (param1=value1, ...)];
Search syntax requirements:
- You must include the
APPROXIMATEorAPPROXkeyword for the query to use the vector index instead of scanning the entire table. - Both
ORDER BYandLIMITclauses are required. - Only one vector condition is supported in the
ORDER BYclause. - The sum of
LIMITandOFFSETmust be within the range(0, 16384]. - Omitting the
LIMITclause will cause an error.
Rules for using distance functions:
- If you specify
APPROXIMATE/APPROX, use a distance function supported by the current version, and it matches the vector index algorithm, the search will use the vector index. - If you specify
APPROXIMATE/APPROXbut the distance function does not match the vector index algorithm, the search will not use the vector index, but no error will occur. - If you specify
APPROXIMATE/APPROXwith an unsupported distance function, the search will not use the vector index and you will get an error. - If you do not specify
APPROXIMATE/APPROX, even supported distance functions will not use the vector index, but no error will occur.
Additional notes:
- The
WHEREclause acts as a filter during vector index searches. - Recall rate is affected by both index build parameters and search parameters.
- Search parameters for the index are set when the index is created and cannot be changed later. However, you can use the session variable
ob_ivf_nprobesto set the number of cluster centers for searching. If this session variable is set, its value takes precedence. For details, see ob_ivf_nprobes.
Notice
Starting with V4.3.5 BP5, you can use the PARAMETERS(similarity=$value) clause to specify a similarity threshold for searches.
- Similarity threshold parameter (
similarity):- Syntax: Add
PARAMETERS (similarity=$value)after theLIMITclause, where$valueis the similarity threshold (range:[0, 1]). In most cases, a higher similarity value means greater similarity and a smaller distance. The exception isinner_product, where a larger distance indicates greater similarity. - Feature: When you specify a similarity threshold, only results with similarity greater than or equal to that threshold will be returned.
- Supported HNSW index types: HNSW, HNSW_SQ, HNSW_BQ
- Supported distance types:
- For searches, you can specify the similarity threshold for indexes using
cosine_distanceandl2_distance. Forl2_distance, it’s recommended to apply L2 normalization to vectors for more accurate similarity searches. For details and examples, see Using SQL functions - vector normalization. - Similarity thresholds are not supported for indexes using
inner_productdistance (an error will be returned).
- For searches, you can specify the similarity threshold for indexes using
- Relationship between similarity and distance:
cosine_distance=2 - 2 * similarityl2_distance=sqrt(1 / similarity - 1)
- Syntax: Add
Delete an index
The syntax for deleting a vector index is as follows:
DROP INDEX index_name ON table_name;
Examples of creating, searching, and deleting indexes
Create an index during table creation
HNSW example
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Perform approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The returned result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter for HNSW_BQ indexes supports both l2 and cosine. Support for cosine is available starting from V4.3.5 BP4.
Create an index after table creation
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
Check the created table.
SHOW CREATE TABLE vec_table_hnsw;
The returned result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Check the created index.
SHOW INDEX FROM vec_table_hnsw;
The returned result is as follows:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter for HNSW_BQ indexes supports both l2 and cosine. Support for cosine is available starting from V4.3.5 BP4.
Delete an index
DROP INDEX vec_idx1 ON vec_table;
Check the deleted index.
SHOW INDEX FROM vec_table;
The returned result is as follows:
Empty set