This topic describes how to create, search, and drop the HNSW series dense vector index in OceanBase Database, with syntax and examples.
Index syntax and description
Create
The HNSW series includes three index types: HNSW, HNSW_SQ, and HNSW_BQ. You can create them with the table or after the table exists. When creating an index, keep the following in mind:
- You must use the
VECTORkeyword when creating a vector index. - Parameters for creating an index after the table exists are the same as when creating the index with the table.
- For large datasets, write your data first and then create the index for best search performance.
- For HNSW_SQ, create the index after loading data and rebuild the index after loading a large amount of incremental data. See the examples below for each index type.
- Index names must be 25 characters or fewer. Longer names can cause errors when the auxiliary table name exceeds the
index_namelimit. Support for longer names is planned for a future release. - For best search performance, create HNSW series indexes on heap tables.
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Syntax:
-- You can set parallelism to speed up index build. Maximum parallelism is 2 × CPU core count.
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
Parameter reference
| Parameter | Default | Range | Required | Description | Remarks |
|---|---|---|---|---|---|
| distance | l2 / inner_product / cosine | Yes | Vector distance algorithm. | l2: Euclidean distance. inner_product: inner product. cosine: cosine distance. |
|
| type | hnsw, hnsw_sq, or hnsw_bq |
Yes | Index type. | ||
| lib | vsag | vsag | No | Vector index library. | Only VSAG is supported. |
| m | 16 | [5, 128] | No | Maximum number of neighbors per node. | Larger values slow index build but improve search performance. |
| ef_construction | 200 | [5, 1000] | No | Size of the candidate set during index build. | Larger values slow build but improve index quality. Must be greater than m. |
| ef_search | 64 | [1, 1000] | No | Size of the candidate set during search. | Larger values slow search but improve recall. |
| extra_info_max_size | 0 | [0, 16384] | No | Maximum size (bytes) of primary key information stored in the index to speed up search. | 0: Do not store primary key info.1: Store primary key info (ignore size limit). Table primary key type must be a supported type.> 1: Maximum size in bytes. Requirements:
|
| refine_k | 4.0 | [1.0, 1000.0] | No |
NoteHNSW_BQ only. |
Can be set at index creation or at query time. Query-time value overrides the build-time value if both are set. |
| refine_type | sq8
NoteDefault is fp32 if the database was upgraded from a version earlier than V4.4.2. |
sq8 / fp32 | No |
NoteHNSW_BQ only. |
Reduces memory and build time but may lower recall. |
| bq_bits_query | 32 | 0 / 4 / 32 | No |
NoteHNSW_BQ only. |
Reduces memory and build time but may lower recall. |
| bq_use_fht | true
NoteDefault is false if the database was upgraded from V4.3.5 BP2 to V4.4.2. |
true / false | No |
NoteHNSW_BQ only. |
The primary key types supported by extra_info_max_size are:
- Numeric types: integer types, floating-point types, and BIT.
- Date and time types
- String types: VARCHAR.
How primary key size is calculated
SET @table_name = 'test'; -- Replace with your table name
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- 8 bytes for numeric types
WHEN oc.data_type = 22 THEN oc.data_length -- varchar: use data_length
ELSE NULL -- unsupported type
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- Example result: 8 bytes
Search
Search with the HNSW series index is approximate nearest neighbor (ANN) search; it does not guarantee exact results. Accuracy is measured by recall. For example, if a query asks for 10 nearest neighbors and 9 of the returned rows are correct, recall is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Search syntax
- You must specify
APPROXIMATEorAPPROXfor the query to use the vector index instead of a full table scan. - You must include
ORDER BYandLIMIT. ORDER BYsupports only one vector expression.LIMIT + OFFSETmust be in the range (0, 16384].- Omitting
LIMITcauses an error.
Distance functions
- With
APPROXIMATE/APPROX: if the distance function is supported and matches the index algorithm, the vector index is used. - With
APPROXIMATE/APPROX: if the distance function does not match the index algorithm, the vector index is not used and no error is returned. - With
APPROXIMATE/APPROX: if the distance function is not supported, the vector index is not used and an error occurs. - Without
APPROXIMATE/APPROX: the vector index is not used and no error is returned.
Other notes
WHEREconditions are applied as filters in addition to the vector index search.- Recall depends on both build and search parameters.
- Search parameters can be set when creating the index. To change them later, use DBMS_VECTOR.REBUILD_INDEX for table-level settings or specify the query-level parameter
ef_searchin the query.
Drop
To drop a vector index:
DROP INDEX index_name ON table_name;
Create, search, and drop examples
Create with table
HNSW
Create a test table:
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Insert test data:
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Run approximate nearest neighbor search:
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
Example result:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
HNSW_BQ supports the l2 and cosine distance algorithms.
Create after table
HNSW
Create a test table:
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create the HNSW index:
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
Show the table definition:
SHOW CREATE TABLE vec_table_hnsw;
Example output:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Show the index:
SHOW INDEX FROM vec_table_hnsw;
Example output:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ
Create a test table:
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create the HNSW_SQ index:
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ supports the l2 and cosine distance algorithms.
Drop
DROP INDEX vec_idx1 ON vec_table;
Verify the index was dropped:
SHOW INDEX FROM vec_table;
Example result:
Empty set