In OceanBase Database in MySQL mode, you can use the Vector Index Hint to control the selection of specific vector indexes and the setting of pre- and post-filtering conditions during queries.
Hint Name |
Description |
|---|---|
VECTOR_INDEX |
Specifies the vector index to be used during the query and the pre- and post-filtering conditions. |
Syntax
/*+ VECTOR_INDEX (table_name [, index_name] [, {'pre_filter' | 'post_filter'}]) */
The following table describes the parameters:
table_name: the name of the main table.index_name: the name of the vector index to be used. If this parameter is not specified, the optimizer selects an index.- You can specify the following filtering conditions. If this parameter is not specified, the optimizer selects a filtering method:
pre_filter: used to filter data before the vector index query.post_filter: used to filter data after the vector index query.
Examples
Create a test table
tand insert test data:CREATE TABLE t(c1 INT, c2 INT, c3 INT); INSERT INTO t VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);Create a vector index
vec_idx1CREATE VECTOR INDEX vec_idx1 ON t(c2) WITH (distance=l2, type=hnsw);Use the
VECTOR_INDEXHint to specify the vector indexvec_idx1and the post-filtering condition for the query.EXPLAIN SELECT /*+VECTOR_INDEX(t vec_idx1 post_filter)*/ * FROM t WHERE c1 > 1 AND c2 > 2 ORDER BY l2_distance(cv, '[1,2,3]') APPROX LIMIT 1;The query result is as follows, indicating that the vector index
vec_idx1is used for the query and the post-filtering condition is applied:Query Plan ========================================================================================= |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| ----------------------------------------------------------------------------------------- |0 |VECTOR INDEX ADAPTIVE SCAN (POST-ITERATIVE-FILTER)|t(vec_idx1)|2 |15 | ========================================================================================= Outputs & filters: ------------------------------------- 0 - output([t.c1], [t.c2], [t.cv]), filter([t.c1 > 1], [t.c2 > 2]), rowset=16 access([t.__pk_increment], [t.cv], [t.c1], [t.c2]), partitions(p0) is_index_back=true, is_global_index=false, filter_before_indexback[true,false], range_key([t.c1], [t.__pk_increment]), range(1,MAX ; MAX,MAX)
Combination with INDEX Hint
This section briefly outlines the behavior rules when using the vector index Hint (VECTOR_INDEX) in combination with the general index Hint (INDEX), to optimize the application of actual hints:
- Only INDEX Hint (general index) is specified: pre-filtering is prioritized.
- Only INDEX Hint (vector index) is specified: post-filtering is prioritized, and the specified vector index is used.
- Only VECTOR_INDEX Hint is specified: the optimizer selects the specific index and filtering method based on the hint configuration.
- When both INDEX Hint and VECTOR_INDEX Hint are specified, the system prioritizes the VECTOR_INDEX Hint. If multiple VECTOR_INDEX Hints are specified, one of them is randomly selected to take effect.
The following examples illustrate the behavior rules when using the vector index Hint (VECTOR_INDEX) in combination with the general index Hint (INDEX).
Example table structure:
CREATE TABLE t(c1 INT, c2 INT, cv VECTOR(3));
CREATE INDEX idx1 ON t(c1);
CREATE INDEX idx2 ON t(c2);
CREATE VECTOR INDEX vec_idx1 ON t(cv) WITH (type=hnsw, distance=l2);
CREATE VECTOR INDEX vec_idx2 ON t(cv) WITH (type=hnsw_bq, distance=l2);
CREATE VECTOR INDEX vec_idx3 ON t(cv) WITH (type=hnsw, distance=cosine);
Here are several common usage scenarios:
Only INDEX Hint (general index) is specified
SELECT /*+INDEX(t idx1)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, pre-filtering is prioritized.
Only INDEX Hint (vector index) is specified
SELECT /*+INDEX(t vec_idx2)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, post-filtering is prioritized, and the specified vector index
vec_idx2is used.Only VECTOR_INDEX Hint (pre_filter) is specified
SELECT /*+VECTOR_INDEX(t vec_idx1 pre_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In the previous example, the pre-filtering condition is specified for the vector index
vec_idx1. If no specific pre-filtering index is specified, the optimizer automatically selects an index from the available indexes on the relevant columns (such asc1andc2).Both INDEX Hint (general index) and VECTOR_INDEX Hint (pre_filter) are specified
SELECT /*+INDEX(t idx1) VECTOR_INDEX(t vec_idx1 pre_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, the pre-filtering condition specified by the VECTOR_INDEX Hint is prioritized, and the vector index
vec_idx1and the general indexidx1are used for pre-filtering.Both INDEX Hint (general index) and VECTOR_INDEX Hint (post_filter) are specified
SELECT /*+INDEX(t idx1) VECTOR_INDEX(t vec_idx1 post_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, the post-filtering condition specified by the VECTOR_INDEX Hint is prioritized. Note that in the current version, post-filtering cannot be used in combination with general indexes. Therefore, even if
index(t idx1)is specified, this Hint will not take effect, and only the post-filtering condition specified by the VECTOR_INDEX Hint for the vector indexvec_idx1will be applied.Both INDEX Hint (vector index) and VECTOR_INDEX Hint (pre_filter) are specified
SELECT /*+INDEX(t vec_idx2) VECTOR_INDEX(t vec_idx1 pre_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, the pre-filtering condition specified by the VECTOR_INDEX Hint is prioritized, and the INDEX Hint (vector index)
index(t vec_idx2)does not take effect.The distance algorithm specified by the VECTOR_INDEX Hint does not match the query algorithm
SELECT /*+INDEX(t vec_idx2) VECTOR_INDEX(t vec_idx3 pre_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;In this case, the VECTOR_INDEX Hint is invalid. If other INDEX Hints are specified, they will be selected based on the behavior rules outlined above. From the example table structure, we can see that the distance algorithm for
vec_idx3is cosine, while the query uses l2. Therefore, the VECTOR_INDEX Hint is invalid, and the post-filtering condition specified by the INDEX Hintindex(t vec_idx2)will take effect.Multiple VECTOR_INDEX Hints are specified with the same distance algorithm
SELECT /*+VECTOR_INDEX(t vec_idx1 pre_filter) VECTOR_INDEX(t vec_idx2 post_filter)*/ * FROM t ORDER BY l2_distance(cv, @vec) WHERE c1 < 1 AND c2 < 1 APPROX LIMIT 6;If all the distance algorithms are the same, the optimizer will select one of them to take effect, but it does not guarantee the order or which one is selected.
By following these rules, you can flexibly combine the VECTOR_INDEX and INDEX Hints to achieve optimal collaboration between vector indexes and general indexes.
