This topic describes the vector functions supported by OceanBase Database and the considerations for using them.
Considerations
Vectors in different dimensions are not supported. An error
different vector dimensions %d and %dwill be returned when you perform operations on vectors in different dimensions.An error
value out of range: overflow / underflowwill be returned when the result is out of the floating-point number range.Vector indexes support L2 distance and inner product distance calculations. For more information, see Create a vector index.
Preparations
Before you use vector indexes, you must enable vector-based indexes by setting ob_vector_memory_limit_percentage.
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
Distance function
A distance function calculates the distance between two vectors based on the specific distance algorithm used.
L2_distance
The Euclidean distance measures the distance between coordinates of two vectors -- essentially the straight-line distance between two vectors. It is calculated using the Pythagorean theorem applied to vector coordinates:

The syntax is as follows:
l2_distance(vector v1, vector v2)
The parameters are described as follows:
Parameters can be of any type that can be forcibly converted to a vector type, including single-level array types (such as
[1,2,3,..]) and string types (such as'[1,2,3]'), except for the vector type.The dimensions of the two parameters must be the same.
If a single-level array type is used as a parameter, the parameter elements must not contain
NULL.
The return value is described as follows:
The return value is a
distance(double)value.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_distance(c1, [1,2,3]), l2_distance([1,2,3],[1,1,1]), l2_distance('[1,1,1]','[1,2,3]') FROM t1;
The return result is as follows:
+--------------------------+------------------------------+----------------------------------+
| l2_distance(c1, [1,2,3]) | l2_distance([1,2,3],[1,1,1]) | l2_distance('[1,1,1]','[1,2,3]') |
+--------------------------+------------------------------+----------------------------------+
| 0 | 2.23606797749979 | 2.23606797749979 |
+--------------------------+------------------------------+----------------------------------+
1 row in set
L1_distance
The Manhattan distance measures the sum of absolute differences of coordinates between two points in a standard coordinate system. The formula is as follows:

The syntax is as follows:
l1_distance(vector v1, vector v2)
The parameters are described as follows:
Parameters can be of any type that can be forcibly converted to a vector type, including single-level array types (such as
[1,2,3,..]) and string types (such as'[1,2,3]'), except for the vector type.The dimensions of the two parameters must be the same.
If a single-level array type is used as a parameter, the parameter elements must not contain
NULL.
The return value is described as follows:
The return value is a
distance(double)value.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t2(c1 vector(3));
INSERT INTO t2 VALUES('[1,2,3]');
INSERT INTO t2 VALUES('[1,1,1]');
SELECT l1_distance(c1, [1,2,3]) FROM t2;
The return result is as follows:
+--------------------------+
| l1_distance(c1, [1,2,3]) |
+--------------------------+
| 0 |
| 3 |
+--------------------------+
2 rows in set
Cosine_distance
The cosine similarity measures the angle difference between two vectors. It measures the directional similarity between two vectors regardless of their length (magnitude). The cosine similarity ranges from [-1, 1]. A value of 1 indicates that the two vectors are in the same direction. A value of 0 indicates that the two vectors are orthogonal. A value of -1 indicates that the two vectors are in opposite directions.
The cosine similarity is calculated using the following formula:

The cosine distance, which measures the distance between two vectors, is calculated by subtracting the cosine similarity from 1. The formula is as follows:

The cosine distance ranges from [0, 2]. A value of 0 indicates that the two vectors are in the same direction and there is no distance between them. A value of 2 indicates that the two vectors are in opposite directions.
The syntax is as follows:
cosine_distance(vector v1, vector v2)
The parameters are described as follows:
Parameters can be of any type that can be forcibly converted to a vector type, including single-level array types (such as
[1,2,3,..]) and string types (such as'[1,2,3]'), except for the vector type.The dimensions of the two parameters must be the same.
If a single-level array type is used as a parameter, the parameter elements must not contain
NULL.
The return value is described as follows:
The return value is a
distance(double)value.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t3(c1 vector(3));
INSERT INTO t3 VALUES('[1,2,3]');
INSERT INTO t3 VALUES('[1,2,1]');
SELECT cosine_distance(c1, [1,2,3]) FROM t3;
The return result is as follows:
+------------------------------+
| cosine_distance(c1, [1,2,3]) |
+------------------------------+
| 0 |
| 0.12712843905603044 |
+------------------------------+
2 rows in set
Inner_product
The inner product, also known as the dot product or scalar product, is a type of multiplication between two vectors. In geometry, the inner product indicates the directional and magnitude relationships between two vectors. The formula for calculating the inner product is as follows:

The syntax is as follows:
inner_product(vector v1, vector v2)
The parameters are described as follows:
Apart from vector types, other types that can be forcibly converted to vector types are supported. Examples of other types include single-level array types like
[1,2,3,...]and string types like'[1,2,3]'.The dimensions of the two parameters must be the same.
If the parameter contains single-level array types,
NULLvalues are not allowed in the elements of this parameter.
The return value is described as follows:
The return value is a
distance(double)value.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t4(c1 vector(3));
INSERT INTO t4 VALUES('[1,2,3]');
INSERT INTO t4 VALUES('[1,2,1]');
SELECT inner_product(c1, [1,2,3]) FROM t4;
The return result is as follows:
+----------------------------+
| inner_product(c1, [1,2,3]) |
+----------------------------+
| 14 |
| 8 |
+----------------------------+
2 rows in set
Negative_inner_product
The negative_inner_product function calculates the negative inner product between two vectors by using the following formula:

The syntax is as follows:
negative_inner_product(vector v1, vector v2)
The parameters are described as follows:
- Apart from vector types, other types that can be forcibly converted to vector types are supported. Examples of other types include single-level array types like
[1,2,3,...]and string types like'[1,2,3]'. - The dimensions of the two parameters must be the same.
- If the parameter contains single-level array types,
NULLvalues are not allowed in the elements of this parameter.
The return value is described as follows:
- The return value is a
distance(double)value. - If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t4(c1 vector(3));
INSERT INTO t4 VALUES('[1,2,3]');
INSERT INTO t4 VALUES('[1,2,1]');
SELECT negative_inner_product(c1, [1,2,3]) FROM t4;
The return result is as follows:
+-------------------------------------+
| negative_inner_product(c1, [1,2,3]) |
+-------------------------------------+
| -14 |
| -8 |
| -14 |
| -8 |
+-------------------------------------+
4 rows in set
Vector_distance
The vector_distance function calculates the distance between two vectors by using different distance algorithms based on the specified parameters.
The syntax is as follows:
vector_distance(vector v1, vector v2 [, string metric])
The parameters are described as follows:
Apart from vector types, other types that can be forcibly converted to vector types are supported. Examples of other types include single-level array types like
[1,2,3,...]and string types like'[1,2,3]'.The dimensions of the two parameters must be the same.
If the parameter contains single-level array types,
NULLvalues are not allowed in the elements of this parameter.
The metric parameter specifies the distance algorithm. Valid values:
If the parameter is not specified, the default algorithm is
euclidean.If the parameter is specified, the value must be one of the following:
euclidean: the Euclidean distance, which is the same as L2_distance.manhattan: the Manhattan distance, which is the same as L1_distance.cosine: the cosine distance, which is the same as Cosine_distance.dot: the inner product, which is the same as Inner_product.
The return value is described as follows:
The return value is a
distance(double)value.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t5(c1 vector(3));
INSERT INTO t5 VALUES('[1,2,3]');
INSERT INTO t5 VALUES('[1,2,1]');
SELECT vector_distance(c1, [1,2,3], euclidean) FROM t5;
The return result is as follows:
+-----------------------------------------+
| vector_distance(c1, [1,2,3], euclidean) |
+-----------------------------------------+
| 0 |
| 2 |
+-----------------------------------------+
2 rows in set
Arithmetic functions
Arithmetic functions provide element-wise addition (+), subtraction (-), and multiplication (*) operations between vector types, between single-level array types, and between special string types. The operation results are as follows:

The syntax is as follows:
v1 + v2
v1 - v2
v1 * v2
The parameters are described as follows:
Except for vector types, parameters can be of any type that can be forcibly converted to a vector type, including single-level array types (such as
[1,2,3,..]) and string-like types (such as'[1,2,3]'). Note: the two parameters cannot both be string-like types; one of them must be a vector or single-level array type.The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, the elements of this parameter must not be
NULL.
The return values are described as follows:
If at least one of the two parameters is a vector type, the return value is a vector of the same type as the vector parameter.
If the two parameters are both single-level array types, the return value is of the
array(float)type.If any parameter is
NULL, the return value isNULL.
Here is an example:
CREATE TABLE t6(c1 vector(3));
INSERT INTO t6 VALUES('[1,2,3]');
SELECT [1,2,3] + '[1.12,1000.0001, -1.2222]', c1 - [1,2,3] FROM t6;
The return result is as follows:
+---------------------------------------+--------------+
| [1,2,3] + '[1.12,1000.0001, -1.2222]' | c1 - [1,2,3] |
+---------------------------------------+--------------+
| [2.12,1002,1.7778] | [0,0,0] |
+---------------------------------------+--------------+
1 row in set
Comparison functions
Comparison functions provide comparison between vector types, single-level array types, and special string types. It supports comparison operators such as =, !=, >, <, >=, and <=. The comparison is performed by comparing the elements in the order of dictionaries.
The syntax is as follows:
v1 = v2
v1 != v2
v1 > v2
v1 < v2
v1 >= v2
v1 <= v2
The parameters are described as follows:
Apart from vector types, other types that can be forcibly converted to vector types are allowed. This includes single-level array types, such as
[1,2,3,...], and string literal types, such as'[1,2,3]'.Notice
One of the two parameters must be a vector type.
The dimensions of the two parameters must be the same.
If one of the two parameters is a single-level array type,
NULLvalues are not allowed in the elements of this parameter.
The return value is described as follows:
The return value is of the bool type.
If any parameter is
NULL, the return value isNULL.
Example:
CREATE TABLE t7(c1 vector(3));
INSERT INTO t7 VALUES('[1,2,3]');
SELECT c1 = '[1,2,3]' FROM t7;
The return result is as follows:
+----------------+
| c1 = '[1,2,3]' |
+----------------+
| 1 |
+----------------+
1 row in set
Aggregate functions
Notice
Vector columns cannot be used as GROUP BY conditions, and the DISTINCT keyword is not supported.
Sum
The sum function calculates the sum of vector columns in a table. It performs element-by-element addition to obtain the sum vector.
The syntax is as follows:
sum(vector v1)
The parameters are described as follows:
- Only vectors are supported.
The return values are described as follows:
The return value is
sum (vector).NULLcolumns are not included in the summation result.
Here is an example:
CREATE TABLE t8(c1 vector(3));
INSERT INTO t8 VALUES('[1,2,3]');
SELECT sum(c1) FROM t8;
The return result is as follows:
+---------+
| sum(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set
Avg
The avg function calculates the average of vector columns in a table.
The syntax is as follows:
avg(vector v1)
The parameters are described as follows:
- Only vectors are supported.
The return values are described as follows:
The return value is
avg (vector).NULLcolumns are not included in the sum or count processes.If the input parameter is empty, the output is
NULL.
Here is an example:
CREATE TABLE t9(c1 vector(3));
INSERT INTO t9 VALUES('[1,2,3]');
SELECT avg(c1) FROM t9;
The return result is as follows:
+---------+
| avg(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set
Other commonly used vector functions
Vector_norm
The Vector_norm function is used to calculate the Euclidean norm (magnitude) of a vector, which represents the Euclidean distance between the vector and the origin. The calculation method is as follows:

The syntax is as follows:
vector_norm(vector v1)
The parameters are described as follows:
Apart from the vector type, other types that can be forcibly converted to the vector type are acceptable. These include single-level array types such as
[1,2,3,...]and string types such as'[1,2,3]'.When a single-level array type parameter is involved, its elements must not be
NULL.
The return values are described as follows:
Returns
norm(double).Returns
NULLwhen the parameter isNULL.
Here is an example:
CREATE TABLE t10(c1 vector(3));
INSERT INTO t10 VALUES('[1,2,3]');
SELECT vector_norm(c1),vector_norm([1,2,3]) FROM t10;
The return result is as follows:
+--------------------+----------------------+
| vector_norm(c1) | vector_norm([1,2,3]) |
+--------------------+----------------------+
| 3.7416573867739413 | 3.7416573867739413 |
+--------------------+----------------------+
1 row in set
Vector_dims
The Vector_dims function is used to return the dimension of a vector.
The syntax is as follows:
vector_dims(vector v1)
The parameters are described as follows:
- Apart from the vector type, other types that can be forcibly converted to the vector type are acceptable. These include single-level array types such as
[1,2,3,...]and string types such as'[1,2,3]'.
The return values are described as follows:
Returns
dims(int64).Returns an error when the parameter is
NULL.
Here is an example:
CREATE TABLE t11(c1 vector(3));
INSERT INTO t11 VALUES('[1,2,3]');
INSERT INTO t11 VALUES('[1,1,1]');
SELECT vector_dims(c1), vector_dims('[1,2,3]') FROM t11;
The return result is as follows:
+-----------------+------------------------+
| vector_dims(c1) | vector_dims('[1,2,3]') |
+-----------------+------------------------+
| 3 | 3 |
| 3 | 3 |
+-----------------+------------------------+
2 rows in set