Use SQL functions|V4.3.5| docs|Distributed Database

Use SQL functions

Last Updated：2025-12-09 06:29:15 Updated

This topic describes the vector functions supported by OceanBase Database and provides usage notes.

Considerations

Vectors with different dimensions are not supported. An error different vector dimensions %d and %d is returned when you perform an operation on vectors with different dimensions.
An error value out of range: overflow / underflow is returned when the result overflows or underflows the floating-point number range.
The L2_distance, Cosine_distance, Inner_product, and Negative_inner_product functions can be used for vector index search. The inner_product_similarity, cosine_similarity, and l2_similarity functions can also be used for vector index search.

Notice

Support for similarity functions is available starting from V4.3.5 BP5.

Configure the memory for vector indexes

OceanBase Database allows you to configure the memory for vector indexes by setting the ob_vector_memory_limit_percentage parameter:

Before V4.3.5 BP3, you must manually set the ob_vector_memory_limit_percentage parameter to enable vector search. We recommend that you set this parameter to 30 for optimal query performance. If you do not set this parameter, no memory is allocated for vector indexes, and an error is returned when you create an index. Here is an example:
```
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
```
Starting from V4.3.5 BP3, vector search is enabled by default. The default value of 0 indicates adaptive mode, where the system automatically adjusts the memory usage ratio of vector index data in a tenant without manual intervention:
- If the actual memory of a tenant is 8 GB or less, the value is automatically adjusted to 40.
- If the actual memory of a tenant is more than 8 GB, the value is automatically adjusted to 50.

Distance function

A distance function is used to calculate the distance between two vectors. The calculation method varies depending on the distance algorithm.

L2_distance

The Euclidean distance measures the distance between the coordinates of the compared vectors, which is essentially the straight-line distance between the two vectors. It is calculated using the Pythagorean theorem applied to the vector coordinates:

Pythagorean theorem

The function syntax is as follows:

l2_distance(vector v1, vector v2)

The parameter descriptions are as follows:

In addition to vector types, parameters can accept any other type that can be strongly cast to a vector type, such as string types ('[1,2,3]').
The dimensions of both parameters must be the same.
If a parameter is a single-level array type, its elements must not contain NULL.

The return value descriptions are as follows:

The return value is a distance(double) value.
If any parameter is NULL, NULL is returned.

Example:

CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_distance(c1, [1,2,3]), l2_distance([1,2,3],[1,1,1]), l2_distance('[1,1,1]','[1,2,3]') FROM t1;

The result is as follows:

+--------------------------+------------------------------+----------------------------------+
| l2_distance(c1, [1,2,3]) | l2_distance([1,2,3],[1,1,1]) | l2_distance('[1,1,1]','[1,2,3]') |
+--------------------------+------------------------------+----------------------------------+
|                        0 |             2.23606797749979 |                 2.23606797749979 |
+--------------------------+------------------------------+----------------------------------+
1 row in set

L1_distance

The Manhattan distance calculates the sum of the absolute differences along the standard coordinate axes between two points. The calculation formula is:

Manhattan distance

The function syntax is as follows:

l1_distance(vector v1, vector v2)

The parameter descriptions are as follows:

In addition to vector types, parameters can accept any other type that can be explicitly converted to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
When a single-level array type parameter is present, the elements of the parameter cannot contain NULL.

The return value descriptions are as follows:

The return value is a distance(double) value.
If any parameter is NULL, NULL is returned.

Example:

CREATE TABLE t2(c1 vector(3));
INSERT INTO t2 VALUES('[1,2,3]');
INSERT INTO t2 VALUES('[1,1,1]');
SELECT l1_distance(c1, [1,2,3]) FROM t2;

The result is as follows:

+--------------------------+
| l1_distance(c1, [1,2,3]) |
+--------------------------+
|                        0 |
|                        3 |
+--------------------------+
2 rows in set

Cosine_distance

Cosine similarity measures the angular difference between two vectors, reflecting their directional similarity, independent of their magnitudes. The cosine similarity range is [-1, 1], where 1 indicates identical directions, 0 indicates orthogonality, and -1 indicates opposite directions.

The calculation method for cosine similarity is:

Cosine similarity

Since cosine similarity closer to 1 indicates greater similarity, cosine distance (or cosine dissimilarity) is sometimes used as a measure of distance between vectors. Cosine distance can be calculated by subtracting the cosine similarity from 1:

Cosine distance

The cosine distance range is [0, 2], where 0 indicates identical directions (no distance), and 2 indicates completely opposite directions.

The function syntax is as follows:

cosine_distance(vector v1, vector v2)

The parameter descriptions are as follows:

In addition to vector types, parameters can accept any other type that can be explicitly converted to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
When a single-level array type parameter is present, the elements of the parameter cannot contain NULL.

The return value descriptions are as follows:

The return value is a distance(double) value.
If any parameter is NULL, NULL is returned.

Example:

CREATE TABLE t3(c1 vector(3));
INSERT INTO t3 VALUES('[1,2,3]');
INSERT INTO t3 VALUES('[1,2,1]');
SELECT cosine_distance(c1, [1,2,3]) FROM t3;

+------------------------------+
| cosine_distance(c1, [1,2,3]) |
+------------------------------+
|                            0 |
|          0.12712843905603044 |
+------------------------------+
2 rows in set

Inner_product

The inner product, also known as the dot product or scalar product, represents a type of multiplication between two vectors. Geometrically, the inner product reflects the directional and magnitude relationships between the two vectors. The calculation method for the inner product is:

Inner product

The syntax is as follows:

inner_product(vector v1, vector v2)

The parameter descriptions are as follows:

In addition to vector types, parameters can accept any other type that can be explicitly converted to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
When a single-level array type parameter is present, the elements of the parameter cannot contain NULL.
For sparse vectors, one parameter can be a string in the sparse vector format, such as c2,'{1:2.4}'. However, if one parameter is a string, the other parameter must not be a string.

The return value descriptions are as follows:

The return value is a distance(double) value.
If any parameter is NULL, NULL is returned.

Example for dense vectors:

CREATE TABLE t4(c1 vector(3));
INSERT INTO t4 VALUES('[1,2,3]');
INSERT INTO t4 VALUES('[1,2,1]');
SELECT inner_product(c1, [1,2,3]) FROM t4;

The result is as follows:

+----------------------------+
| inner_product(c1, [1,2,3]) |
+----------------------------+
|                         14 |
|                          8 |
+----------------------------+
2 rows in set

Example for sparse vectors:

CREATE TABLE t4(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
INSERT INTO t4 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
INSERT INTO t4 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}'));
SELECT inner_product(c2,c3) FROM t4;

The result is as follows:

+----------------------+
| inner_product(c2,c3) |
+----------------------+
|    2.640000104904175 |
|                    0 |
+----------------------+
2 rows in set

Negative_inner_product

Negative_inner_product calculates the negative inner product between two vectors, using the following method:

Negative inner product

The syntax is as follows:

negative_inner_product(vector v1, vector v2)

The parameter descriptions are as follows:

In addition to vector types, parameters can accept any other type that can be explicitly converted to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
When a single-level array type parameter is present, the elements of the parameter cannot contain NULL.
For sparse vectors, this function supports one parameter being a sparse vector formatted as a string, such as c2,'{1:2.4}'; it does not support both parameters being strings.

The return value descriptions are as follows:

The return value is a distance(double) value.
If any parameter is NULL, NULL is returned.

Example for dense vectors:

CREATE TABLE t5(c1 vector(3));
INSERT INTO t5 VALUES('[1,2,3]');
INSERT INTO t5 VALUES('[1,2,1]');
SELECT negative_inner_product(c1, [1,2,3]) FROM t5;

The result is as follows:

+-------------------------------------+
| negative_inner_product(c1, [1,2,3]) |
+-------------------------------------+
|                                 -14 |
|                                  -8 |
|                                 -14 |
|                                  -8 |
+-------------------------------------+
4 rows in set

Example for sparse vectors:

CREATE TABLE t5(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
INSERT INTO t5 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
INSERT INTO t5 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}'));
SELECT negative_inner_product(c2,c3) FROM t5;

The result is as follows:

+-------------------------------+
| negative_inner_product(c2,c3) |
+-------------------------------+
|            -2.640000104904175 |
|                             0 |
+-------------------------------+
2 rows in set

Vector_distance

The vector_distance function calculates the distance between two vectors and allows you to choose different distance algorithms by specifying parameters.

The syntax is as follows:

vector_distance(vector v1, vector v2 [, string metric])

The vector v1/v2 parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted. Examples of other types include single-level array types such as [1,2,3,...] and string types such as '[1,2,3]'.
The dimensions of the two parameters must be the same.
If a single-level array type is involved, the elements of this parameter cannot be NULL.

The metric parameter specifies the distance algorithm, which is optional:

If not specified, the default algorithm is euclidean.
If specified, the only available value is
- euclidean: This represents the Euclidean distance, which is equivalent to L2_distance.
- manhattan: This represents the Manhattan distance, which is equivalent to L1_distance.
- cosine: This represents the cosine distance, which is equivalent to Cosine_distance.
- dot: This represents the inner product, which is equivalent to Inner_product.

The return value is described as follows:

The return value is a distance(double) value indicating the distance.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t5(c1 vector(3));
INSERT INTO t5 VALUES('[1,2,3]');
INSERT INTO t5 VALUES('[1,2,1]');
SELECT vector_distance(c1, [1,2,3], euclidean) FROM t5;

The result is as follows:

+-----------------------------------------+
| vector_distance(c1, [1,2,3], euclidean) |
+-----------------------------------------+
|                                       0 |
|                                       2 |
+-----------------------------------------+
2 rows in set

Similarity functions

Similarity functions measure the similarity between two vectors. A higher value indicates a greater similarity. They are the inverse of distance functions: higher similarity means a smaller distance. In vector search, you can set a similarity threshold to filter results that meet the criteria.

inner_product_similarity

inner_product_similarity calculates the inner product similarity between two vectors. The return value is calculated the same way as inner_product, but it represents similarity instead of distance. The conversion between inner_product_similarity and inner_product is as follows: inner_product_similarity(v1, v2) = (1 + inner_product(v1, v2) / (vector_norm(v1) * vector_norm(v2))) / 2. The vector_norm function calculates the Euclidean norm (magnitude) of a vector, which represents the Euclidean distance from the vector to the origin. For a vector v = [v1, v2, v3, ..., vn], the norm is calculated as: vector_norm(v) = √(v1² + v2² + v3² + ... + vn²).

Syntax:

inner_product_similarity(vector v1, vector v2)

Parameters:

In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type is involved, the elements of this parameter cannot be NULL.

Return value:

The return value is a similarity(double) similarity value. A higher value indicates greater similarity.
If any parameter is NULL, the return value is NULL.

Example:

CREATE TABLE t12(c1 vector(3));
INSERT INTO t12 VALUES('[1,2,3]');
INSERT INTO t12 VALUES('[1,2,1]');
SELECT inner_product_similarity(c1, [1,2,3]) FROM t12;

Result:

+---------------------------------------+
| inner_product_similarity(c1, [1,2,3]) |
+---------------------------------------+
|                    0.9999999552965164 |
|                    0.9364357516169548 |
+---------------------------------------+
2 rows in set

Or more simply:

SELECT inner_product_similarity([1,0,0], [0,1,0]);

Result:

+--------------------------------------------+
| inner_product_similarity([1,0,0], [0,1,0]) |
+--------------------------------------------+
|                                        0.5 |
+--------------------------------------------+
1 row in set

cosine_similarity

cosine_similarity calculates the cosine similarity between two vectors, reflecting their directional similarity regardless of their magnitude. The calculation is: cosine_similarity(v1, v2) = 1 - (cosine_distance(v1, v2) / 2).

Syntax:

cosine_similarity(vector v1, vector v2)

Parameters:

In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type is involved, the elements of this parameter cannot be NULL.

Return value:

The return value is a similarity(double) similarity value, ranging from [-1, 1]. 1 indicates the vectors are in the same direction, 0 indicates they are orthogonal, and -1 indicates they are in opposite directions. A higher value indicates greater similarity.
If any parameter is NULL, the return value is NULL.

Example:

CREATE TABLE t13(c1 vector(3));
INSERT INTO t13 VALUES('[1,2,3]');
INSERT INTO t13 VALUES('[1,2,1]');
SELECT cosine_similarity(c1, [1,2,3]) FROM t13;

Result:

+--------------------------------+
| cosine_similarity(c1, [1,2,3]) |
+--------------------------------+
|             0.9999999552965164 |
|             0.9364357516169548 |
+--------------------------------+
2 rows in set

Or more simply:

SELECT cosine_similarity([1,0,0], [0,1,0]);

Result:

+-------------------------------------+
| cosine_similarity([1,0,0], [0,1,0]) |
+-------------------------------------+
|                                 0.5 |
+-------------------------------------+
1 row in set

l2_similarity

l2_similarity calculates the L2 similarity between two vectors. L2 similarity is based on the Euclidean distance, but it represents similarity instead of distance. A higher similarity value indicates greater similarity between the vectors. The conversion between l2_similarity and l2_squared is as follows: l2_similarity(v1, v2) = 1 / (1 + l2_squared(v1, v2)).

Syntax:

l2_similarity(vector v1, vector v2)

Parameters:

In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, including single-level array types (e.g., [1,2,3,..]) and string types (e.g., '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type is involved, the elements of this parameter cannot be NULL.

Return value:

The return value is a similarity(double) similarity value. A higher value indicates greater similarity.
If any parameter is NULL, the return value is NULL.

Example:

CREATE TABLE t14(c1 vector(3));
INSERT INTO t14 VALUES('[1,2,3]');
INSERT INTO t14 VALUES('[1,2,1]');
SELECT l2_similarity(c1, [1,2,3]) FROM t14;

Result:

+----------------------------+
| l2_similarity(c1, [1,2,3]) |
+----------------------------+
|                          1 |
|                        0.2 |
+----------------------------+
2 rows in set

Or more simply:

SELECT l2_similarity([1,0,0], [0,1,0]);

Result:

+---------------------------------+
| l2_similarity([1,0,0], [0,1,0]) |
+---------------------------------+
|              0.3333333333333333 |
+---------------------------------+
1 row in set

Arithmetic functions

Arithmetic functions provide element-wise addition (+), subtraction (-), and multiplication () operations between vector types, between single-level array types, and between special string types. The operations also support element-wise addition (+), subtraction (-), and multiplication () operations between single-level array types and special string types, as well as between two special string types. The operations are performed element-wise.

Addition

The syntax is as follows:

v1 + v2
v1 - v2
v1 * v2

Here is the parameter description:

In addition to vector types, the parameters can be any type that can be explicitly cast to a vector type, including single-dimensional array types (such as [1,2,3,..]) and special string types (such as '[1,2,3]'). Note: One of the parameters must be a vector or a single-dimensional array type, and both parameters cannot be string types.
The dimensions of the two parameters must be the same.
If a single-level array type is involved, the elements of this parameter cannot be NULL.

Here is the return value description:

If at least one of the parameters is a vector, the return value is a vector of the same type as the vector parameter.
If both parameters are of single-level array types, the return value is of the array(float) type.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t6(c1 vector(3));
INSERT INTO t6 VALUES('[1,2,3]');
SELECT [1,2,3] + '[1.12,1000.0001, -1.2222]', c1 - [1,2,3] FROM t6;

The return result is as follows:

+---------------------------------------+--------------+
| [1,2,3] + '[1.12,1000.0001, -1.2222]' | c1 - [1,2,3] |
+---------------------------------------+--------------+
| [2.12,1002,1.7778]                    | [0,0,0]      |
+---------------------------------------+--------------+
1 row in set

Comparison functions

Comparison functions support comparison operations between vector types, single-dimensional array types, and special string types. The supported comparison operators are =, !=, >, <, >=, and <=. These operations are performed element-wise based on lexicographical order.

The syntax is as follows:

v1 = v2
v1 != v2
v1 > v2
v1 < v2
v1 >= v2
v1 <= v2

Here is the parameter description:

In addition to vector types, the parameters can be any type that can be explicitly cast to a vector type, including single-dimensional array types (such as [1,2,3,..]) and special string types (such as '[1,2,3]').

Notice

At least one of the parameters must be a vector type.

The dimensions of the two parameters must be the same.
If a parameter is a single-dimensional array type, the elements of the array cannot contain NULL.

Here is the return value description:

The return value is of the bool type.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t7(c1 vector(3));
INSERT INTO t7 VALUES('[1,2,3]');
SELECT c1 = '[1,2,3]' FROM t7;

The return result is as follows:

+----------------+
| c1 = '[1,2,3]' |
+----------------+
|              1 |
+----------------+
1 row in set

Aggregate functions

Notice

You cannot use a vector column as the GROUP BY condition. The DISTINCT clause is not supported.

Sum

The sum function calculates the sum of vector columns in a table. The sum is calculated by adding the elements of the vectors element-wise.

The syntax is as follows:

sum(vector v1)

Here is the parameter description:

Only vector types are supported.

Here is the return value description:

The return value is the sum (vector) value.

Here is an example:

CREATE TABLE t8(c1 vector(3));
INSERT INTO t8 VALUES('[1,2,3]');
SELECT sum(c1) FROM t8;

The return result is as follows:

+---------+
| sum(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set

Avg

The avg function calculates the average of vector columns in a table.

The syntax is as follows:

avg(vector v1)

Here is the parameter description:

Only vector types are supported.

Here is the return value description:

The return value is the avg (vector) value.
The NULL values in the vector column are not counted.
If the input parameter is empty, the return value is NULL.

Here is an example:

CREATE TABLE t9(c1 vector(3));
INSERT INTO t9 VALUES('[1,2,3]');
SELECT avg(c1) FROM t9;

The return result is as follows:

+---------+
| avg(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set

Other common vector functions

Vector_norm

The vector_norm function calculates the Euclidean norm (magnitude) of a vector, which represents the Euclidean distance between the vector and the origin. The calculation method is as follows:

Euclidean norm

The syntax is as follows:

vector_norm(vector v1)

Here is the parameter description:

In addition to vector types, the parameters can be any type that can be explicitly cast to a vector type, including single-dimensional array types (such as [1,2,3,..]) and special string types (such as '[1,2,3]').
If a parameter is a single-dimensional array type, the elements of the array cannot contain NULL.

Here is the return value description:

The return value is the norm(double) value.
If the parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t10(c1 vector(3));
INSERT INTO t10 VALUES('[1,2,3]');
SELECT vector_norm(c1),vector_norm([1,2,3]) FROM t10;

The return result is as follows:

+--------------------+----------------------+
| vector_norm(c1)    | vector_norm([1,2,3]) |
+--------------------+----------------------+
| 3.7416573867739413 |   3.7416573867739413 |
+--------------------+----------------------+
1 row in set

Vector normalization

Vector normalization is the process of standardizing the length (magnitude) of a vector to 1. The vector_norm function is commonly used to implement vector normalization.

For vector indexes using the l2_distance distance type, it is recommended to normalize the vectors to improve the accuracy of similarity searches. After normalization, the magnitude of the vectors becomes 1, and the range of L2 distances is limited to [0, 2], preventing precision issues caused by small similarity values.

For example, for the vector [3, 4]:

The magnitude is: vector_norm([3, 4]) = √(3² + 4²) = 5
After normalization: [3, 4] / 5 = [0.6, 0.8]

The magnitude of the normalized vector is 1, which can be verified as follows:

SELECT vector_norm([0.6, 0.8]);

The result is:

+-------------------------+
| vector_norm([0.6, 0.8]) |
+-------------------------+
|       1.000000029802322 |
+-------------------------+
1 row in set

Vector_dims

The vector_dims function is used to return the dimension of a vector.

The syntax is as follows:

vector_dims(vector v1)

The parameters are described as follows:

In addition to vectors, the parameters can accept other types that can be strongly converted to vectors, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').

The return value is described as follows:

The dims(int64) value is returned.
An error is returned when the parameter is NULL.

Here is an example:

CREATE TABLE t11(c1 vector(3));
INSERT INTO t11 VALUES('[1,2,3]');
INSERT INTO t11 VALUES('[1,1,1]');
SELECT vector_dims(c1), vector_dims('[1,2,3]') FROM t11;

The result is as follows:

+-----------------+------------------------+
| vector_dims(c1) | vector_dims('[1,2,3]') |
+-----------------+------------------------+
|               3 |                      3 |
|               3 |                      3 |
+-----------------+------------------------+
2 rows in set