OceanBase logo

OceanBase

A unified distributed database ready for your transactional, analytical, and AI workloads.

Product Overview
DEPLOY YOUR WAY

OceanBase Cloud

The best way to deploy and scale OceanBase

OceanBase Enterprise

Run and manage OceanBase on your infra

TRY OPEN SOURCE

OceanBase Community Edition

The free, open-source distributed database

OceanBase seekdb

Open source AI native search database

Customer Stories

Real-world success stories from enterprises across diverse industries.

View All
BY USE CASES

Mission-Critical Transactions

Global & Multicloud Application

Elastic Scaling for Peak Traffic

Real-time Analytics

Active Geo-redundancy

Database Consolidation

Resources

Comprehensive knowledge hub for OceanBase.

Blog

Live Demos

Training & Certification

Documentation

Official technical guides, tutorials, API references, and manuals for all OceanBase products.

View All
PRODUCTS

OceanBase Cloud

OceanBase Database

Tools

Connectors and Middleware

QUICK START

OceanBase Cloud

OceanBase Database

BEST PRACTICES

Practical guides for utilizing OceanBase more effectively and conveniently

Company

Learn more about OceanBase – our company, partnerships, and trust and security initiatives.

About OceanBase

Partner

Trust Center

Contact Us

International - English
中国站 - 简体中文
日本 - 日本語
Sign In
Start on Cloud

OceanBase

A unified distributed database ready for your transactional, analytical, and AI workloads.

Product Overview
DEPLOY YOUR WAY

OceanBase Cloud

The best way to deploy and scale OceanBase

OceanBase Enterprise

Run and manage OceanBase on your infra

TRY OPEN SOURCE

OceanBase Community Edition

The free, open-source distributed database

OceanBase seekdb

Open source AI native search database

Customer Stories

Real-world success stories from enterprises across diverse industries.

View All
BY USE CASES

Mission-Critical Transactions

Global & Multicloud Application

Elastic Scaling for Peak Traffic

Real-time Analytics

Active Geo-redundancy

Database Consolidation

Comprehensive knowledge hub for OceanBase.

Blog

Live Demos

Training & Certification

Documentation

Official technical guides, tutorials, API references, and manuals for all OceanBase products.

View All
PRODUCTS
OceanBase CloudOceanBase Database
ToolsConnectors and Middleware
QUICK START
OceanBase CloudOceanBase Database
BEST PRACTICES

Practical guides for utilizing OceanBase more effectively and conveniently

Learn more about OceanBase – our company, partnerships, and trust and security initiatives.

About OceanBase

Partner

Trust Center

Contact Us

Start on Cloud
编组
All Products
    • Databases
    • iconOceanBase Database
    • iconOceanBase Cloud
    • iconOceanBase Tugraph
    • iconInteractive Tutorials
    • iconOceanBase Best Practices
    • Tools
    • iconOceanBase Cloud Platform
    • iconOceanBase Migration Service
    • iconOceanBase Developer Center
    • iconOceanBase Migration Assessment
    • iconOceanBase Admin Tool
    • iconOceanBase Loader and Dumper
    • iconOceanBase Deployer
    • iconKubernetes operator for OceanBase
    • iconOceanBase Diagnostic Tool
    • iconOceanBase Binlog Service
    • Connectors and Middleware
    • iconOceanBase Database Proxy
    • iconEmbedded SQL in C for OceanBase
    • iconOceanBase Call Interface
    • iconOceanBase Connector/C
    • iconOceanBase Connector/J
    • iconOceanBase Connector/ODBC
    • iconOceanBase Connector/NET
icon

OceanBase Database

SQL - V4.4.2

    Download PDF

    OceanBase logo

    The Unified Distributed Database for the AI Era.

    Follow Us
    Products
    OceanBase CloudOceanBase EnterpriseOceanBase Community EditionOceanBase seekdb
    Resources
    DocsBlogLive DemosTraining & CertificationTicket
    Company
    About OceanBaseTrust CenterLegalPartnerContact Us
    Follow Us

    © OceanBase 2026. All rights reserved

    Cloud Service AgreementPrivacy PolicySecurity
    Contact Us
    Document Feedback
    1. Documentation Center
    2. OceanBase Database
    3. SQL
    4. V4.4.2
    iconOceanBase Database
    SQL - V 4.4.2
    Databases
    • OceanBase Database
    • OceanBase Cloud
    • OceanBase Tugraph
    • Interactive Tutorials
    • OceanBase Best Practices
    Tools
    • OceanBase Cloud Platform
    • OceanBase Migration Service
    • OceanBase Developer Center
    • OceanBase Migration Assessment
    • OceanBase Admin Tool
    • OceanBase Loader and Dumper
    • OceanBase Deployer
    • Kubernetes operator for OceanBase
    • OceanBase Diagnostic Tool
    • OceanBase Binlog Service
    Connectors and Middleware
    • OceanBase Database Proxy
    • Embedded SQL in C for OceanBase
    • OceanBase Call Interface
    • OceanBase Connector/C
    • OceanBase Connector/J
    • OceanBase Connector/ODBC
    • OceanBase Connector/NET
    SQL
    KV
    • V 4.6.0
    • V 4.4.2
    • V 4.3.5
    • V 4.3.3
    • V 4.3.1
    • V 4.3.0
    • V 4.2.5
    • V 4.2.2
    • V 4.2.1
    • V 4.2.0
    • V 4.1.0
    • V 4.0.0
    • V 3.1.4 and earlier

    SQL functions

    Last Updated:2026-05-12 02:23:53  Updated
    Share
    What is on this page
    Considerations
    Configure vector index memory
    Distance functions
    L2_distance
    L2_squared
    L1_distance
    Cosine_distance
    Inner_product
    Negative_inner_product
    Vector_distance
    Similarity functions
    inner_product_similarity
    cosine_similarity
    l2_similarity
    Arithmetic functions
    Comparison functions
    Aggregate functions
    Sum
    Avg
    Other common vector functions
    Vector_norm
    Vector_dims

    folded

    Share

    This topic describes the vector functions supported by OceanBase and how to use them.

    Considerations

    • Vectors with different dimensions cannot be used in the operations below; an error different vector dimensions %d and %d is returned.
    • If the result is outside the representable range of floating-point values, an error value out of range: overflow / underflow is returned.
    • Dense vector indexes support L2, inner product, and cosine distance as index distance algorithms. See HNSW series index and IVF series index.
    • For vector index search you can use the distance functions L2_distance, Cosine_distance, Inner_product, and Negative_inner_product described in this topic, as well as the similarity functions inner_product_similarity, cosine_similarity, and l2_similarity.

    Notice

    Support for similarity functions was introduced in V4.4.2 BP1.

    Configure vector index memory

    OceanBase vector search uses ob_vector_memory_limit_percentage to control vector index memory:

    • Before V4.4.1, you must set ob_vector_memory_limit_percentage before using HNSW, HNSW_SQ, or HNSW_BQ indexes. We recommend 30 for best search performance. If you leave the default, no memory is allocated for vector indexes and index creation fails. IVF and IVF_PQ indexes do not require resident memory, so you can ignore this parameter for them. Example:

      ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
      
    • From V4.4.1, vector search is on by default. The default 0 means adaptive mode: the system sets the memory ratio for vector index data in the tenant automatically.

      • If tenant memory is 8 GB or less, the effective value is 40.
      • If tenant memory is greater than 8 GB, the effective value is 50.

    Distance functions

    Distance functions calculate the distance between two vectors, and the calculation method varies depending on the distance algorithm.

    L2_distance

    The Euclidean distance reflects the distance between the coordinates of the compared vectors, which is essentially the straight-line distance between the two vectors. It is calculated by applying the Pythagorean theorem to the vector coordinates:

    Pythagorean theorem

    The function syntax is as follows:

    l2_distance(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of a single-level array type, the elements of the parameter must not contain NULL.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Here is an example:

    CREATE TABLE t1(c1 vector(3));
    INSERT INTO t1 VALUES('[1,2,3]');
    SELECT l2_distance(c1, [1,2,3]), l2_distance([1,2,3],[1,1,1]), l2_distance('[1,1,1]','[1,2,3]') FROM t1;
    

    The result is as follows:

    +--------------------------+------------------------------+----------------------------------+
    | l2_distance(c1, [1,2,3]) | l2_distance([1,2,3],[1,1,1]) | l2_distance('[1,1,1]','[1,2,3]') |
    +--------------------------+------------------------------+----------------------------------+
    |                        0 |             2.23606797749979 |                 2.23606797749979 |
    +--------------------------+------------------------------+----------------------------------+
    1 row in set
    

    L2_squared

    The L2 Squared distance is the square of the Euclidean distance (L2 Distance). It omits the square root operation in the Euclidean distance formula, thereby reducing the computational cost while maintaining the relative order of distances. The calculation method is as follows:

    Square of L2 distance

    The syntax is as follows:

    l2_squared(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of a single-level array type, the elements of the parameter must not contain NULL.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Here is an example:

    CREATE TABLE t1(c1 vector(3));
    INSERT INTO t1 VALUES('[1,2,3]');
    SELECT l2_squared(c1, [1,2,3]), l2_squared([1,2,3],[1,1,1]), l2_squared('[1,1,1]','[1,2,3]') FROM t1;
    

    The result is as follows:

    +-------------------------+-----------------------------+---------------------------------+
    | l2_squared(c1, [1,2,3]) | l2_squared([1,2,3],[1,1,1]) | l2_squared('[1,1,1]','[1,2,3]') |
    +-------------------------+-----------------------------+---------------------------------+
    |                       0 |                           5 |                               5 |
    +-------------------------+-----------------------------+---------------------------------+
    1 row in set
    

    L1_distance

    The Manhattan distance is used to calculate the sum of the absolute differences along the standard coordinate axes for two points. The calculation formula is as follows:

    Manhattan distance

    The function syntax is as follows:

    l1_distance(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of a single-level array type, the elements of the parameter must not contain NULL.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Here is an example:

    CREATE TABLE t2(c1 vector(3));
    INSERT INTO t2 VALUES('[1,2,3]');
    INSERT INTO t2 VALUES('[1,1,1]');
    SELECT l1_distance(c1, [1,2,3]) FROM t2;
    

    The result is as follows:

    +--------------------------+
    | l1_distance(c1, [1,2,3]) |
    +--------------------------+
    |                        0 |
    |                        3 |
    +--------------------------+
    2 rows in set
    

    Cosine_distance

    Cosine similarity measures the angular difference between two vectors, reflecting their directional similarity regardless of their magnitudes. The cosine similarity ranges from [-1, 1], where 1 indicates the vectors are in the same direction, 0 indicates they are orthogonal, and -1 indicates they are in opposite directions.

    The cosine similarity is calculated as follows:

    Cosine similarity

    Since a cosine similarity closer to 1 indicates greater similarity, cosine distance (or cosine dissimilarity) is sometimes used as a measure of distance between vectors. The cosine distance can be calculated by subtracting the cosine similarity from 1:

    Cosine distance

    The cosine distance ranges from [0, 2], where 0 indicates the vectors are in the same direction (no distance), and 2 indicates they are in opposite directions.

    The function syntax is as follows:

    cosine_distance(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of a single-level array type, the elements of the parameter must not contain NULL.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Here is an example:

    CREATE TABLE t3(c1 vector(3));
    INSERT INTO t3 VALUES('[1,2,3]');
    INSERT INTO t3 VALUES('[1,2,1]');
    SELECT cosine_distance(c1, [1,2,3]) FROM t3;
    
    +------------------------------+
    | cosine_distance(c1, [1,2,3]) |
    +------------------------------+
    |                            0 |
    |          0.12712843905603044 |
    +------------------------------+
    2 rows in set
    

    Inner_product

    The inner product, also known as the dot product or scalar product, represents a product between two vectors. In a geometric sense, the inner product indicates the directional and magnitude relationship between two vectors. The calculation method is as follows:

    Inner product

    The syntax is as follows:

    inner_product(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of a single-level array type, the elements of the parameter must not contain NULL.

    • When using this function with sparse vectors, one parameter can be a sparse vector formatted as a string, such as c2,'{1:2.4}'; both parameters cannot be strings.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Here is an example for dense vectors:

    CREATE TABLE t4(c1 vector(3));
    INSERT INTO t4 VALUES('[1,2,3]');
    INSERT INTO t4 VALUES('[1,2,1]');
    SELECT inner_product(c1, [1,2,3]) FROM t4;
    

    The result is as follows:

    +----------------------------+
    | inner_product(c1, [1,2,3]) |
    +----------------------------+
    |                         14 |
    |                          8 |
    +----------------------------+
    2 rows in set
    

    Here is an example for sparse vectors:

    CREATE TABLE t4(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
    INSERT INTO t4 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
    INSERT INTO t4 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}'));
    SELECT inner_product(c2,c3) FROM t4;
    

    The result is as follows:

    +----------------------+
    | inner_product(c2,c3) |
    +----------------------+
    |    2.640000104904175 |
    |                    0 |
    +----------------------+
    2 rows in set
    

    Negative_inner_product

    Negative_inner_product calculates the negative inner product of two vectors. The formula is as follows:

    Negative inner product

    The syntax is as follows:

    negative_inner_product(vector v1, vector v2)
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').
    • The dimensions of the two parameters must be the same.
    • When a parameter is of the single-level array type, the elements of the parameter must not contain NULL.
    • When using this function with sparse vectors, one parameter can be a string in sparse vector format (e.g., c2,'{1:2.4}'); both parameters cannot be strings.

    The return value is described as follows:

    • The return value is a distance(double) distance value.
    • If any parameter is NULL, NULL is returned.

    Dense vector example:

    CREATE TABLE t5(c1 vector(3));
    INSERT INTO t5 VALUES('[1,2,3]');
    INSERT INTO t5 VALUES('[1,2,1]');
    SELECT negative_inner_product(c1, [1,2,3]) FROM t5;
    

    The result is as follows:

    +-------------------------------------+
    | negative_inner_product(c1, [1,2,3]) |
    +-------------------------------------+
    |                                 -14 |
    |                                  -8 |
    |                                 -14 |
    |                                  -8 |
    +-------------------------------------+
    4 rows in set
    

    Here is an example for sparse vectors:

    CREATE TABLE t5(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
    INSERT INTO t5 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
    INSERT INTO t5 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}'));
    SELECT negative_inner_product(c2,c3) FROM t5;
    

    The result is as follows:

    +-------------------------------+
    | negative_inner_product(c2,c3) |
    +-------------------------------+
    |            -2.640000104904175 |
    |                             0 |
    +-------------------------------+
    2 rows in set
    

    Vector_distance

    vector_distance calculates the distance between two vectors. You can choose the distance algorithm via the optional third parameter.

    Syntax:

    vector_distance(vector v1, vector v2 [, string metric])
    

    The vector v1/v2 parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a parameter is of the single-level array type, the elements of the parameter must not contain NULL.

    The metric parameter specifies the distance algorithm:

    • If omitted, the default is euclidean.

    • If specified, it must be one of:

      • euclidean — same as L2_distance.

      • manhattan — same as L1_distance.

      • cosine — same as Cosine_distance.

      • dot — same as Inner_product.

    The return value is described as follows:

    • The return value is a distance(double) distance value.

    • If any parameter is NULL, NULL is returned.

    Example:

    CREATE TABLE t5(c1 vector(3));
    INSERT INTO t5 VALUES('[1,2,3]');
    INSERT INTO t5 VALUES('[1,2,1]');
    SELECT vector_distance(c1, [1,2,3], euclidean) FROM t5;
    

    The result is as follows:

    +-----------------------------------------+
    | vector_distance(c1, [1,2,3], euclidean) |
    +-----------------------------------------+
    |                                       0 |
    |                                       2 |
    +-----------------------------------------+
    2 rows in set
    

    Similarity functions

    Similarity functions measure the similarity between two vectors. A higher value indicates a higher similarity. It is the inverse of the distance function: higher similarity means a smaller distance. In vector search, you can set a similarity threshold to filter results.

    inner_product_similarity

    inner_product_similarity calculates the inner product similarity between two vectors. The return value is calculated in the same way as inner_product, but it represents similarity instead of distance. The conversion between inner_product_similarity and inner_product is as follows: inner_product_similarity(v1, v2) = (1 + inner_product(v1, v2) / (vector_norm(v1) * vector_norm(v2))) / 2. The vector_norm function calculates the Euclidean norm (magnitude) of a vector, which represents the Euclidean distance from the vector to the origin. For a vector v = [v1, v2, v3, ..., vn], the norm is calculated as: vector_norm(v) = √(v1² + v2² + v3² + ... + vn²).

    Syntax:

    inner_product_similarity(vector v1, vector v2)
    

    Parameter description:

    • In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a single-level array type parameter is present, the parameter elements cannot contain NULL.

    Return value description:

    • The return value is a similarity(double) similarity value. A higher value indicates a higher similarity.

    • If any parameter is NULL, it returns NULL.

    Example:

    CREATE TABLE t12(c1 vector(3));
    INSERT INTO t12 VALUES('[1,2,3]');
    INSERT INTO t12 VALUES('[1,2,1]');
    SELECT inner_product_similarity(c1, [1,2,3]) FROM t12;
    

    Return result:

    +---------------------------------------+
    | inner_product_similarity(c1, [1,2,3]) |
    +---------------------------------------+
    |                    0.9999999552965164 |
    |                    0.9364357516169548 |
    +---------------------------------------+
    2 rows in set
    

    Or more simply:

    SELECT inner_product_similarity([1,0,0], [0,1,0]);
    

    Return result:

    
    
    ```shell
    +--------------------------------------------+
    | inner_product_similarity([1,0,0], [0,1,0]) |
    +--------------------------------------------+
    |                                        0.5 |
    +--------------------------------------------+
    1 row in set
    

    cosine_similarity

    cosine_similarity calculates the cosine similarity between two vectors, reflecting their directional similarity, independent of their magnitudes. The calculation is as follows: cosine_similarity(v1, v2) = 1 - (cosine_distance(v1, v2) / 2).

    Syntax:

    cosine_similarity(vector v1, vector v2)
    

    Parameter description:

    • In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a single-level array type parameter is present, the parameter elements cannot contain NULL.

    Return value description:

    • The return value is a similarity(double) similarity value, ranging from [-1, 1]. 1 indicates the vectors are in the same direction, 0 indicates they are orthogonal, and -1 indicates they are in opposite directions. A higher value indicates a higher similarity.

    • If any parameter is NULL, it returns NULL.

    Example:

    CREATE TABLE t13(c1 vector(3));
    INSERT INTO t13 VALUES('[1,2,3]');
    INSERT INTO t13 VALUES('[1,2,1]');
    SELECT cosine_similarity(c1, [1,2,3]) FROM t13;
    

    Return result:

    +--------------------------------+
    | cosine_similarity(c1, [1,2,3]) |
    +--------------------------------+
    |             0.9999999552965164 |
    |             0.9364357516169548 |
    +--------------------------------+
    2 rows in set
    

    Or more simply:

    SELECT cosine_similarity([1,0,0], [0,1,0]);
    

    Return result:

    +-------------------------------------+
    | cosine_similarity([1,0,0], [0,1,0]) |
    +-------------------------------------+
    |                                 0.5 |
    +-------------------------------------+
    1 row in set
    

    l2_similarity

    l2_similarity calculates the L2 similarity between two vectors. L2 similarity is based on the Euclidean distance, but it represents similarity instead of distance. A higher similarity value indicates a higher similarity between the vectors. The conversion between l2_similarity and l2_squared is as follows: l2_similarity(v1, v2) = 1 / (1 + l2_squared(v1, v2)).

    Syntax:

    l2_similarity(vector v1, vector v2)
    

    Parameter description:

    • In addition to vector types, parameters can accept any type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

    • The dimensions of the two parameters must be the same.

    • When a single-level array type parameter is present, the parameter elements cannot contain NULL.

    Return value description:

    • The return value is a similarity(double) similarity value. A higher value indicates a higher similarity.

    • If any parameter is NULL, it returns NULL.

    Example:

    CREATE TABLE t14(c1 vector(3));
    INSERT INTO t14 VALUES('[1,2,3]');
    INSERT INTO t14 VALUES('[1,2,1]');
    SELECT l2_similarity(c1, [1,2,3]) FROM t14;
    

    Return result:

    +----------------------------+
    | l2_similarity(c1, [1,2,3]) |
    +----------------------------+
    |                          1 |
    |                        0.2 |
    +----------------------------+
    2 rows in set
    

    Or more simply:

    SELECT l2_similarity([1,0,0], [0,1,0]);
    

    Return result:

    +---------------------------------+
    | l2_similarity([1,0,0], [0,1,0]) |
    +---------------------------------+
    |              0.3333333333333333 |
    +---------------------------------+
    1 row in set
    

    Arithmetic functions

    Arithmetic functions provide vector types with vector types, single-level array types, and special string types, as well as single-level array types with single-level array types and special string types for addition (+), subtraction (-), and multiplication (*) operations. The operations are performed element-wise, such as addition:

    Addition

    The syntax is as follows:

    v1 + v2
    v1 - v2
    v1 * v2
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]'). Note: Both parameters cannot be string types; at least one must be a vector or single-level array type.

    • The dimensions of the two parameters must be the same.

    • When a single-level array type parameter is present, the elements of this parameter cannot contain NULL.

    The return value is described as follows:

    • If at least one parameter is a vector type, the return value is a vector type that matches the vector parameter.

    • If both parameters are single-level array types, the return value is of the array(float) type.

    • If any parameter is NULL, the return value is NULL.

    Here is an example:

    CREATE TABLE t6(c1 vector(3));
    INSERT INTO t6 VALUES('[1,2,3]');
    SELECT [1,2,3] + '[1.12,1000.0001, -1.2222]', c1 - [1,2,3] FROM t6;
    

    The result is as follows:

    +---------------------------------------+--------------+
    | [1,2,3] + '[1.12,1000.0001, -1.2222]' | c1 - [1,2,3] |
    +---------------------------------------+--------------+
    | [2.12,1002,1.7778]                    | [0,0,0]      |
    +---------------------------------------+--------------+
    1 row in set
    

    Comparison functions

    Comparison functions provide comparison calculations between vector types, single-level array types, and special string types, using the comparison operators =, !=, >, <, >=, and <=. The calculations are performed by comparing elements in lexicographical order.

    The syntax is as follows:

    v1 = v2
    v1 != v2
    v1 > v2
    v1 < v2
    v1 >= v2
    v1 <= v2
    

    The parameters are described as follows:

    • In addition to the vector type, parameters can accept any other type that can be strongly cast to a vector type, such as string types (e.g., '[1,2,3]').

      Notice

      At least one of the parameters must be a vector type.

    • The dimensions of the two parameters must be the same.

    • When a single-level array type parameter is present, its elements must not contain NULL.

    The return value is described as follows:

    • The return value is of the bool type.

    • If any parameter is NULL, the return value is NULL.

    Here is an example:

    CREATE TABLE t7(c1 vector(3));
    INSERT INTO t7 VALUES('[1,2,3]');
    SELECT c1 = '[1,2,3]' FROM t7;
    

    The result is as follows:

    +----------------+
    | c1 = '[1,2,3]' |
    +----------------+
    |              1 |
    +----------------+
    1 row in set
    

    Aggregate functions

    Notice

    Vector columns cannot be used as GROUP BY conditions, and DISTINCT is not supported.

    Sum

    The SUM function calculates the sum of a vector column in a table by adding the elements of the vector column one by one.

    Syntax:

    sum(vector v1)
    

    Parameters:

    • Only vector data types are supported.

    Return value description:

    • The SUM(vector) value is returned.

    Example:

    CREATE TABLE t8(c1 vector(3));
    INSERT INTO t8 VALUES('[1,2,3]'),('[2,3,4]'),('[3,4,5]');
    SELECT sum(c1) FROM t8;
    

    The result is as follows:

    +---------+
    | sum(c1) |
    +---------+
    | [6,9,12] |
    +---------+
    1 row in set
    

    The sum of multiple rows of vectors is calculated by adding the elements of the vectors one by one. For example, the first dimension is 1+2+3=6, the second dimension is 2+3+4=9, and the third dimension is 3+4+5=12. Therefore, the result is [6,9,12].

    Avg

    The AVG function calculates the average of a vector column in a table.

    Syntax:

    avg(vector v1)
    

    Parameters:

    • Only vector data types are supported.

    Return value description:

    • The AVG(vector) value is returned.

    • Rows with NULL values in the vector column are not counted.

    • If the input parameter is empty, NULL is returned.

    Example:

    CREATE TABLE t9(c1 vector(3));
    INSERT INTO t9 VALUES('[2,4,6]'),('[4,6,8]'),('[6,8,10]');
    SELECT avg(c1) FROM t9;
    

    The result is as follows:

    +---------+
    | avg(c1) |
    +---------+
    | [4,6,8] |
    +---------+
    1 row in set
    

    The average of multiple rows of vectors is calculated by averaging the elements of the vectors. For example, the first dimension is (2+4+6)/3=4, the second dimension is (4+6+8)/3=6, and the third dimension is (6+8+10)/3=8. Therefore, the result is [4,6,8].

    Other common vector functions

    Vector_norm

    The vector_norm function calculates the Euclidean norm (magnitude) of a vector, which represents the Euclidean distance between the vector and the origin. The calculation is as follows:

    Euclidean norm

    The syntax is:

    vector_norm(vector v1)
    

    Parameter description:

    • In addition to the vector type, the parameter can accept any other type that can be strongly cast to a vector, such as string type (e.g., '[1,2,3]').

    • When the parameter is a one-dimensional array, the elements cannot contain NULL.

    Return value description:

    • Returns the norm(double) value.

    • If the parameter is NULL, it returns NULL.

    Here is an example:

    CREATE TABLE t10(c1 vector(3));
    INSERT INTO t10 VALUES('[1,2,3]');
    SELECT vector_norm(c1),vector_norm([1,2,3]) FROM t10;
    

    The result is as follows:

    +--------------------+----------------------+
    | vector_norm(c1)    | vector_norm([1,2,3]) |
    +--------------------+----------------------+
    | 3.7416573867739413 |   3.7416573867739413 |
    +--------------------+----------------------+
    1 row in set
    

    Vector normalization

    Vector normalization (Normalization) refers to the process of standardizing the length (magnitude) of a vector to 1. The vector_norm function is commonly used to implement vector normalization.

    For vector indexes using the l2_distance distance type, it is recommended to normalize the vectors to improve the accuracy of similarity searches. After normalization, the magnitude of the vectors is 1, and the range of L2 distances is limited to [0, 2], thus avoiding precision issues caused by small similarity values.

    For example, for the vector [3, 4]:

    • The magnitude of the vector: vector_norm([3, 4]) = √(3² + 4²) = 5
    • Normalized vector: [3, 4] / 5 = [0.6, 0.8]

    The magnitude of the normalized vector is 1, as verified by:

    SELECT vector_norm([0.6, 0.8]);
    

    The result is as follows:

    +-------------------------+
    | vector_norm([0.6, 0.8]) |
    +-------------------------+
    |       1.000000029802322 |
    +-------------------------+
    1 row in set
    

    Vector_dims

    The vector_dims function returns the dimension of the vector.

    The syntax is:

    vector_dims(vector v1)
    

    Parameter description:

    • In addition to the vector type, the parameter can accept any other type that can be strongly cast to a vector, such as string type (e.g., '[1,2,3]').

    Return value description:

    • Returns the dims(int64) value.

    • If the parameter is NULL, an error is returned.

    Here is an example:

    CREATE TABLE t11(c1 vector(3));
    INSERT INTO t11 VALUES('[1,2,3]');
    INSERT INTO t11 VALUES('[1,1,1]');
    SELECT vector_dims(c1), vector_dims('[1,2,3]') FROM t11;
    

    The result is as follows:

    +-----------------+------------------------+
    | vector_dims(c1) | vector_dims('[1,2,3]') |
    +-----------------+------------------------+
    |               3 |                      3 |
    |               3 |                      3 |
    +-----------------+------------------------+
    2 rows in set
    

    Previous topic

    Optimize the performance of vector search
    Last

    Next topic

    Vector similarity search
    Next
    What is on this page
    Considerations
    Configure vector index memory
    Distance functions
    L2_distance
    L2_squared
    L1_distance
    Cosine_distance
    Inner_product
    Negative_inner_product
    Vector_distance
    Similarity functions
    inner_product_similarity
    cosine_similarity
    l2_similarity
    Arithmetic functions
    Comparison functions
    Aggregate functions
    Sum
    Avg
    Other common vector functions
    Vector_norm
    Vector_dims