Purpose
This statement is used to query data from one or more tables.
This section describes the general SELECT syntax. For more information about other SELECT related syntax, see:
Privilege requirements
To execute the SELECT statement, the current user must have the SELECT privilege. For more information about the privileges of OceanBase Database, see Privilege types in MySQL-compatible mode.
Syntax
select_stmt:
SELECT [hint_options]
[{DISTINCT | UNIQUE} | ALL]
[SQL_CALC_FOUND_ROWS]
[STRAIGHT_JOIN]
select_expr_list
[FROM from_list]
[WHERE where_condition]
[GROUP BY group_by_condition_list [WITH ROLLUP]
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list)]
[HAVING having_condition]
[window_clause]
[ORDER BY order_by_condition_list]
[LIMIT limit_clause]
[lock_option];
select_expr_list:
select_expr [, select_expr ...]
select_expr:
*
| table_name.{* | column_name}
| table_alias_name.{* | column_name}
| expr [[AS] column_alias_name]
from_list:
DUAL
| table_references
| { location_url | table_function }
location_url:
'file_path'
(
{FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
}
[PATTERN = '<regex_pattern>']
)
table_function:
{
FILES (
LOCATION = {'file_path' | @location_name['/path']},
{
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
},
[PATTERN = '<regex_pattern>']
)
| SOURCE (
TYPE = 'ODPS',
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
TUNNEL_ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>'
)
}
where_condition:
expression
group_by_condition_list:
group_by_condition [, group_by_condition ...]
group_by_condition:
expression [ASC | DESC]
group_by_summary_option:
GROUPING SETS
| ROLLUP
| CUBE
expression_list:
expression [, expression ...]
having_condition:
expression
order_by_condition_list:
order_by_condition [, order_by_condition ...]
order_by_condition:
expression [ASC | DESC]
| vector_distance_function(vector_column, query_vector) [ASC | DESC] APPROXIMATE
limit_clause:
[offset,] row_count
| row_count OFFSET offset
lock_option:
FOR UPDATE [opt_for_update_wait]
| LOCK IN SHARE MODE
opt_for_update_wait:
WAIT {decimal | intnum}
| NOWAIT
| NO_WAIT
| SKIP LOCKED
Parameters
| Parameter | Description |
|---|---|
| select_stmt | The SQL statement part used to specify which data to retrieve from the database. |
| hint_options | Optional. Specifies the hint option. For more information about hints, see Optimizer Hint. |
| {DISTINCT | UNIQUE} | ALL | Optional. Specifies whether to return duplicate rows in the result set.
|
| SQL_CACHE | SQL_NO_CACHE | Optional. Specifies whether to cache the query results.
|
| SQL_CALC_FOUND_ROWS | Optional. Specifies that the total number of rows returned in a query with the LIMIT clause is recorded and can be retrieved by using the FOUND_ROWS() function. |
| STRAIGHT_JOIN | Optional. Forces the optimizer to join tables in the order specified in the FROM clause. This command does not affect the execution result.
NoticeOceanBase Database is not fully compatible with the |
| select_expr_list | The list of columns or expressions to be displayed in the query result. For more information about columns and expressions, see select_expr. |
| column_name | The name of a column. |
| FROM from_list | Optional. Specifies the data source of the query. The data source can be a table, view, or subquery. For more information about the data source, see from_list. |
| { table_function | location_url } | Optional. The URL external table supports two syntax forms. For more information, see the following:
|
| WHERE where_condition | Optional. Specifies the filter condition for the query. For more information, see where_condition. |
| GROUP BY group_by_condition_list | Optional. Specifies the columns by which to group the result set. This clause is usually used with aggregate functions. For more information, see group_by_condition. |
| WITH ROLLUP | Optional. Specifies that the result set is aggregated at a higher level (also known as super-aggregation) and additional rows are generated. |
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list) | Optional. Specifies advanced grouping and aggregation operations.
|
| HAVING having_condition | Optional. Specifies the filter condition for the grouped result set. The HAVING clause is similar to the WHERE clause, but the HAVING clause can use aggregate functions such as SUM and AVG. For more information, see having_condition. |
| window_clause | Optional. Specifies the window definition for an analysis function (also known as a window function in some databases). For more information about the window_clause, see WINDOW clause. |
| ORDER BY order_by_condition_list | Optional. Specifies the columns by which to sort the result set. You can specify one or more columns for sorting. For more information, see order_by_condition. |
| LIMIT limit_clause | Optional. Specifies the maximum number of rows to return. For more information, see limit_clause. |
| lock_option | Optional. Specifies the lock option for the query result. For more information, see lock_option. |
| vector_distance_function | The vector distance function. Valid values: l2_distance (L2 distance) and cosine_distance (cosine distance). |
| vector_column | The name of a vector column. The column must be of the VECTOR type. |
| query_vector | The query vector, in the '[value1, value2, ...]' format. The dimension must match that of the vector column. |
| APPROXIMATE | A required keyword used to enable approximate search for vector indexes, improving query performance. |
| inner_product | An inner product function that represents the product of two vectors. |
select_expr
select_expr: specifies the columns or expressions to be included in the query result. Multiple expressions or column names are separated by commas (,). Valid values:
*: specifies to select all columns.Example:
Create tables
tbl1andtbl2.CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT) PARTITION BY HASH(col1) PARTITIONS 5;CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);Insert test data into table
tbl1.INSERT INTO tbl1 VALUES(1, 'A1', 1),(2, 'A2', 2),(3, 'A3', 3);INSERT INTO tbl2 VALUES(1, 'A1', 1),(2, 'A2', 22),(3, 'A3', 33);View the data in table
tbl1.SELECT * FROM tbl1;The returned result is as follows:
+------+------+------+ | col1 | col2 | col3 | +------+------+------+ | 1 | A1 | 1 | | 2 | A2 | 2 | | 3 | A3 | 3 | +------+------+------+ 3 rows in set
table_name.{* | column_name}: specifies to select all columns or a specific column from a specified table.table_name.*: specifies to select all columns from a specified table.table_name.column_name: specifies to select a specific column from a specified table.Example:
Read the
col1data from tabletbl1.SELECT tbl1.col1 FROM tbl1;The returned result is as follows:
+------+ | col1 | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set
table_alias_name.{* | column_name}: similar to the previous format, but uses the table alias when selecting columns.expr [[AS] column_alias_name]: specifies to select an expression and optionally assign an alias to it as the column name.expr: specifies a column name, function, or computed expression.expr AS column_alias_name/expr column_alias_name: specifies to select the expression and name itcolumn_alias_name.
Example:
Read the
col1data from tabletbl1, compute a new columncol1_addwith the valuecol1+1, and compute another new columncol3_mulwith the valuecol3*2.SELECT col1, col1+1 AS col1_add, col3*2 col3_mul FROM tbl1;The returned result is as follows:
+------+----------+----------+ | col1 | col1_add | col3_mul | +------+----------+----------+ | 1 | 2 | 2 | | 2 | 3 | 4 | | 3 | 4 | 6 | +------+----------+----------+ 3 rows in set
from_list
DUAL: specifies the name of a virtual table. It is typically used to perform calculations or call functions without an actual table.Example:
SELECT 1+1, SYSDATE() FROM DUAL;The returned result is as follows:
+------+---------------------+ | 1+1 | SYSDATE() | +------+---------------------+ | 2 | 2024-08-28 15:20:59 | +------+---------------------+ 1 row in settable_references: specifies a list of table references, which can include multiple table references separated by commas (,). This indicates the list of data source tables to be queried, which can be a combination of one or more tables. For more information about the syntax oftable_references, see JOIN clause.
where_condition
expression: specifies the condition expression to filter the rows to be updated.
Example:
Select all rows from tbl1 that satisfy the conditions col1 > 1 and col2 = 'A3'.
SELECT * FROM tbl1
WHERE col1 > 1
AND col2 = 'A3';
The returned result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
+------+------+------+
1 row in set
table_function
The LOCATION clause specifies the path where the external table files are stored. Typically, the data files of the external table are stored in a separate directory, which can contain subdirectories. When the external table is created, it automatically collects all files in the directory. The options are as follows:
file_path: specifies the path of the external table file. Details are as follows:For a local
LOCATION, the format isLOCATION = '[file://] local_file_path', wherelocal_file_pathcan be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. Thesecure_file_privparameter specifies the file paths that OBServer nodes are allowed to access.local_file_pathmust be a subpath of thesecure_file_privpath.For a remote
LOCATION, the format is:LOCATION = '{oss\|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path', where$ACCESS_ID,$ACCESS_KEY, and$HOSTare the access information required for accessing Alibaba Cloud OSS, AWS S3, and object storage compatible with the S3 protocol.s3_regionspecifies the region selected when using S3. These sensitive access information are stored in the system tables of the database in an encrypted manner.LOCATION = hdfs://localhost:port/PATH, wherelocalhostspecifies the address of HDFS,portspecifies the port number of HDFS, andPATHspecifies the directory path in HDFS.With Kerberos authentication:
LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'. Details are as follows:principal: specifies the login user.keytab: specifies the path of the key file for user authentication.krb5conf: specifies the path of the description file for the Kerberos environment.configs: specifies additional HDFS configurations. By default, this parameter is empty. However, in a Kerberos environment, this parameter usually has a value and needs to be configured. For example,dfs.data.transfer.protection=authentication,privacyspecifies the data transmission protection level asauthenticationandprivacy.
@location_name['/path']: specifies the location object for creating the external table.['/path']is an optional parameter that specifies a subdirectory. For more information about how to create a location object, see CREATE LOCATION.Note
For OceanBase Database V4.4.x, the
@location_name['/path']parameter is supported starting from V4.4.1.
The FORMAT clause specifies the file format-related properties. It supports the CSV, PARQUET, and ORC file formats.
- When TYPE = 'CSV', the following fields are included:
LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value isLINE_DELIMITER='\n'.FIELD_DELIMITER: an optional parameter that specifies the column delimiter of the CSV file. The default value isFIELD_DELIMITER='\t'.ESCAPE: specifies the escape character of the CSV file. It can be only one byte. The default value isESCAPE ='\'.FIELD_OPTIONALLY_ENCLOSED_BY: an optional parameter that specifies the symbol for enclosing field values in the CSV file. The default value is an empty string. This option is used to enclose only specific types of fields, such as CHAR, VARCHAR, TEXT, and JSON.ENCODING: specifies the character set encoding format of the file. If not specified, the default value isUTF8MB4.NULL_IF: specifies the string that is treated asNULL. The default value is an empty string.SKIP_HEADER: specifies the number of rows to skip from the header.SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value isFALSE, indicating that blank lines are not skipped.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value isFALSE, indicating that leading and trailing spaces are not removed.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULL. The default value isFALSE, indicating that empty strings are not treated asNULL.PARSE_HEADER: specifies that the first row of the CSV file is directly obtained and used as the column names for each column.Notice
PARSE_HEADERcannot be used withSKIP_HEADERbecause they have conflicting semantics.
- When TYPE = 'PARQUET/ORC', no additional fields are included.
- When TYPE = 'CSV', the following fields are included:
The
PATTERNclause specifies a regular expression pattern to filter files in theLOCATIONdirectory. For each file path in theLOCATIONdirectory, if the path matches the pattern, the external table accesses the file; otherwise, it skips the file. If this parameter is not specified, the external table can access all files in theLOCATIONdirectory by default.
For the ODPS format, data is not retrieved through files, and there is no meaningful URL path. Therefore, only the source form of the table_function is supported.
- When TYPE = 'ODPS', the following fields are included:
ACCESSID: specifies the ID of the ODPS user.ACCESSKEY: specifies the password of the ODPS user.ENDPOINT: specifies the connection address of the ODPS service.TUNNEL_ENDPOINT: specifies the connection address of the Tunnel data transmission service.PROJECT_NAME: specifies the project where the queried table is located.SCHEMA_NAME: an optional parameter that specifies the schema of the queried table.TABLE_NAME: specifies the name of the queried table.QUOTA_NAME: an optional parameter that specifies whether to use the specified quota.COMPRESSION_CODE: an optional parameter that specifies the compression format of the data source. It supports ZLIB, ZSTD, LZ4, and ODPS_LZ4. If not specified, compression is not enabled.
Here is an example:
Read data in the ODPS format.
SELECT * FROM
source (
type = 'ODPS',
accessid = '$ODPS_ACCESSID',
accesskey = '$ODPS_ACCESSKEY',
endpoint= '$ODPS_ENDPOINT',
project_name = 'example_project',
schema_name = '',
table_name = 'example_table',
quota_name = '',
compression_code = ''
);
location_url
- The FORMAT clause specifies the file format-related properties. It supports the CSV, PARQUET, and ORC file formats:
- For CSV files: configure parse_header=true/false to specify whether to parse the first row as the column header (default is false). Additionally, use TYPE=CSV to declare the file format (required during export).
- For PARQUET and ORC files: the file structure is automatically recognized, and no additional configuration for the header row is required.
Here is an example:
SELECT * FROM
FILES( location = '/data/',
format (TYPE = 'csv', field_delimiter = ',', parse_header = true),
pattern = 'datafiles$';
group_by_condition
expression [ASC | DESC]: specifies the grouping condition. You can specify ASC or DESC to indicate whether the result set after grouping is sorted in ascending or descending order.
Here is an example:
Query col1 and col2 from table tbl1, group the result by col2, and compute the sum of col3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2 DESC;
The result is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+-----------+
3 rows in set
group_by_summary_option
GROUPING SETS: specifies the grouping method. You can specify multiple grouping combinations. This option has the following characteristics:- You can flexibly define the grouping combinations.
- The result set contains only the specified grouping combinations.
Note
GROUP BY GROUPING SETS ((col1), (col2), (col3))is equivalent toGROUP BY GROUPING SETS (col1, col2, col3).Here is an example:
SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY GROUPING SETS ((col1), (col2), (col3));The result is as follows:
+------+------+------+----------+ | col1 | col2 | col3 | COUNT(*) | +------+------+------+----------+ | 1 | NULL | NULL | 1 | | 2 | NULL | NULL | 1 | | 3 | NULL | NULL | 1 | | NULL | A1 | NULL | 1 | | NULL | A2 | NULL | 1 | | NULL | A3 | NULL | 1 | | NULL | NULL | 1 | 1 | | NULL | NULL | 2 | 1 | | NULL | NULL | 3 | 1 | +------+------+------+----------+ 9 rows in setROLLUP: specifies the hierarchical aggregation method. The number of grouping columns is reduced layer by layer in the order of the grouping columns, and the result set contains multiple layers of aggregated data. This option has the following characteristics:The grouping columns are reduced in the order of the columns in
expression_listuntil only the total row is retained.The result set contains:
- Details of each group.
- Sub-aggregated data of each group.
- The total row.
ROLLUPcan be considered a special case ofGROUPING SETS. For example:GROUP BY ROLLUP(col1, col2, col3) is equivalent to GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1), ())ROLLUPcan be used withGROUPING SETS. For example:GROUP BY GROUPING SETS((col2), (col3), ROLLUP(col1, col2, col3)) is equivalent to GROUP BY GROUPING SETS((col2), (col3), (col1, col2, col3), (col1, col2), (col1), ())Here is an example:
SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY ROLLUP (col1, col2, col3);The result is as follows:
+------+------+------+----------+ | col1 | col2 | col3 | COUNT(*) | +------+------+------+----------+ | 1 | A1 | 1 | 1 | | 1 | A1 | NULL | 1 | | 1 | NULL | NULL | 1 | | 2 | A2 | 2 | 1 | | 2 | A2 | NULL | 1 | | 2 | NULL | NULL | 1 | | 3 | A3 | 3 | 1 | | 3 | A3 | NULL | 1 | | 3 | NULL | NULL | 1 | | NULL | NULL | NULL | 3 | +------+------+------+----------+ 10 rows in setCUBE: specifies the multidimensional aggregation method. This option generates all possible grouping combinations. This option has the following characteristics:- Details of each group.
- Aggregated data of all possible grouping combinations.
- The total row.
CUBEcan be considered a special case ofGROUPING SETS. For example:GROUP BY CUBE(col1, col2, col3) is equivalent to GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1, col3), (col2, col3), (col1), (col2), (col3), ())CUBEcan be used withGROUPING SETS. For example:GROUP BY col1, CUBE(col2, col3), GROUPING SETS((col4), (col5)) is equivalent to GROUP BY GROUPING SETS( (col1, col2, col3, col4), (col1, col2, col3, col5), (col1, col2, col4), (col1, col2, col5), (col1, col3, col4), (col1, col3, col5), (col1, col4), (col1, col5))Here is an example:
SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY CUBE (col1, col2, col3);The result is as follows:
+------+------+------+----------+ | col1 | col2 | col3 | COUNT(*) | +------+------+------+----------+ | NULL | NULL | NULL | 3 | | NULL | NULL | 1 | 1 | | NULL | NULL | 2 | 1 | | NULL | NULL | 3 | 1 | | NULL | A1 | NULL | 1 | | NULL | A2 | NULL | 1 | | NULL | A3 | NULL | 1 | | NULL | A1 | 1 | 1 | | NULL | A2 | 2 | 1 | | NULL | A3 | 3 | 1 | | 1 | NULL | NULL | 1 | | 2 | NULL | NULL | 1 | | 3 | NULL | NULL | 1 | | 1 | NULL | 1 | 1 | | 2 | NULL | 2 | 1 | | 3 | NULL | 3 | 1 | | 1 | A1 | NULL | 1 | | 2 | A2 | NULL | 1 | | 3 | A3 | NULL | 1 | | 1 | A1 | 1 | 1 | | 2 | A2 | 2 | 1 | | 3 | A3 | 3 | 1 | +------+------+------+----------+ 22 rows in set
having_condition
expression: specifies the condition for filtering the result set after grouping.
Here is an example:
Query col1 and col2 from table tbl1, group the result by col2, and compute the sum of col3. Query only the rows where the sum of col3 is less than 3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2
HAVING SUM(col3) < 3;
The result is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 1 | A1 | 1 |
| 2 | A2 | 2 |
+------+------+-----------+
2 rows in set
order_by_condition
expression [ASC | DESC]: specifies the condition for sorting the result set.
ASC | DESC: optional. ASC specifies ascending order (default), and DESC specifies descending order.
Here is an example:
Query data from table tbl1 and output the query result in descending order (DESC) by col3.
SELECT * FROM tbl1
ORDER BY col3 DESC;
The result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+------+
3 rows in set
limit_clause
[offset,] row_count: the values of these parameters must be integer constants. The following table describes the parameters.offset: optional. The number of rows to skip from the start of the result set (offset). The default value is 0.row_count: the number of rows to return. If you specify onlyrow_count, the firstrow_countrows of the result set are returned.
row_count OFFSET offset: specifies that the firstoffsetrows of the result set are skipped and the nextrow_countrows are returned.
Here is an example:
Query data from table tbl1 and return two rows starting from the second row.
SELECT * FROM tbl1
LIMIT 1, 2;
The result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 2 | A2 | 2 |
| 3 | A3 | 3 |
+------+------+------+
2 rows in set
lock_option
FOR UPDATE [opt_for_update_wait]: This option adds an exclusive lock to all rows in the query result set to prevent concurrent modifications by other transactions, or to prevent concurrent reads in certain transaction isolation levels. For more information about using exclusive locks, see Lock query results with SELECT FOR UPDATE.opt_for_update_wait: Optional. Specifies the behavior for acquiring locks. The options are as follows:WAIT {decimal | intnum}: Specifies the amount of time to wait for other transactions to release resources. If the time limit is exceeded, the resource cannot be obtained. The unit is seconds.NOWAIT/NO_WAIT: Specifies not to wait for other transactions to release resources and to immediately return that the resource cannot be obtained.SKIP LOCKED: Specifies not to wait for row locks. The query will execute immediately, and locked rows will be removed from the result set.
Here is an example:
You can use
SELECT ... FOR UPDATEto add row-level locks to a table. If theLIMIT 1clause is used, the optimizer will push the operator down to the table scan step, and only lock the rows returned byLIMIT.SELECT * FROM tbl1 LIMIT 1 FOR UPDATE;If you use the
ORDER BYclause to sort the query results, the results will be sorted first, and thenLIMIT 1will be executed. In this case, all selected rows will be locked.SELECT * FROM tbl1 ORDER BY col1 LIMIT 1 FOR UPDATE;
LOCK IN SHARE MODE: This option is used to acquire a shared lock when querying data, preventing other transactions from writing to the data while allowing other transactions to read from the data. For more information about using shared locks, see Lock query results with LOCK IN SHARE MODE.
Examples
Simple table query example
Create a table named
test_tbl1.CREATE TABLE test_tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);Insert test data into the
test_tbl1table.INSERT INTO test_tbl1 VALUES (1, 'A1', 10),(2, 'A2', 15),(3, 'A1', 8);View the data in the
col2column of thetest_tbl1table and perform deduplication.SELECT DISTINCT col2 FROM test_tbl1;The returned result is as follows:
+------+ | col2 | +------+ | A1 | | A2 | +------+ 2 rows in set
Example of querying data using a dblink
Read data from the
number_ttable in the remotetestdatabase.SELECT * FROM number_t@ob_dblink;The returned result is as follows:
+--------+-------------+---------+---------------------+--------------------------------+ | c_int | c_bigint | c_float | c_double | c_decimal | +--------+-------------+---------+---------------------+--------------------------------+ | 214748 | 92233720368 | 3.40282 | 1.7976931348623157 | 123456789123456789.1234567890 | +--------+-------------+---------+---------------------+--------------------------------+ 1 row in setRead data from the
number_ttable in the remotetestdatabase and perform a JOIN operation with the localnumber_ttable.SELECT a.c_decimal, b.c_double FROM number_t a, number_t@ob_dblink b WHERE a.c_int = b.c_int;The returned result is as follows:
+--------------------------------+---------------------+ | c_decimal | c_double | +---------------------+--------------------------------+ | 123456789123456789.1234567890 | 1.7976931348623157 | +---------------------+--------------------------------+ 1 row in setRead data from the
datetime_ttable in the remotemysqldatabase.SELECT * FROM mysql.datetime_t@ob_dblink;The returned result is as follows:
+-------------+------------+---------+----------------------+----------------------+ | c_date | c_time | c_year | c_datetime | c_timestamp | +-------------+------------+---------+-----------------------+---------------------+ | 2023-04-13 | 12:12:12 | 2078 | 2100-11-01 12:12:13 | 2100-12-01 21:14:15 | +-------------+------------+---------+----------------------+----------------------+ 1 row in set
Example of vector distance query
Create a table.
CREATE TABLE vector_table ( id INT PRIMARY KEY, name VARCHAR(50), vector_col VECTOR(3) );Insert vector data into the table.
INSERT INTO vector_table VALUES (1, 'A', '[1, 2, 3]'); INSERT INTO vector_table VALUES (2, 'B', '[2, 3, 4]'); INSERT INTO vector_table VALUES (3, 'C', '[3, 4, 5]'); INSERT INTO vector_table VALUES (4, 'D', '[10, 20, 30]'); INSERT INTO vector_table VALUES (5, 'E', '[100, 200, 300]');Create a vector index.
CREATE VECTOR INDEX idx_vector ON vector_table(vector_col) WITH (distance=l2, type=hnsw);Perform a vector similarity query using the L2 distance.
SELECT id, name, vector_col FROM vector_table ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;The returned result is as follows:
+----+------+-------------+ | id | name | vector_col | +----+------+-------------+ | 2 | B | [2,3,4] | | 3 | C | [3,4,5] | | 1 | A | [1,2,3] | +----+------+-------------+ 3 rows in setPerform a vector similarity query using the cosine distance.
SELECT id, name, vector_col FROM vector_table ORDER BY cosine_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;The returned result is as follows:
+----+------+-------------+ | id | name | vector_col | +----+------+-------------+ | 2 | B | [2,3,4] | | 3 | C | [3,4,5] | | 1 | A | [1,2,3] | +----+------+-------------+ 3 rows in setPerform a vector query with a WHERE condition.
SELECT id, name, vector_col FROM vector_table WHERE id > 2 ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 2;The returned result is as follows:
+----+------+-------------+ | id | name | vector_col | +----+------+-------------+ | 3 | C | [3,4,5] | | 4 | D | [10,20,30] | +----+------+-------------+ 2 rows in set
Notice
Vector distance queries must meet the following conditions:
- You must use the
APPROXIMATEkeyword to enable approximate search for the vector index. - The dimension of the query vector
query_vectormust match the dimension of the vector column. - We recommend that you create a vector index on the vector column to improve query performance.
- Supported vector distance functions include
l2_distance(L2 distance),negative_inner_product(negative inner product), andcosine_distance(cosine distance).