Purpose
This statement is used to query data from one or more tables.
This section mainly describes the general SELECT syntax.
Privilege requirements
To execute the SELECT statement, the current user must have the SELECT privilege. For more information about the privileges of OceanBase Database, see Privilege types in MySQL mode.
Syntax
select_stmt:
SELECT [hint_options]
[{DISTINCT | UNIQUE} | ALL]
[SQL_CALC_FOUND_ROWS]
[STRAIGHT_JOIN]
select_expr_list
[FROM from_list]
[WHERE where_condition]
[GROUP BY group_by_condition_list [WITH ROLLUP]
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list)]
[HAVING having_condition]
[window_clause]
[ORDER BY order_by_condition_list]
[LIMIT limit_clause]
[lock_option];
select_expr_list:
select_expr [, select_expr ...]
select_expr:
*
| table_name.{* | column_name}
| table_alias_name.{* | column_name}
| expr [[AS] column_alias_name]
from_list:
DUAL
| table_references
| { location_url | table_function }
location_url:
'file_path'
(
{FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
}
[PATTERN = '<regex_pattern>']
)
table_function:
{
FILES (
LOCATION = {'file_path' | @location_name['/path']},
{
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
},
[PATTERN = '<regex_pattern>']
)
| SOURCE (
TYPE = 'ODPS',
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
TUNNEL_ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>'
)
}
where_condition:
expression
group_by_condition_list:
group_by_condition [, group_by_condition ...]
group_by_condition:
expression [ASC | DESC]
group_by_summary_option:
GROUPING SETS
| ROLLUP
| CUBE
expression_list:
expression [, expression ...]
having_condition:
expression
order_by_condition_list:
order_by_condition [, order_by_condition ...]
order_by_condition:
expression [ASC | DESC]
| vector_distance_function(vector_column, query_vector) [ASC | DESC] APPROXIMATE
limit_clause:
[offset,] row_count
| row_count OFFSET offset
lock_option:
FOR UPDATE [opt_for_update_wait]
| LOCK IN SHARE MODE
opt_for_update_wait:
WAIT {decimal | intnum}
| NOWAIT
| NO_WAIT
| SKIP LOCKED
Parameters
| Parameter | Description |
|---|---|
| select_stmt | The SQL statement part used to specify which data to retrieve from the database. |
| hint_options | Optional. Used to specify the hint option. For more information about hints, see Optimizer Hint. |
| {DISTINCT \| UNIQUE} \| ALL | Optional. Used to control whether the result set contains duplicate rows. |
| SQL_CACHE \| SQL_NO_CACHE | Optional. Used to control whether the query result is cached. |
| SQL_CALC_FOUND_ROWS | Optional. Used to record the total number of returned rows in a query with a LIMIT clause, and to obtain the total number of rows using the FOUND_ROWS() function. |
| STRAIGHT_JOIN | Optional. Forces the optimizer to join tables in the order specified in the FROM clause. This option does not affect the query result. Notice: OceanBase Database does not fully support the STRAIGHT_JOIN modifier. |
| select_expr_list | The list of columns or expressions to display in the query result. For more information about the columns and expressions, see select_expr. |
| column_name | The name of a column. |
| FROM from_list | Optional. Specifies the data source for the query, which can be a table, view, or subquery. For more information about the data source, see from_list. |
| { table_function \| location_url } | Optional. The current URL external table supports two syntax forms. For more information, see the table_function and location_url sections below. |
| WHERE where_condition | Optional. Specifies the filter conditions for the query. For more information, see where_condition. |
| GROUP BY group_by_condition_list | Optional. Used to group the result set by the specified columns. Typically used with aggregate functions. For more information, see group_by_condition. |
| WITH ROLLUP | Optional. Used to summarize the groups to represent higher-level aggregations (also known as super-aggregations) and generate additional rows. |
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list) | Optional. Used for advanced grouping and summarization operations. For more information, see group_by_summary_option. |
| HAVING having_condition | Optional. Used to filter the grouped results. The HAVING clause is similar to the WHERE clause, but it can use aggregate functions such as SUM and AVG. For more information, see having_condition. |
| window_clause | Optional. Used to specify the window definition for the analytic function (also known as window function in some databases). For more information about the window_clause syntax, see WINDOW clause. |
| ORDER BY order_by_condition_list | Optional. Used to sort the result set. You can specify one or more columns for sorting. For more information, see order_by_condition. |
| LIMIT limit_clause | Optional. Used to specify the maximum number of rows to return. For more information, see limit_clause. |
| lock_option | Optional. Used to lock the query result. For more information, see lock_option. |
| vector_distance_function | The vector distance function. Supported functions are l2_distance (L2 distance), cosine_distance (cosine distance), and negative_inner_product (negative inner product). |
| vector_column | The name of the vector column. The column must be of the VECTOR type. |
| query_vector | The query vector, in the format '[value1, value2, ...]'. The dimension must match the vector column. |
| APPROXIMATE | A required keyword that enables approximate search for vector indexes to improve query performance. |
| inner_product | The inner product function, which computes the sum of the element-wise products (the dot product) of two vectors. |
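As an illustration of the `SQL_CALC_FOUND_ROWS` modifier described in the table above, the following sketch (using a table such as the `tbl1` table created later in this topic) pairs it with the `FOUND_ROWS()` function to obtain the total row count of a limited query:

```sql
-- Ask the server to count all matching rows, not just the limited page.
SELECT SQL_CALC_FOUND_ROWS col1, col2 FROM tbl1 WHERE col1 > 0 LIMIT 2;
-- Retrieve the total number of rows the previous query would have
-- returned without the LIMIT clause.
SELECT FOUND_ROWS();
```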
select_expr
select_expr: specifies the columns or expressions to be included in the query result. Multiple expressions or column names are separated by commas (,). The value can be as follows:
`*`: specifies all columns. Here is an example:

Create tables `tbl1` and `tbl2`.

CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT) PARTITION BY HASH(col1) PARTITIONS 5;
CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);

Insert test data into tables `tbl1` and `tbl2`.

INSERT INTO tbl1 VALUES(1, 'A1', 1),(2, 'A2', 2),(3, 'A3', 3);
INSERT INTO tbl2 VALUES(1, 'A1', 1),(2, 'A2', 22),(3, 'A3', 33);

View the data in table `tbl1`.

SELECT * FROM tbl1;

The result is as follows:

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    1 | A1   |    1 |
|    2 | A2   |    2 |
|    3 | A3   |    3 |
+------+------+------+
3 rows in set
`table_name.{* | column_name}`: specifies all columns or a specific column of the specified table.

- `table_name.*`: specifies all columns of the specified table.
- `table_name.column_name`: specifies a specific column of the specified table.

Here is an example:

Read the `col1` data from table `tbl1`.

SELECT tbl1.col1 FROM tbl1;

The result is as follows:

+------+
| col1 |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set
`table_alias_name.{* | column_name}`: similar to the previous format, except that a table alias is used when selecting columns.

`expr [[AS] column_alias_name]`: specifies an expression and optionally assigns an alias to it as the column name.

- `expr`: specifies a column name, function, or calculation expression.
- `expr AS column_alias_name` or `expr column_alias_name`: specifies the expression and assigns the alias `column_alias_name` to it.

Here is an example:

Read the `col1` data from table `tbl1`, calculate a new column `col1_add` with the value `col1+1`, and calculate another new column `col3_mul` with the value `col3*2`.

SELECT col1, col1+1 AS col1_add, col3*2 col3_mul FROM tbl1;

The result is as follows:

+------+----------+----------+
| col1 | col1_add | col3_mul |
+------+----------+----------+
|    1 |        2 |        2 |
|    2 |        3 |        4 |
|    3 |        4 |        6 |
+------+----------+----------+
3 rows in set
from_list
`DUAL`: specifies a virtual table name. It is typically used to perform calculations or execute functions when no actual table is involved. Here is an example:

SELECT 1+1, SYSDATE() FROM DUAL;

The result is as follows:

+------+---------------------+
| 1+1  | SYSDATE()           |
+------+---------------------+
|    2 | 2024-08-28 15:20:59 |
+------+---------------------+
1 row in set

`table_references`: specifies the list of data source tables for the query, which can be one table or a combination of multiple tables separated by commas (,). For more information about the syntax of `table_references`, see JOIN clause.
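To illustrate `table_references` with multiple tables, the following sketch (using the `tbl1` and `tbl2` tables created earlier in this topic) joins two tables listed in the FROM clause:

```sql
-- Comma-separated table references produce a join; the WHERE clause
-- supplies the join condition.
SELECT a.col1, a.col2, b.col3
  FROM tbl1 a, tbl2 b
 WHERE a.col1 = b.col1;
```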
where_condition
`expression`: specifies the condition expression used to filter the rows to be returned.
Here is an example:
Select all rows from tbl1 that satisfy the conditions col1 > 1 and col2 = 'A3'.
SELECT * FROM tbl1
WHERE col1 > 1
AND col2 = 'A3';
The result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
+------+------+------+
1 row in set
table_function
The LOCATION clause specifies the path where the external table files are stored. Usually, the data files of the external table are stored in a separate directory, which can contain subdirectories. When the table is created, the external table automatically collects all the files in the directory. The values are as follows:
`file_path`: specifies the path of the external table files. Details are as follows:

- For a local `LOCATION`, the format is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative path or an absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` variable specifies the file paths that OBServer nodes can access, and `local_file_path` must be a subpath of the `secure_file_priv` path.
- For a remote `LOCATION`, the supported formats are as follows:
  - `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing Alibaba Cloud OSS, AWS S3, or object storage compatible with the S3 protocol, and `s3_region` specifies the region selected when using S3. This sensitive access information is stored in the system tables of the database in encrypted form.
  - `LOCATION = 'hdfs://localhost:port/PATH'`, where `localhost` specifies the address of HDFS, `port` specifies the port number of HDFS, and `PATH` specifies the directory path in HDFS. With Kerberos authentication, the format is `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`. Details are as follows:
    - `principal`: specifies the login authentication user.
    - `keytab`: specifies the path of the user authentication key file.
    - `krb5conf`: specifies the path of the Kerberos environment description file.
    - `configs`: specifies additional HDFS configuration items. By default, this parameter is empty. In a Kerberos environment, however, it usually needs to be set. For example, `dfs.data.transfer.protection=authentication,privacy` sets the data transmission protection levels to `authentication` and `privacy`.

`@location_name['/path']`: specifies the path of a Location object. `['/path']` is an optional parameter that specifies a subdirectory. For more information about the Location object, see CREATE LOCATION.

Note: For OceanBase Database V4.4.x, the `@location_name['/path']` parameter is supported starting from V4.4.1.
The FORMAT clause specifies the file format related properties. The supported file formats are CSV, PARQUET, and ORC.
- TYPE = 'CSV': specifies the following fields:
  - `LINE_DELIMITER`: specifies the line delimiter of the CSV file. The default value is `LINE_DELIMITER='\n'`.
  - `FIELD_DELIMITER`: an optional parameter that specifies the column delimiter of the CSV file. The default value is `FIELD_DELIMITER='\t'`.
  - `ESCAPE`: specifies the escape character of the CSV file. It can be only one byte. The default value is `ESCAPE='\'`.
  - `FIELD_OPTIONALLY_ENCLOSED_BY`: an optional parameter that specifies the symbol used to enclose field values in the CSV file. The default value is empty. This option encloses only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON).
  - `ENCODING`: specifies the character set encoding of the file. If this parameter is not specified, the default value is `UTF8MB4`.
  - `NULL_IF`: specifies the strings that are treated as `NULL`. The default value is empty.
  - `SKIP_HEADER`: specifies the number of header rows to skip.
  - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, which means that blank lines are not skipped.
  - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, which means that leading and trailing spaces are not removed.
  - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, which means that empty strings are not treated as `NULL`.
  - `PARSE_HEADER`: specifies that the first row of the CSV file is used as the column names.

  Notice: You cannot use the `PARSE_HEADER` and `SKIP_HEADER` parameters at the same time.
- TYPE = 'PARQUET' or TYPE = 'ORC': no additional fields are specified.
The PATTERN clause specifies a regular expression pattern to filter files in the `LOCATION` directory. For each file path in the `LOCATION` directory, the external table accesses the file if the path matches the pattern and skips the file otherwise. If this parameter is not specified, all files in the `LOCATION` directory are accessible by default.
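Putting the clauses above together, a minimal sketch of the `location_url` form might look as follows (the directory `/data/` and the file name pattern are hypothetical):

```sql
-- Query CSV files directly from a directory by using the location_url form.
SELECT * FROM '/data/' (
  FORMAT = (
    TYPE = 'CSV',
    FIELD_DELIMITER = ',',
    SKIP_HEADER = 1
  ),
  PATTERN = '.*\.csv$'
);
```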
For the ODPS type, data is not read from files and there is no meaningful URL path. Therefore, only the `SOURCE` form of `table_function` is supported.
- TYPE = 'ODPS': specifies the following fields:
- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: an optional parameter that specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: an optional parameter that specifies whether to use the specified quota.
- `COMPRESSION_CODE`: an optional parameter that specifies the compression format of the data source. The supported compression formats are ZLIB, ZSTD, LZ4, and ODPS_LZ4. If this parameter is not specified, compression is not enabled.
Here is an example:
Read data in the ODPS format.
SELECT * FROM
source (
type = 'ODPS',
accessid = '$ODPS_ACCESSID',
accesskey = '$ODPS_ACCESSKEY',
endpoint= '$ODPS_ENDPOINT',
project_name = 'example_project',
schema_name = '',
table_name = 'example_table',
quota_name = '',
compression_code = ''
);
location_url
- The FORMAT clause specifies the file format related properties. The supported file formats are CSV, PARQUET, and ORC:
- For CSV files: specify parse_header=true/false to indicate whether to parse the first row of the file as the column headers (default is false). Additionally, use TYPE=CSV to declare the file format (required when exporting data).
- For PARQUET or ORC files: the file structure is automatically recognized, and no additional configuration for the header row is required.
Here is an example:
SELECT * FROM
  FILES (
    LOCATION = '/data/',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', PARSE_HEADER = TRUE),
    PATTERN = 'datafiles$'
  );
group_by_condition
`expression [ASC | DESC]`: specifies the grouping condition. Adding `ASC` or `DESC` controls whether the result set is sorted in ascending or descending order after grouping.
Here is an example:
Query col1 and col2 from table tbl1, group by col2, and sum col3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2 DESC;
The result set is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+-----------+
3 rows in set
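The `WITH ROLLUP` modifier described in the Parameters section adds a super-aggregate row to a grouped query. A sketch against the same `tbl1` table:

```sql
-- Group by col2 and append a super-aggregate row that sums col3
-- across all groups. The super-aggregate row has col2 = NULL.
SELECT col2, SUM(col3) FROM tbl1 GROUP BY col2 WITH ROLLUP;
```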
group_by_summary_option
`GROUPING SETS`: specifies custom grouping. You can specify multiple grouping combinations. It has the following characteristics:

- You can flexibly define the grouping combinations.
- The result set contains only the grouping combinations that you specify.

Note: `GROUP BY GROUPING SETS ((col1), (col2), (col3))` is equivalent to `GROUP BY GROUPING SETS (col1, col2, col3)`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY GROUPING SETS ((col1), (col2), (col3));

The result set is as follows:

+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
|    1 | NULL | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | NULL | NULL |        1 |
| NULL | A1   | NULL |        1 |
| NULL | A2   | NULL |        1 |
| NULL | A3   | NULL |        1 |
| NULL | NULL |    1 |        1 |
| NULL | NULL |    2 |        1 |
| NULL | NULL |    3 |        1 |
+------+------+------+----------+
9 rows in set

`ROLLUP`: specifies hierarchical aggregation. It reduces the number of grouping columns in the order of the grouping columns and generates hierarchical aggregation results. It has the following characteristics:

- It reduces the number of grouping columns in the order of the columns in
`expression_list` until only the total row remains.
- The result set contains the following data:
  - The detailed data of each group.
  - The sub-aggregated data of each group.
  - The total row.

`ROLLUP` is a special case of `GROUPING SETS`. For example, `GROUP BY ROLLUP(col1, col2, col3)` is equivalent to `GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1), ())`.

You can also use `ROLLUP` with `GROUPING SETS`. For example, `GROUP BY GROUPING SETS((col2), (col3), ROLLUP(col1, col2, col3))` is equivalent to `GROUP BY GROUPING SETS((col2), (col3), (col1, col2, col3), (col1, col2), (col1), ())`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY ROLLUP (col1, col2, col3);

The result set is as follows:
+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
|    1 | A1   |    1 |        1 |
|    1 | A1   | NULL |        1 |
|    1 | NULL | NULL |        1 |
|    2 | A2   |    2 |        1 |
|    2 | A2   | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | A3   |    3 |        1 |
|    3 | A3   | NULL |        1 |
|    3 | NULL | NULL |        1 |
| NULL | NULL | NULL |        3 |
+------+------+------+----------+
10 rows in set

`CUBE`: specifies multidimensional aggregation. It generates all possible grouping combinations. The result set contains the following data:

- The detailed data of each group.
- The aggregated data of all possible grouping combinations.
- The total row.

`CUBE` is a special case of `GROUPING SETS`. For example, `GROUP BY CUBE(col1, col2, col3)` is equivalent to `GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1, col3), (col2, col3), (col1), (col2), (col3), ())`.

You can also use `CUBE` with `GROUPING SETS`. For example, `GROUP BY col1, CUBE(col2, col3), GROUPING SETS((col4), (col5))` is equivalent to `GROUP BY GROUPING SETS((col1, col2, col3, col4), (col1, col2, col3, col5), (col1, col2, col4), (col1, col2, col5), (col1, col3, col4), (col1, col3, col5), (col1, col4), (col1, col5))`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY CUBE (col1, col2, col3);

The result set is as follows:

+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
| NULL | NULL | NULL |        3 |
| NULL | NULL |    1 |        1 |
| NULL | NULL |    2 |        1 |
| NULL | NULL |    3 |        1 |
| NULL | A1   | NULL |        1 |
| NULL | A2   | NULL |        1 |
| NULL | A3   | NULL |        1 |
| NULL | A1   |    1 |        1 |
| NULL | A2   |    2 |        1 |
| NULL | A3   |    3 |        1 |
|    1 | NULL | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | NULL | NULL |        1 |
|    1 | NULL |    1 |        1 |
|    2 | NULL |    2 |        1 |
|    3 | NULL |    3 |        1 |
|    1 | A1   | NULL |        1 |
|    2 | A2   | NULL |        1 |
|    3 | A3   | NULL |        1 |
|    1 | A1   |    1 |        1 |
|    2 | A2   |    2 |        1 |
|    3 | A3   |    3 |        1 |
+------+------+------+----------+
22 rows in set
having_condition
`expression`: specifies the condition for filtering the result set after grouping.
Here is an example:
Query col1 and col2 from table tbl1, group by col2, and sum col3. Return only the rows where the sum of col3 is less than 3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2
HAVING SUM(col3) < 3;
The result set is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 1 | A1 | 1 |
| 2 | A2 | 2 |
+------+------+-----------+
2 rows in set
order_by_condition
`expression [ASC | DESC]`: specifies the condition for sorting the result set.

`ASC | DESC`: optional. Specifies the sort order. `ASC` specifies ascending order (the default), and `DESC` specifies descending order.
Here is an example:
Query data from table tbl1 and output the query result in descending order (DESC) based on col3.
SELECT * FROM tbl1
ORDER BY col3 DESC;
The result set is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+------+
3 rows in set
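You can also combine multiple sort conditions in one ORDER BY clause; a sketch against `tbl1`:

```sql
-- Sort by col2 in ascending order first, then break ties by col3 in
-- descending order.
SELECT * FROM tbl1 ORDER BY col2 ASC, col3 DESC;
```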
limit_clause
`[offset,] row_count`: both values must be integer constants. The parameters are described as follows:

- `offset`: optional. Specifies the number of rows to skip from the start of the result set (the offset). The default offset is 0.
- `row_count`: specifies the number of rows to return. If you specify only `row_count`, the first `row_count` rows of the result set are returned.

`row_count OFFSET offset`: skips `offset` rows from the start of the result set and returns the next `row_count` rows.
Here is an example:
Query data from table tbl1, and return two rows starting from the second row by using LIMIT.
SELECT * FROM tbl1
LIMIT 1, 2;
The result set is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 2 | A2 | 2 |
| 3 | A3 | 3 |
+------+------+------+
2 rows in set
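The `row_count OFFSET offset` form expresses the same query as the example above:

```sql
-- Skip the first row and return the next two rows; equivalent to LIMIT 1, 2.
SELECT * FROM tbl1 LIMIT 2 OFFSET 1;
```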
lock_option
`FOR UPDATE [opt_for_update_wait]`: specifies that exclusive locks are placed on all rows of the query result to prevent other transactions from modifying the data concurrently, or from reading it concurrently at certain transaction isolation levels. For more information about exclusive locks, see Lock query results with SELECT FOR UPDATE.

`opt_for_update_wait`: an optional parameter that specifies the lock acquisition behavior. The options are as follows:

- `WAIT {decimal | intnum}`: waits the specified amount of time, in seconds, for other transactions to release the resources. If the time limit is exceeded, the resources cannot be acquired.
- `NOWAIT` or `NO_WAIT`: does not wait for other transactions to release the resources. If the resources are held by other transactions, they cannot be acquired.
- `SKIP LOCKED`: does not wait for row locks. The query is executed immediately, and locked rows are removed from the result set.
Here is an example:
You can use `SELECT ... FOR UPDATE` to add row-level locks to a table. If the `LIMIT 1` clause is used, the operator is pushed down to the table scan step during query optimization, and only the rows returned by `LIMIT` are locked.

SELECT * FROM tbl1 LIMIT 1 FOR UPDATE;

If you use the `ORDER BY` clause to sort the query results, the results are sorted first and then `LIMIT 1` is applied. In this case, all selected rows are locked.

SELECT * FROM tbl1 ORDER BY col1 LIMIT 1 FOR UPDATE;
LOCK IN SHARE MODE: Specifies that shared locks are acquired when data is queried to prevent other transactions from writing to the data, but allows other transactions to read the data. For more information about shared locks, see Lock query results with LOCK IN SHARE MODE.
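A sketch combining the locking options above against `tbl1` (run inside an explicit transaction so that the locks are held until commit):

```sql
BEGIN;
-- Acquire shared locks: other transactions can still read these rows
-- but cannot modify them until this transaction ends.
SELECT * FROM tbl1 WHERE col1 = 1 LOCK IN SHARE MODE;
-- Acquire an exclusive lock without waiting; the statement fails
-- immediately if another transaction holds a conflicting lock.
SELECT * FROM tbl1 WHERE col1 = 2 FOR UPDATE NOWAIT;
COMMIT;
```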
Examples
Simple table query example
Create a table named `test_tbl1`.

CREATE TABLE test_tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);

Insert test data into the `test_tbl1` table.

INSERT INTO test_tbl1 VALUES (1, 'A1', 10),(2, 'A2', 15),(3, 'A1', 8);

View the data in the `col2` column of the `test_tbl1` table and deduplicate it.

SELECT DISTINCT col2 FROM test_tbl1;

The return result is as follows:

+------+
| col2 |
+------+
| A1   |
| A2   |
+------+
2 rows in set
Example of querying data using dblink
Read the `number_t` table from the remote `test` database.

SELECT * FROM number_t@ob_dblink;

The return result is as follows:

+--------+-------------+---------+--------------------+-------------------------------+
| c_int  | c_bigint    | c_float | c_double           | c_decimal                     |
+--------+-------------+---------+--------------------+-------------------------------+
| 214748 | 92233720368 | 3.40282 | 1.7976931348623157 | 123456789123456789.1234567890 |
+--------+-------------+---------+--------------------+-------------------------------+
1 row in set

Read the `number_t` table from the remote `test` database and join it with the local `number_t` table.

SELECT a.c_decimal, b.c_double FROM number_t a, number_t@ob_dblink b WHERE a.c_int = b.c_int;

The return result is as follows:

+-------------------------------+--------------------+
| c_decimal                     | c_double           |
+-------------------------------+--------------------+
| 123456789123456789.1234567890 | 1.7976931348623157 |
+-------------------------------+--------------------+
1 row in set

Read the `datetime_t` table from the remote `mysql` database.

SELECT * FROM mysql.datetime_t@ob_dblink;

The return result is as follows:

+------------+----------+--------+---------------------+---------------------+
| c_date     | c_time   | c_year | c_datetime          | c_timestamp         |
+------------+----------+--------+---------------------+---------------------+
| 2023-04-13 | 12:12:12 |   2078 | 2100-11-01 12:12:13 | 2100-12-01 21:14:15 |
+------------+----------+--------+---------------------+---------------------+
1 row in set
Example of vector distance query
Create a table.

CREATE TABLE vector_table (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  vector_col VECTOR(3)
);

Insert vector data into the table.

INSERT INTO vector_table VALUES (1, 'A', '[1, 2, 3]');
INSERT INTO vector_table VALUES (2, 'B', '[2, 3, 4]');
INSERT INTO vector_table VALUES (3, 'C', '[3, 4, 5]');
INSERT INTO vector_table VALUES (4, 'D', '[10, 20, 30]');
INSERT INTO vector_table VALUES (5, 'E', '[100, 200, 300]');

Create a vector index.

CREATE VECTOR INDEX idx_vector ON vector_table(vector_col) WITH (distance=l2, type=hnsw);

Perform a vector similarity query using L2 distance.

SELECT id, name, vector_col FROM vector_table ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  2 | B    | [2,3,4]    |
|  3 | C    | [3,4,5]    |
|  1 | A    | [1,2,3]    |
+----+------+------------+
3 rows in set

Perform a vector similarity query using cosine distance.

SELECT id, name, vector_col FROM vector_table ORDER BY cosine_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  2 | B    | [2,3,4]    |
|  3 | C    | [3,4,5]    |
|  1 | A    | [1,2,3]    |
+----+------+------------+
3 rows in set

Perform a vector query with a WHERE condition.

SELECT id, name, vector_col FROM vector_table WHERE id > 2 ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 2;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  3 | C    | [3,4,5]    |
|  4 | D    | [10,20,30] |
+----+------+------------+
2 rows in set
Notice
The following conditions must be met for a vector distance query:
- You must use the `APPROXIMATE` keyword to enable approximate search for the vector index.
- The dimension of the query vector `query_vector` must match the dimension of the vector column.
- We recommend that you create a corresponding vector index on the vector column to improve query performance.
- Supported vector distance functions include `l2_distance` (L2 distance), `negative_inner_product` (negative inner product), and `cosine_distance` (cosine distance).