Purpose
This statement is used to query data from one or more tables.
This section mainly describes the general SELECT syntax.
Privilege requirements
To execute the SELECT statement, the current user must have the SELECT privilege. For more information about the privileges of OceanBase Database, see Privilege types in MySQL mode.
Syntax
select_stmt:
SELECT [hint_options]
[{DISTINCT | UNIQUE} | ALL]
[SQL_CALC_FOUND_ROWS]
[STRAIGHT_JOIN]
select_expr_list
[FROM from_list]
[WHERE where_condition]
[GROUP BY group_by_condition_list [WITH ROLLUP]
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list)]
[HAVING having_condition]
[window_clause]
[ORDER BY order_by_condition_list]
[LIMIT limit_clause]
[lock_option];
select_expr_list:
select_expr [, select_expr ...]
select_expr:
*
| table_name.{* | column_name}
| table_alias_name.{* | column_name}
| expr [[AS] column_alias_name]
from_list:
DUAL
| table_references
| { location_url | table_function }
location_url:
'file_path'
(
{FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
}
[PATTERN = '<regex_pattern>']
)
table_function:
{
FILES (
LOCATION = {'file_path' | @location_name['/path']},
{
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
PARSE_HEADER = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
},
[PATTERN = '<regex_pattern>']
)
| SOURCE (
TYPE = 'ODPS',
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
TUNNEL_ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>'
)
}
where_condition:
expression
group_by_condition_list:
group_by_condition [, group_by_condition ...]
group_by_condition:
expression [ASC | DESC]
group_by_summary_option:
GROUPING SETS
| ROLLUP
| CUBE
expression_list:
expression [, expression ...]
having_condition:
expression
order_by_condition_list:
order_by_condition [, order_by_condition ...]
order_by_condition:
expression [ASC | DESC]
| vector_distance_function(vector_column, query_vector) [ASC | DESC] APPROXIMATE
limit_clause:
[offset,] row_count
| row_count OFFSET offset
lock_option:
FOR UPDATE [opt_for_update_wait]
| LOCK IN SHARE MODE
opt_for_update_wait:
WAIT {decimal | intnum}
| NOWAIT
| NO_WAIT
| SKIP LOCKED
Parameters
| Parameter | Description |
|---|---|
| select_stmt | The SQL statement part used to specify which data to retrieve from the database. |
| hint_options | Optional. Used to specify the hint option. For more information about hints, see Optimizer Hint. |
| {DISTINCT \| UNIQUE} \| ALL | Optional. Used to control whether the result set contains duplicate rows. |
| SQL_CACHE \| SQL_NO_CACHE | Optional. Used to control whether the query result is cached. |
| SQL_CALC_FOUND_ROWS | Optional. Used to record the total number of returned rows in a query with a LIMIT clause, and to obtain the total number of rows using the FOUND_ROWS() function. |
| STRAIGHT_JOIN | Optional. Forces the optimizer to join tables in the order specified in the FROM clause. This option does not affect the query result. Notice: OceanBase Database does not fully support the STRAIGHT_JOIN modifier. |
| select_expr_list | The list of columns or expressions to display in the query result. For more information about the columns and expressions, see select_expr. |
| column_name | The name of a column. |
| FROM from_list | Optional. Specifies the data source for the query, which can be a table, view, or subquery. For more information about the data source, see from_list. |
| { table_function \| location_url } | Optional. The current URL external table supports two syntax forms. For more information, see the table_function and location_url sections below. |
| WHERE where_condition | Optional. Specifies the filter conditions for the query. For more information, see where_condition. |
| GROUP BY group_by_condition_list | Optional. Used to group the result set by the specified columns. Typically used with aggregate functions. For more information, see group_by_condition. |
| WITH ROLLUP | Optional. Used to summarize the groups to represent higher-level aggregations (also known as super-aggregations) and generate additional rows. |
| GROUP BY [group_by_condition_list] group_by_summary_option (expression_list) | Optional. Used for advanced grouping and summarization operations. For more information, see group_by_summary_option. |
| HAVING having_condition | Optional. Used to filter the grouped results. The HAVING clause is similar to the WHERE clause, but it can use aggregate functions such as SUM and AVG. For more information, see having_condition. |
| window_clause | Optional. Used to specify the window definition for the analytic function (also known as window function in some databases). For more information about the window_clause syntax, see WINDOW clause. |
| ORDER BY order_by_condition_list | Optional. Used to sort the result set. You can specify one or more columns for sorting. For more information, see order_by_condition. |
| LIMIT limit_clause | Optional. Used to specify the maximum number of rows to return. For more information, see limit_clause. |
| lock_option | Optional. Used to lock the query result. For more information, see lock_option. |
| vector_distance_function | The vector distance function. Supported functions are l2_distance (L2 distance), cosine_distance (cosine distance), and negative_inner_product (negative inner product). |
| vector_column | The name of the vector column. The column must be of the VECTOR type. |
| query_vector | The query vector, in the format '[value1, value2, ...]'. The dimension must match the vector column. |
| APPROXIMATE | A required keyword that enables approximate search for vector indexes to improve query performance. |
| inner_product | The inner product function, which computes the sum of the element-wise products (the dot product) of two vectors. |
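As an illustration of the `SQL_CALC_FOUND_ROWS` modifier described in the table above, the following sketch (using a table such as the `tbl1` table created later in this topic) pairs it with the `FOUND_ROWS()` function to obtain the total row count of a limited query:

```sql
-- Ask the server to count all matching rows, not just the limited page.
SELECT SQL_CALC_FOUND_ROWS col1, col2 FROM tbl1 WHERE col1 > 0 LIMIT 2;
-- Retrieve the total number of rows the previous query would have
-- returned without the LIMIT clause.
SELECT FOUND_ROWS();
```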
select_expr
select_expr: specifies the columns or expressions to be included in the query result. Multiple expressions or column names are separated by commas (,). The value can be as follows:
`*`: specifies all columns. Here is an example:

Create tables `tbl1` and `tbl2`.

CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT) PARTITION BY HASH(col1) PARTITIONS 5;
CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);

Insert test data into tables `tbl1` and `tbl2`.

INSERT INTO tbl1 VALUES(1, 'A1', 1),(2, 'A2', 2),(3, 'A3', 3);
INSERT INTO tbl2 VALUES(1, 'A1', 1),(2, 'A2', 22),(3, 'A3', 33);

View the data in table `tbl1`.

SELECT * FROM tbl1;

The result is as follows:

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    1 | A1   |    1 |
|    2 | A2   |    2 |
|    3 | A3   |    3 |
+------+------+------+
3 rows in set
`table_name.{* | column_name}`: specifies all columns or a specific column of the specified table.

- `table_name.*`: specifies all columns of the specified table.
- `table_name.column_name`: specifies a specific column of the specified table.

Here is an example:

Read the `col1` data from table `tbl1`.

SELECT tbl1.col1 FROM tbl1;

The result is as follows:

+------+
| col1 |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set
`table_alias_name.{* | column_name}`: similar to the previous format, except that a table alias is used when selecting columns.

`expr [[AS] column_alias_name]`: specifies an expression and optionally assigns an alias to it as the column name.

- `expr`: specifies a column name, function, or calculation expression.
- `expr AS column_alias_name` or `expr column_alias_name`: specifies the expression and assigns the alias `column_alias_name` to it.

Here is an example:

Read the `col1` data from table `tbl1`, calculate a new column `col1_add` with the value `col1+1`, and calculate another new column `col3_mul` with the value `col3*2`.

SELECT col1, col1+1 AS col1_add, col3*2 col3_mul FROM tbl1;

The result is as follows:

+------+----------+----------+
| col1 | col1_add | col3_mul |
+------+----------+----------+
|    1 |        2 |        2 |
|    2 |        3 |        4 |
|    3 |        4 |        6 |
+------+----------+----------+
3 rows in set
from_list
`DUAL`: specifies a virtual table name. It is typically used to perform calculations or execute functions when no actual table is involved. Here is an example:

SELECT 1+1, SYSDATE() FROM DUAL;

The result is as follows:

+------+---------------------+
| 1+1  | SYSDATE()           |
+------+---------------------+
|    2 | 2024-08-28 15:20:59 |
+------+---------------------+
1 row in set

`table_references`: specifies the list of data source tables for the query, which can be one table or a combination of multiple tables separated by commas (,). For more information about the syntax of `table_references`, see JOIN clause.
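To illustrate `table_references` with multiple tables, the following sketch (using the `tbl1` and `tbl2` tables created earlier in this topic) joins two tables listed in the FROM clause:

```sql
-- Comma-separated table references produce a join; the WHERE clause
-- supplies the join condition.
SELECT a.col1, a.col2, b.col3
  FROM tbl1 a, tbl2 b
 WHERE a.col1 = b.col1;
```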
where_condition
`expression`: specifies the condition expression used to filter the rows to be returned.
Here is an example:
Select all rows from tbl1 that satisfy the conditions col1 > 1 and col2 = 'A3'.
SELECT * FROM tbl1
WHERE col1 > 1
AND col2 = 'A3';
The result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
+------+------+------+
1 row in set
table_function
The LOCATION clause specifies the path where the external table files are stored. Usually, the data files of the external table are stored in a separate directory, which can contain subdirectories. When the table is created, the external table automatically collects all the files in the directory. The values are as follows:
`file_path`: specifies the path of the external table files. Details are as follows:

- For a local `LOCATION`, the format is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative path or an absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` variable specifies the file paths that OBServer nodes can access, and `local_file_path` must be a subpath of the `secure_file_priv` path.
- For a remote `LOCATION`, the supported formats are as follows:
  - `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing Alibaba Cloud OSS, AWS S3, or object storage compatible with the S3 protocol, and `s3_region` specifies the region selected when using S3. This sensitive access information is stored in the system tables of the database in encrypted form.
  - `LOCATION = 'hdfs://localhost:port/PATH'`, where `localhost` specifies the address of HDFS, `port` specifies the port number of HDFS, and `PATH` specifies the directory path in HDFS. With Kerberos authentication, the format is `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`. Details are as follows:
    - `principal`: specifies the login authentication user.
    - `keytab`: specifies the path of the user authentication key file.
    - `krb5conf`: specifies the path of the Kerberos environment description file.
    - `configs`: specifies additional HDFS configuration items. By default, this parameter is empty. In a Kerberos environment, however, it usually needs to be set. For example, `dfs.data.transfer.protection=authentication,privacy` sets the data transmission protection levels to `authentication` and `privacy`.

`@location_name['/path']`: specifies the path of a Location object. `['/path']` is an optional parameter that specifies a subdirectory. For more information about the Location object, see CREATE LOCATION.

Note: For OceanBase Database V4.4.x, the `@location_name['/path']` parameter is supported starting from V4.4.1.
The FORMAT clause specifies the file format related properties. The supported file formats are CSV, PARQUET, and ORC.
- TYPE = 'CSV': specifies the following fields:
  - `LINE_DELIMITER`: specifies the line delimiter of the CSV file. The default value is `LINE_DELIMITER='\n'`.
  - `FIELD_DELIMITER`: an optional parameter that specifies the column delimiter of the CSV file. The default value is `FIELD_DELIMITER='\t'`.
  - `ESCAPE`: specifies the escape character of the CSV file. It can be only one byte. The default value is `ESCAPE='\'`.
  - `FIELD_OPTIONALLY_ENCLOSED_BY`: an optional parameter that specifies the symbol used to enclose field values in the CSV file. The default value is empty. This option encloses only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON).
  - `ENCODING`: specifies the character set encoding of the file. If this parameter is not specified, the default value is `UTF8MB4`.
  - `NULL_IF`: specifies the strings that are treated as `NULL`. The default value is empty.
  - `SKIP_HEADER`: specifies the number of header rows to skip.
  - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, which means that blank lines are not skipped.
  - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, which means that leading and trailing spaces are not removed.
  - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, which means that empty strings are not treated as `NULL`.
  - `PARSE_HEADER`: specifies that the first row of the CSV file is used as the column names.

  Notice: You cannot use the `PARSE_HEADER` and `SKIP_HEADER` parameters at the same time.
- TYPE = 'PARQUET' or TYPE = 'ORC': no additional fields are specified.
The PATTERN clause specifies a regular expression pattern to filter files in the `LOCATION` directory. For each file path in the `LOCATION` directory, the external table accesses the file if the path matches the pattern and skips the file otherwise. If this parameter is not specified, all files in the `LOCATION` directory are accessible by default.
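Putting the clauses above together, a minimal sketch of the `location_url` form might look as follows (the directory `/data/` and the file name pattern are hypothetical):

```sql
-- Query CSV files directly from a directory by using the location_url form.
SELECT * FROM '/data/' (
  FORMAT = (
    TYPE = 'CSV',
    FIELD_DELIMITER = ',',
    SKIP_HEADER = 1
  ),
  PATTERN = '.*\.csv$'
);
```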
For the ODPS type, data is not read from files and there is no meaningful URL path. Therefore, only the `SOURCE` form of `table_function` is supported.
- TYPE = 'ODPS': specifies the following fields:
- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: an optional parameter that specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: an optional parameter that specifies whether to use the specified quota.
- `COMPRESSION_CODE`: an optional parameter that specifies the compression format of the data source. The supported compression formats are ZLIB, ZSTD, LZ4, and ODPS_LZ4. If this parameter is not specified, compression is not enabled.
Here is an example:
Read data in the ODPS format.
SELECT * FROM
source (
type = 'ODPS',
accessid = '$ODPS_ACCESSID',
accesskey = '$ODPS_ACCESSKEY',
endpoint= '$ODPS_ENDPOINT',
project_name = 'example_project',
schema_name = '',
table_name = 'example_table',
quota_name = '',
compression_code = ''
);
location_url
- The FORMAT clause specifies the file format related properties. The supported file formats are CSV, PARQUET, and ORC:
- For CSV files: specify parse_header=true/false to indicate whether to parse the first row of the file as the column headers (default is false). Additionally, use TYPE=CSV to declare the file format (required when exporting data).
- For PARQUET or ORC files: the file structure is automatically recognized, and no additional configuration for the header row is required.
Here is an example:
SELECT * FROM
  FILES (
    LOCATION = '/data/',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', PARSE_HEADER = TRUE),
    PATTERN = 'datafiles$'
  );
group_by_condition
`expression [ASC | DESC]`: specifies the grouping condition. Adding `ASC` or `DESC` controls whether the result set is sorted in ascending or descending order after grouping.
Here is an example:
Query col1 and col2 from table tbl1, group by col2, and sum col3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2 DESC;
The result set is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+-----------+
3 rows in set
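The `WITH ROLLUP` modifier described in the Parameters section adds a super-aggregate row to a grouped query. A sketch against the same `tbl1` table:

```sql
-- Group by col2 and append a super-aggregate row that sums col3
-- across all groups. The super-aggregate row has col2 = NULL.
SELECT col2, SUM(col3) FROM tbl1 GROUP BY col2 WITH ROLLUP;
```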
group_by_summary_option
`GROUPING SETS`: specifies custom grouping. You can specify multiple grouping combinations. It has the following characteristics:

- You can flexibly define the grouping combinations.
- The result set contains only the grouping combinations that you specify.

Note: `GROUP BY GROUPING SETS ((col1), (col2), (col3))` is equivalent to `GROUP BY GROUPING SETS (col1, col2, col3)`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY GROUPING SETS ((col1), (col2), (col3));

The result set is as follows:

+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
|    1 | NULL | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | NULL | NULL |        1 |
| NULL | A1   | NULL |        1 |
| NULL | A2   | NULL |        1 |
| NULL | A3   | NULL |        1 |
| NULL | NULL |    1 |        1 |
| NULL | NULL |    2 |        1 |
| NULL | NULL |    3 |        1 |
+------+------+------+----------+
9 rows in set

`ROLLUP`: specifies hierarchical aggregation. It reduces the number of grouping columns in the order of the grouping columns and generates hierarchical aggregation results. It has the following characteristics:

- It reduces the number of grouping columns in the order of the columns in
`expression_list` until only the total row remains.
- The result set contains the following data:
  - The detailed data of each group.
  - The sub-aggregated data of each group.
  - The total row.

`ROLLUP` is a special case of `GROUPING SETS`. For example, `GROUP BY ROLLUP(col1, col2, col3)` is equivalent to `GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1), ())`.

You can also use `ROLLUP` with `GROUPING SETS`. For example, `GROUP BY GROUPING SETS((col2), (col3), ROLLUP(col1, col2, col3))` is equivalent to `GROUP BY GROUPING SETS((col2), (col3), (col1, col2, col3), (col1, col2), (col1), ())`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY ROLLUP (col1, col2, col3);

The result set is as follows:
+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
|    1 | A1   |    1 |        1 |
|    1 | A1   | NULL |        1 |
|    1 | NULL | NULL |        1 |
|    2 | A2   |    2 |        1 |
|    2 | A2   | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | A3   |    3 |        1 |
|    3 | A3   | NULL |        1 |
|    3 | NULL | NULL |        1 |
| NULL | NULL | NULL |        3 |
+------+------+------+----------+
10 rows in set

`CUBE`: specifies multidimensional aggregation. It generates all possible grouping combinations. The result set contains the following data:

- The detailed data of each group.
- The aggregated data of all possible grouping combinations.
- The total row.

`CUBE` is a special case of `GROUPING SETS`. For example, `GROUP BY CUBE(col1, col2, col3)` is equivalent to `GROUP BY GROUPING SETS ((col1, col2, col3), (col1, col2), (col1, col3), (col2, col3), (col1), (col2), (col3), ())`.

You can also use `CUBE` with `GROUPING SETS`. For example, `GROUP BY col1, CUBE(col2, col3), GROUPING SETS((col4), (col5))` is equivalent to `GROUP BY GROUPING SETS((col1, col2, col3, col4), (col1, col2, col3, col5), (col1, col2, col4), (col1, col2, col5), (col1, col3, col4), (col1, col3, col5), (col1, col4), (col1, col5))`.

Here is an example:

SELECT col1, col2, col3, COUNT(*) FROM tbl1 GROUP BY CUBE (col1, col2, col3);

The result set is as follows:

+------+------+------+----------+
| col1 | col2 | col3 | COUNT(*) |
+------+------+------+----------+
| NULL | NULL | NULL |        3 |
| NULL | NULL |    1 |        1 |
| NULL | NULL |    2 |        1 |
| NULL | NULL |    3 |        1 |
| NULL | A1   | NULL |        1 |
| NULL | A2   | NULL |        1 |
| NULL | A3   | NULL |        1 |
| NULL | A1   |    1 |        1 |
| NULL | A2   |    2 |        1 |
| NULL | A3   |    3 |        1 |
|    1 | NULL | NULL |        1 |
|    2 | NULL | NULL |        1 |
|    3 | NULL | NULL |        1 |
|    1 | NULL |    1 |        1 |
|    2 | NULL |    2 |        1 |
|    3 | NULL |    3 |        1 |
|    1 | A1   | NULL |        1 |
|    2 | A2   | NULL |        1 |
|    3 | A3   | NULL |        1 |
|    1 | A1   |    1 |        1 |
|    2 | A2   |    2 |        1 |
|    3 | A3   |    3 |        1 |
+------+------+------+----------+
22 rows in set
having_condition
`expression`: specifies the condition for filtering the result set after grouping.
Here is an example:
Query col1 and col2 from table tbl1, group by col2, and sum col3. Return only the rows where the sum of col3 is less than 3.
SELECT col1, col2, SUM(col3)
FROM tbl1
GROUP BY col2
HAVING SUM(col3) < 3;
The result set is as follows:
+------+------+-----------+
| col1 | col2 | SUM(col3) |
+------+------+-----------+
| 1 | A1 | 1 |
| 2 | A2 | 2 |
+------+------+-----------+
2 rows in set
order_by_condition
`expression [ASC | DESC]`: specifies the condition for sorting the result set.

`ASC | DESC`: optional. Specifies the sort order. `ASC` specifies ascending order (the default), and `DESC` specifies descending order.
Here is an example:
Query data from table tbl1 and output the query result in descending order (DESC) based on col3.
SELECT * FROM tbl1
ORDER BY col3 DESC;
The result set is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 3 | A3 | 3 |
| 2 | A2 | 2 |
| 1 | A1 | 1 |
+------+------+------+
3 rows in set
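You can also combine multiple sort conditions in one ORDER BY clause; a sketch against `tbl1`:

```sql
-- Sort by col2 in ascending order first, then break ties by col3 in
-- descending order.
SELECT * FROM tbl1 ORDER BY col2 ASC, col3 DESC;
```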
limit_clause
`[offset,] row_count`: both values must be integer constants. The parameters are described as follows:

- `offset`: optional. Specifies the number of rows to skip from the start of the result set (the offset). The default offset is 0.
- `row_count`: specifies the number of rows to return. If you specify only `row_count`, the first `row_count` rows of the result set are returned.

`row_count OFFSET offset`: skips `offset` rows from the start of the result set and returns the next `row_count` rows.
Here is an example:
Query data from table tbl1, and return two rows starting from the second row by using LIMIT.
SELECT * FROM tbl1
LIMIT 1, 2;
The result set is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 2 | A2 | 2 |
| 3 | A3 | 3 |
+------+------+------+
2 rows in set
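The `row_count OFFSET offset` form expresses the same query as the example above:

```sql
-- Skip the first row and return the next two rows; equivalent to LIMIT 1, 2.
SELECT * FROM tbl1 LIMIT 2 OFFSET 1;
```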
lock_option
`FOR UPDATE [opt_for_update_wait]`: specifies that exclusive locks are placed on all rows of the query result to prevent other transactions from modifying the data concurrently, or from reading it concurrently at certain transaction isolation levels. For more information about exclusive locks, see Lock query results with SELECT FOR UPDATE.

`opt_for_update_wait`: an optional parameter that specifies the lock acquisition behavior. The options are as follows:

- `WAIT {decimal | intnum}`: waits the specified amount of time, in seconds, for other transactions to release the resources. If the time limit is exceeded, the resources cannot be acquired.
- `NOWAIT` or `NO_WAIT`: does not wait for other transactions to release the resources. If the resources are held by other transactions, they cannot be acquired.
- `SKIP LOCKED`: does not wait for row locks. The query is executed immediately, and locked rows are removed from the result set.
Here is an example:
You can use `SELECT ... FOR UPDATE` to add row-level locks to a table. If the `LIMIT 1` clause is used, the operator is pushed down to the table scan step during query optimization, and only the rows returned by `LIMIT` are locked.

SELECT * FROM tbl1 LIMIT 1 FOR UPDATE;

If you use the `ORDER BY` clause to sort the query results, the results are sorted first and then `LIMIT 1` is applied. In this case, all selected rows are locked.

SELECT * FROM tbl1 ORDER BY col1 LIMIT 1 FOR UPDATE;
LOCK IN SHARE MODE: Specifies that shared locks are acquired when data is queried to prevent other transactions from writing to the data, but allows other transactions to read the data. For more information about shared locks, see Lock query results with LOCK IN SHARE MODE.
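A sketch combining the locking options above against `tbl1` (run inside an explicit transaction so that the locks are held until commit):

```sql
BEGIN;
-- Acquire shared locks: other transactions can still read these rows
-- but cannot modify them until this transaction ends.
SELECT * FROM tbl1 WHERE col1 = 1 LOCK IN SHARE MODE;
-- Acquire an exclusive lock without waiting; the statement fails
-- immediately if another transaction holds a conflicting lock.
SELECT * FROM tbl1 WHERE col1 = 2 FOR UPDATE NOWAIT;
COMMIT;
```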
Examples
Simple table query example
Create a table named `test_tbl1`.

CREATE TABLE test_tbl1(col1 INT PRIMARY KEY, col2 VARCHAR(20), col3 INT);

Insert test data into the `test_tbl1` table.

INSERT INTO test_tbl1 VALUES (1, 'A1', 10),(2, 'A2', 15),(3, 'A1', 8);

View the data in the `col2` column of the `test_tbl1` table and deduplicate it.

SELECT DISTINCT col2 FROM test_tbl1;

The return result is as follows:

+------+
| col2 |
+------+
| A1   |
| A2   |
+------+
2 rows in set
Example of querying data using dblink
Read the `number_t` table from the remote `test` database.

SELECT * FROM number_t@ob_dblink;

The return result is as follows:

+--------+-------------+---------+--------------------+-------------------------------+
| c_int  | c_bigint    | c_float | c_double           | c_decimal                     |
+--------+-------------+---------+--------------------+-------------------------------+
| 214748 | 92233720368 | 3.40282 | 1.7976931348623157 | 123456789123456789.1234567890 |
+--------+-------------+---------+--------------------+-------------------------------+
1 row in set

Read the `number_t` table from the remote `test` database and join it with the local `number_t` table.

SELECT a.c_decimal, b.c_double FROM number_t a, number_t@ob_dblink b WHERE a.c_int = b.c_int;

The return result is as follows:

+-------------------------------+--------------------+
| c_decimal                     | c_double           |
+-------------------------------+--------------------+
| 123456789123456789.1234567890 | 1.7976931348623157 |
+-------------------------------+--------------------+
1 row in set

Read the `datetime_t` table from the remote `mysql` database.

SELECT * FROM mysql.datetime_t@ob_dblink;

The return result is as follows:

+------------+----------+--------+---------------------+---------------------+
| c_date     | c_time   | c_year | c_datetime          | c_timestamp         |
+------------+----------+--------+---------------------+---------------------+
| 2023-04-13 | 12:12:12 |   2078 | 2100-11-01 12:12:13 | 2100-12-01 21:14:15 |
+------------+----------+--------+---------------------+---------------------+
1 row in set
Example of vector distance query
Create a table.

CREATE TABLE vector_table (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  vector_col VECTOR(3)
);

Insert vector data into the table.

INSERT INTO vector_table VALUES (1, 'A', '[1, 2, 3]');
INSERT INTO vector_table VALUES (2, 'B', '[2, 3, 4]');
INSERT INTO vector_table VALUES (3, 'C', '[3, 4, 5]');
INSERT INTO vector_table VALUES (4, 'D', '[10, 20, 30]');
INSERT INTO vector_table VALUES (5, 'E', '[100, 200, 300]');

Create a vector index.

CREATE VECTOR INDEX idx_vector ON vector_table(vector_col) WITH (distance=l2, type=hnsw);

Perform a vector similarity query using L2 distance.

SELECT id, name, vector_col FROM vector_table ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  2 | B    | [2,3,4]    |
|  3 | C    | [3,4,5]    |
|  1 | A    | [1,2,3]    |
+----+------+------------+
3 rows in set

Perform a vector similarity query using cosine distance.

SELECT id, name, vector_col FROM vector_table ORDER BY cosine_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 3;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  2 | B    | [2,3,4]    |
|  3 | C    | [3,4,5]    |
|  1 | A    | [1,2,3]    |
+----+------+------------+
3 rows in set

Perform a vector query with a WHERE condition.

SELECT id, name, vector_col FROM vector_table WHERE id > 2 ORDER BY l2_distance(vector_col, '[2, 3, 4]') APPROXIMATE LIMIT 2;

The return result is as follows:

+----+------+------------+
| id | name | vector_col |
+----+------+------------+
|  3 | C    | [3,4,5]    |
|  4 | D    | [10,20,30] |
+----+------+------------+
2 rows in set
Notice
The following conditions must be met for a vector distance query:
- You must use the `APPROXIMATE` keyword to enable approximate search for the vector index.
- The dimension of the query vector `query_vector` must match the dimension of the vector column.
- We recommend that you create a corresponding vector index on the vector column to improve query performance.
- Supported vector distance functions include `l2_distance` (L2 distance), `negative_inner_product` (negative inner product), and `cosine_distance` (cosine distance).