Purpose
This statement is used to store the query result in a variable or file. Specifically:
- `SELECT ... INTO OUTFILE` is used to store the result set in an external file and specify the output format.
- `SELECT ... INTO DUMPFILE` is used to write a single unformatted line to an external file.
- `SELECT ... INTO var_list` is used to store the result set in variables.
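As a quick sketch of the three forms (the table `t1(c1, c2)`, the paths, and the variable names are illustrative, and `secure_file_priv` must already permit the target directory):

```sql
-- Store the result set in a formatted external file:
SELECT c1, c2 INTO OUTFILE '/home/admin/test_data/t1.csv'
  FIELDS TERMINATED BY ','
  FROM t1;

-- Write a single unformatted row to an external file:
SELECT c1 INTO DUMPFILE '/home/admin/test_data/t1.dump'
  FROM t1 WHERE ROWNUM = 1;

-- Store a single row's values in variables (for example, inside a PL block):
SELECT c1, c2 INTO v_c1, v_c2 FROM t1 WHERE c1 = 1;
```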
Privilege requirements
To execute the SELECT INTO statement, you must have the SELECT privilege on the target table. For more information about the privileges in OceanBase Database, see Privilege types in Oracle-compatible mode.
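For example, an administrator could grant that privilege as follows (the user `test_user` and table `test_tbl1` are illustrative names):

```sql
-- Grant the SELECT privilege on one table to a user (Oracle-compatible mode).
GRANT SELECT ON test_tbl1 TO test_user;
```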
Syntax
SELECT [/*+parallel(N)*/] column_list_option
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list}
FROM table_name_list
[WHERE where_conditions]
[GROUP BY group_by_list [HAVING having_search_conditions]]
[ORDER BY order_expression_list];
column_list_option:
column_name [, column_name ...]
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [line_term]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Optional. Specifies the degree of parallelism for the statement. |
| column_list_option | Specifies the columns to export. To export all columns, use `*`. `column_name` specifies the name of a column. For more information about column options in a query statement, see SIMPLE SELECT. |
| file_name | Specifies the path and name of the export file. For more information, see file_name below. |
| PARTITION BY part_expr | Optional. Specifies the partitioning method for the exported data. The value of part_expr is used as part of the export path. For each row, the value of part_expr is calculated, and rows with the same value of part_expr are exported to the same directory. |
| FORMAT = (external_file_format_list) | Optional. The FORMAT clause specifies the export file format. Use TYPE to specify the CSV, PARQUET, or ORC file format. TYPE cannot be empty. For more information, see external_file_format_list below. |
| {CHARSET \| CHARACTER SET} charset_name | Optional. Specifies the character set for the exported file. charset_name specifies the name of the character set. |
| field_opt | Optional. Specifies the field format options for the export. Use FIELDS or COLUMNS to specify the format of each field in the output file. For more information, see field_term below. |
| line_opt | Optional. Specifies the start and end delimiters for data rows in the export. Use the LINES clause to set the start and end characters for each row. For more information, see line_term below. |
| file_opt | Optional. Specifies whether to export to multiple files and the size of each file when exporting to multiple files. For more information, see file_option below. |
| FROM table_name_list | Specifies the data source. |
| WHERE where_conditions | Optional. Specifies the filter conditions. Only data that meets the conditions is included in the query results. For more information about filter conditions in a query statement, see SIMPLE SELECT. |
| GROUP BY group_by_list | Optional. Specifies the fields for grouping. Typically used with aggregate functions. Note: all non-aggregate columns in the select list must appear in the GROUP BY clause. |
| HAVING having_search_conditions | Optional. Filters the grouped data. The HAVING clause is similar to the WHERE clause, but it can use aggregate functions such as SUM and AVG. |
| ORDER BY order_expression_list | Optional. Specifies the columns by which to sort the result set, in ascending (ASC) or descending (DESC) order. If neither ASC nor DESC is specified, the default is ASC. |
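The clauses above combine as in the following sketch (the table `orders` and its columns are illustrative):

```sql
-- Export per-customer totals over 100, sorted in descending order.
SELECT customer_id, SUM(amount)
  INTO OUTFILE '/home/admin/test_data/totals.csv'
  FIELDS TERMINATED BY ','
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
  GROUP BY customer_id
  HAVING SUM(amount) > 100
  ORDER BY SUM(amount) DESC;
```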
file_name
file_name has the following format:
When you save the export file to an OBServer node, the format is:
`/$PATH/$FILENAME`, where the parameters are described as follows:

- `$PATH`: specifies the path to save the export file, that is, the path of the export file on the OBServer node.
- `$FILENAME`: specifies the name of the export file. When `SINGLE = FALSE`, it specifies the prefix of the export files. If you do not specify this parameter, the system generates the default prefix `data`. The system automatically generates the suffix.
When you save the export file to Alibaba Cloud Object Storage Service (OSS), the format is:
`oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`, where the parameters are described as follows:

- `$PATH`: specifies the path to save the export file, that is, the path of the file in the bucket.
- `$FILENAME`: specifies the name of the export file. When `SINGLE = FALSE`, it specifies the prefix of the export files. If you do not specify this parameter, the system generates the default prefix `data`. The system automatically generates the suffix.
- `$HOST`: specifies the host name, or the domain name of the CDN-accelerated service, for accessing the OSS service.
- `$ACCESS_ID`: specifies the AccessKey ID required to access the OSS service for identity verification.
- `$ACCESSKEY`: specifies the AccessKey Secret required to access the OSS service for identity verification.
Note
- OceanBase Database supports exporting data to S3 and object storage services based on the S3 protocol.
- Alibaba Cloud OSS has a file size limit. When you export a file that exceeds 5 GB to OSS, the system splits the file into multiple files, each of which is less than 5 GB.
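Put together, an OSS export might look like the following sketch (the bucket path, endpoint, credentials, and table name are placeholders, not real values):

```sql
-- Placeholders only; substitute your own bucket path, endpoint, and credentials.
SELECT * INTO OUTFILE 'oss://my-bucket/export/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY'
  SINGLE = FALSE
  FROM test_tbl1;
```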
When you save the export file to Hadoop Distributed File System (HDFS), the format is:
Notice
When you use the write capability of HDFS external tables, OceanBase Database must be deployed in an environment that supports the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
The export format for SIMPLE authentication (basic authentication, without Kerberos) is:

`hdfs://$NAMENODE:$PORT/$PATH/$FILENAME[?username=$USERNAME]`

Description:

- When you use basic authentication (SIMPLE), you can specify the HDFS username by using the `username` parameter.
- If you do not specify the `username` parameter, the system uses the HDFS username defined in the OceanBase Database configuration.
The export format for Kerberos authentication is:

`hdfs://$NAMENODE:$PORT/$PATH/$FILENAME?principal=$PRINCIPAL&keytab=$KEYTAB&krb5conf=$KRB5CONF&configs=$CONFIGS`

Description:

- For Kerberos authentication, you must provide the Kerberos principal (`principal`), keytab (`keytab`), and Kerberos configuration file (`krb5conf`).
- The `username` parameter is not applicable in this scenario because Kerberos authentication identifies the user by the `principal`.
The parameters are described as follows:
Common parameters:
- `$NAMENODE`: specifies the IP address of the HDFS NameNode.
- `$PORT`: specifies the port of the HDFS NameNode.
- `$PATH`: specifies the storage path of the file in HDFS.
- `$FILENAME`: specifies the name of the export file.
Parameters for SIMPLE authentication:
- `?username=$USERNAME`: optional. Specifies the HDFS username. If you do not specify the `username` parameter, the system uses the HDFS username defined in the OceanBase Database configuration.
Parameters for Kerberos authentication:
- `$PRINCIPAL`: specifies the Kerberos principal name. Example: `principal=hdfs/hadoop@EXAMPLE.COM`.
- `$KEYTAB`: specifies the path of the Kerberos keytab file. Example: `keytab=/path/to/hdfs.keytab`.
- `$KRB5CONF`: specifies the path of the Kerberos configuration file. Example: `krb5conf=/path/to/krb5.conf`.
- `$CONFIGS`: specifies extra HDFS configuration parameters, as a list of key-value pairs. Example: `configs=dfs.data.transfer.protection=authentication,privacy`.
The following examples show the values of file_name when you save the export file to HDFS:
SIMPLE authentication (basic authentication):
Applies to an HDFS cluster without Kerberos authentication.
Example 1 (default username):

hdfs://10.10.10.1:9000/data/export/sales_data.csv

Example 2 (specified username):

hdfs://10.10.10.1:9000/data/export/sales_data.csv?username=hiveuser
Kerberos authentication:
Applies to an HDFS cluster with Kerberos authentication.
You can specify the query parameters as needed. Multiple configuration items (such as configs) are supported.
Example:
hdfs://10.10.10.1:9000/data/export/sales_data.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/etc/security/keytabs/hdfs.keytab&krb5conf=/etc/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy
field_term
- `[OPTIONALLY] ENCLOSED BY string`: specifies the symbol used to wrap field values. By default, no quotation marks are used. For example, `ENCLOSED BY '"'` indicates that field values are enclosed in double quotation marks. If you use the `OPTIONALLY` keyword, the specified character wraps only string values.
- `TERMINATED BY string`: specifies the symbol used to separate field values. For example, `TERMINATED BY ','` indicates that a comma separates field values.
- `ESCAPED BY string`: specifies the escape character used to handle special characters or parse data in special formats. The default escape character is the backslash (`\`).
line_term
- `STARTING BY string`: specifies the starting character of each line.
- `TERMINATED BY string`: specifies the ending character of each line. The default is the line break character. For example, `... LINES TERMINATED BY '\n' ...` indicates that each line ends with a line break.
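A sketch combining the field and line options (the table and path are illustrative):

```sql
-- Fields separated by ',', string values quoted with '"',
-- each line prefixed with 'row:' and terminated by a newline.
SELECT * INTO OUTFILE '/home/admin/test_data/t1.txt'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
  LINES STARTING BY 'row:' TERMINATED BY '\n'
  FROM test_tbl1;
```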
file_option
- `SINGLE [=] {TRUE | FALSE}`: specifies whether to export data to a single file or multiple files.
  - `SINGLE [=] TRUE`: data is exported to a single file. This is the default value.
  - `SINGLE [=] FALSE`: data can be exported to multiple files.

  Notice: when the degree of parallelism is greater than 1 and `SINGLE = FALSE`, data can be exported to multiple files in parallel, which improves the export speed.

- `MAX_FILE_SIZE [=] {int | string}`: specifies the maximum size of a single file during export. This option takes effect only when `SINGLE = FALSE`.
- `BUFFER_SIZE [=] {int | string}`: specifies the size of the memory allocated to each thread for each partition during export. If no partitioning is specified, the export is treated as a single partition. The default value is 1 MB.

  Note: `BUFFER_SIZE` is used for export performance tuning. If the machine has sufficient memory and you want to improve export efficiency, set a larger value (such as 4 MB). If the machine has insufficient memory, set a smaller value (such as 4 KB). If you set it to 0, all partitions in a single thread share one memory buffer.
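For instance, a multi-file export tuned with these options might look like this sketch (the path, table, and size values are illustrative):

```sql
-- Split the output into files of at most 256 MB, with a 4 MB buffer per thread.
SELECT /*+parallel(4)*/ * INTO OUTFILE '/home/admin/test_data/part_'
  SINGLE = FALSE MAX_FILE_SIZE = '256MB' BUFFER_SIZE = '4MB'
  FROM test_tbl1;
```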
external_file_format_list
When TYPE = 'CSV', the following fields are included:
- `LINE_DELIMITER`: specifies the line delimiter of a CSV file. The default value is `LINE_DELIMITER='\n'`.
- `FIELD_DELIMITER`: specifies the column delimiter of a CSV file. This is an optional parameter. The default value is `FIELD_DELIMITER='\t'`.
- `ESCAPE`: specifies the escape character of a CSV file. This is an optional parameter. The default value is `ESCAPE='\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: specifies the character that encloses field values in a CSV file. This is an optional parameter. The default value is an empty string. This option applies only to specific types of fields (such as CHAR, VARCHAR, TEXT, and JSON).
- `FIELD_ENCLOSED_BY`: specifies the character that encloses field values in a CSV file. This is an optional parameter. The default value is an empty string, indicating that no character is used to enclose field values. This option applies to all types of fields.
- `ENCODING`: specifies the character set encoding of the file. If this parameter is not specified, the default value is `UTF8MB4`.
- `COMPRESSION`: specifies the compression format of the file. This is an optional parameter. The supported compression formats are:
  - `NONE`: the file is not compressed (the default value).
  - `GZIP`/`DEFLATE`: GZIP compression.
  - `ZSTD`: ZSTD compression.
- `FILE_EXTENSION`: specifies a custom file extension. This is an optional parameter. It is used only for multi-file exports and is supported only for CSV files. If this parameter is not specified, the file extension is determined by the format type; the default extension for CSV files is `.csv`.
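The CSV options above can be combined, for example, to emit compressed multi-file output (the path, table, and extension are illustrative):

```sql
-- Write ZSTD-compressed CSV parts with a custom extension.
SELECT * INTO OUTFILE '/home/admin/test_data/out'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    COMPRESSION = ZSTD
    FILE_EXTENSION = '.csv.zst'
  )
  SINGLE = FALSE
  FROM test_tbl1;
```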
When TYPE = 'PARQUET', the following fields are included:
- `COMPRESSION`: specifies the compression format of a Parquet file. The supported compression formats are `UNCOMPRESSED` (the file is not compressed), `SNAPPY`, `GZIP`, `BROTLI`, `ZSTD`, `LZ4`, and `LZ4_HADOOP`. If this parameter is not specified, the default value is `UNCOMPRESSED`. The default file extension is `.parquet`.
- `ROW_GROUP_SIZE`: specifies the size of a row group in a Parquet file, in bytes. This option can be specified as a number or as a string such as `'64MB'`. If this parameter is not specified, the default value is `256MB`. We recommend that you use the default value.
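As a sketch, a Parquet export using these options might look like this (the path and table are illustrative):

```sql
-- Export to a single ZSTD-compressed Parquet file;
-- ROW_GROUP_SIZE keeps its recommended default.
SELECT * INTO OUTFILE '/home/admin/test_data/t1.parquet'
  FORMAT = (
    TYPE = 'PARQUET',
    COMPRESSION = 'ZSTD'
  )
  SINGLE = TRUE
  FROM test_tbl1;
```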
When TYPE = 'ORC', the following fields are included:
- `COMPRESSION`: specifies the compression format of an ORC file. The supported compression formats are `UNCOMPRESSED` (the file is not compressed), `SNAPPY`, `ZLIB`, `LZ4`, and `ZSTD`. If this parameter is not specified, the default value is `UNCOMPRESSED`. The default file extension is `.orc`.
- `COMPRESSION_BLOCK_SIZE`: specifies the size of the blocks into which data is divided during compression, in bytes. This option can be specified as a number or as a string such as `'64KB'`. If this parameter is not specified, the default value is `256KB`. We recommend that you use the default value.
- `STRIPE_SIZE`: specifies the size of a stripe in an ORC file, in bytes. This option can be specified as a number or as a string such as `'64MB'`. If this parameter is not specified, the default value is `64MB`. We recommend that you use the default value.
- `ROW_INDEX_STRIDE`: specifies the frequency of index records, that is, how often an index entry is recorded. If this parameter is not specified, the default value is `10000`. We recommend that you use the default value.
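Similarly, an ORC export might look like the following sketch (the path and table are illustrative):

```sql
-- Export to a single ZLIB-compressed ORC file;
-- the stripe, block, and index options keep their recommended defaults.
SELECT * INTO OUTFILE '/home/admin/test_data/t1.orc'
  FORMAT = (
    TYPE = 'ORC'
    COMPRESSION = 'ZLIB'
  )
  SINGLE = TRUE
  FROM test_tbl1;
```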
Examples
Export data files to the local server
1. Set the file path for the export.

   To export files, you must set the system variable `secure_file_priv` to specify the path from which exported files can be accessed.

   Notice: for security reasons, when you set the system variable `secure_file_priv`, you can connect to the database only through a local socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the OBServer node that connects to OceanBase Database.

      ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the `oracle001` tenant through a local Unix socket.

      obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******

   3. Set the export path to `/home/admin/test_data`.

      obclient> SET GLOBAL secure_file_priv = "/home/admin/test_data";

   4. Log out.
2. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. Specify a comma as the delimiter between field values, enclose string values in double quotation marks ("), and use a newline character as the line terminator.

   - To write a single file sequentially, specify the file name as `test_tbl1.csv`.

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_tbl1;

     The return result is as follows:

     Query OK, 9 rows affected

   - To write multiple files in parallel without specifying a file name, and limit each file to 4 MB:

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;

     The return result is as follows:

     Query OK, 9 rows affected

   - To write multiple files in parallel with the file name prefix `dd2024`, and limit each file to 4 MB:

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;

     The return result is as follows:

     Query OK, 9 rows affected
Note
- When multiple export tasks write to the same path at the same time, errors or incomplete exports may occur. You can avoid this by choosing distinct export paths. For example, if `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;` and `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t2;` are executed simultaneously, an error may occur because the export file names are the same. In this case, we recommend that you set the export paths to `test/data1` and `test/data2`, respectively.
- When `SINGLE = FALSE` and an export fails (for example, because a file already exists), delete all files in the export directory that share the export target's prefix, or delete the export directory and recreate it, and then retry the export. For example, if `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;` fails, delete all files with the `data` prefix in the `test` directory, or delete and recreate the `test` directory, and then retry the export.
3. Log in to the server and view the exported files in the `/home/admin/test_data` directory of the OBServer node.
[xxx@xxx /home/admin/test_data]# ls
The return result is as follows:
data_0_0_0 data_0_1_0 dd2024_0_0_0 dd2024_0_1_0 test_tbl1.csv
In the result, test_tbl1.csv is the file name of the data exported in the sequential single-file export example; data_0_0_0 and data_0_1_0 are the file names of the data exported in the parallel multi-file export example without specifying the file name; and dd2024_0_0_0 and dd2024_0_1_0 are the file names of the data exported in the parallel multi-file export example with the file name prefix specified as dd2024.
Export data files to OSS
Use the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to the specified OSS storage location by partition. The partitioning is based on the combination of the col1 and col2 columns. Rows with the same values in these columns belong to the same partition and are exported to the same directory.
obclient> SELECT /*+parallel(3)*/ *
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB'
FROM test_tbl2;
The storage location is specified by the $DATA_FOLDER_NAME variable. You also need to provide the OSS host address, Access ID, and Access Key.
Export data files to HDFS
Notice
When you use the HDFS external table write feature, OceanBase Database must be deployed in an environment that supports the Java SDK. For more information about how to deploy an environment that supports the Java SDK, see Deploy an environment that supports the Java SDK.
Export data using SIMPLE authentication
Export data from the lineitem_external table in OceanBase Database to HDFS and save it in CSV file format.
SELECT *
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
SINGLE = TRUE
FROM lineitem_external;
Export data using KERBEROS authentication
Unlike SIMPLE authentication, KERBEROS authentication requires additional configuration files and key files. Export data from the lineitem_external table in OceanBase Database to HDFS and save it in CSV file format.
SELECT *
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/path/to/hdfs.keytab&krb5conf=/path/to/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
SINGLE = TRUE
FROM lineitem_external;