The SELECT INTO OUTFILE statement is a commonly used method for exporting data. It allows you to specify which fields to export, making it useful for scenarios where you don't need to export the primary key fields. When combined with the LOAD DATA INFILE statement for importing, it provides a convenient way to handle data import and export.
Background information
OceanBase Database supports this syntax in the following modes.
| Mode | Recommended OceanBase Database version | Recommended client |
|---|---|---|
| MySQL mode | V2.2.40 and later | MySQL Client or OBClient |
| Oracle mode | V2.2.40 and later | OBClient |
Notice
To import or export data, you must directly connect to an OceanBase Database instance by using a client.
Privilege requirements
- To execute the `SELECT INTO` statement in a MySQL tenant, you must have the `FILE` privilege and the `SELECT` privilege on the corresponding table. To grant the `FILE` privilege to a user, run `GRANT FILE ON *.* TO user_name;`, where `user_name` is the user who needs to execute the `SELECT INTO` statement. For more information about privileges in OceanBase Database, see Privilege types in MySQL mode.
- To execute the `SELECT INTO` statement in an Oracle tenant, you must have the `SELECT` privilege on the corresponding table. For more information about privileges in OceanBase Database, see Privilege types in Oracle mode.
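As a sketch, the following statements grant the required privileges in a MySQL tenant to a hypothetical user `test_user` for a hypothetical table `test.tbl1` (both names are placeholders):

```sql
-- The FILE privilege is global and cannot be granted on a single table
GRANT FILE ON *.* TO test_user;
-- The SELECT privilege is granted on the table to be exported
GRANT SELECT ON test.tbl1 TO test_user;
```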
Syntax
```sql
SELECT [/*+parallel(N)*/] column_list_option
    INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
        | OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
        | DUMPFILE 'file_name'
        | into_var_list}
FROM table_name_list
[WHERE where_conditions]
[GROUP BY group_by_list [HAVING having_search_conditions]]
[ORDER BY order_expression_list];

column_list_option:
    column_name [, column_name ...]

field_opt:
    {COLUMNS | FIELDS} field_term_list

field_term_list:
    field_term [, field_term ...]

field_term:
    {[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string

line_opt:
    LINES line_term_list

line_term_list:
    line_term [, line_term ...]

line_term:
    {STARTING | TERMINATED} BY string

file_opt:
    file_option [, file_option ...]

file_option:
    SINGLE [=] {TRUE | FALSE}
    | MAX_FILE_SIZE [=] {int | string}
    | BUFFER_SIZE [=] {int | string}

external_file_format_list:
    type_csv_option
    | type_parquet_option
    | type_orc_option

type_csv_option:
    TYPE = 'CSV'
    LINE_DELIMITER = '<string>' | <expr>
    FIELD_DELIMITER = '<string>' | <expr>
    ESCAPE = '<character>' | <expr>
    FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
    FIELD_ENCLOSED_BY = '<character>' | <expr>
    ENCODING = 'charset'
    COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
    FILE_EXTENSION = ['<string>']

type_parquet_option:
    TYPE = 'PARQUET'
    COMPRESSION = '<string>'
    ROW_GROUP_SIZE = '<string>' | <int>

type_orc_option:
    TYPE = 'ORC'
    COMPRESSION = '<string>'
    COMPRESSION_BLOCK_SIZE = '<string>' | <int>
    STRIPE_SIZE = '<string>' | <int>
    ROW_INDEX_STRIDE = <int>
```
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Optional. Specifies the degree of parallelism for executing the statement. |
| column_list_option | Specifies the columns to export. To export all columns, use *. column_name indicates the name of a column. For more information about column options in a SELECT statement, see SIMPLE SELECT. |
| file_name | Specifies the path and name of the export file. For more information, see file_name. |
| PARTITION BY part_expr | Optional. Controls how exported data is partitioned. part_expr is used as part of the export path: the value of part_expr is calculated for each row, and rows with the same value are grouped into the same partition and exported to the same directory. <br>**Note**: For OceanBase Database V4.3.2, this option is supported starting from V4.3.2 BP1. |
| FORMAT = (external_file_format_list) | Optional. Specifies the attributes related to the file format for the FORMAT clause. Use TYPE to specify the file format for exporting data to CSV, PARQUET, or ORC. TYPE cannot be empty. For more information, see external_file_format_list. |
| CHARSET | CHARACTER SET charset_name | Optional. Specifies the character set for exporting data to an external file. charset_name specifies the name of the character set. |
| field_opt | Optional. Specifies the format options for exported fields. Use FIELDS or COLUMNS to specify the format of each field in the output file. For more information, see field_term. |
| line_opt | Optional. Specifies the start and end delimiters for data rows in the output file. Use the LINES clause to set the start and end characters for each row. For more information, see line_term. |
| file_opt | Optional. Specifies whether to export data to multiple files and the size of each file when exporting to multiple files. For more information, see file_option. |
| FROM table_name_list | Specifies the data source. |
| WHERE where_conditions | Optional. Specifies the filter conditions. Only data that meets the conditions is included in the query results. For more information about filter conditions in a SELECT statement, see SIMPLE SELECT. |
| GROUP BY group_by_list | Optional. Specifies the fields to group by. It is typically used with aggregate functions. <br>**Note**: All non-aggregated columns in the select list must appear in the GROUP BY clause. |
| HAVING having_search_conditions | Optional. Filters the grouped data. The HAVING clause is similar to the WHERE clause, but it allows the use of aggregate functions such as SUM and AVG. |
| ORDER BY order_expression_list | Optional. Specifies the columns by which to sort the result set, in ascending (ASC) or descending (DESC) order. If neither ASC nor DESC is specified, the default is ASC. |
file_name
`file_name` specifies the export path and file name in one of the following formats:

- When you save the exported file to an OBServer node, the format is `/$PATH/$FILENAME`, where:
  - `$PATH`: the path in which to save the exported file on the OBServer node.
  - `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it is the prefix of the exported file names. If not specified, the default prefix `data` is used and the system generates the suffix automatically.
- When you save the exported file to an OSS bucket, the format is `oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`, where:
  - `$PATH`: the path in which to save the exported file in the bucket.
  - `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it is the prefix of the exported file names. If not specified, the default prefix `data` is used and the system generates the suffix automatically.
  - `$HOST`: the address of the OSS endpoint.
  - `$ACCESS_ID`: the AccessKey ID used to authenticate with the OSS service.
  - `$ACCESSKEY`: the AccessKey secret used to authenticate with the OSS service.

Note:

- For OceanBase Database V4.3.5, the `SELECT INTO` statement supports exporting data to S3, and to object storage services compatible with the S3 protocol, starting from V4.3.5 BP2.
- Alibaba Cloud OSS has a file size limit. Files larger than 5 GB are split into multiple files, each smaller than 5 GB, when exported to OSS.
field_term
- `[OPTIONALLY] ENCLOSED BY string`: specifies the character used to enclose field values. By default, no enclosing character is used. For example, `ENCLOSED BY '"'` encloses values in double quotation marks. If the `OPTIONALLY` keyword is used, only string values are enclosed in the specified character.
- `TERMINATED BY string`: specifies the character used to separate field values. For example, `TERMINATED BY ','` separates field values with commas.
- `ESCAPED BY string`: specifies the escape character for handling special characters or parsing data in special formats. The default escape character is the backslash (`\`).
line_term
- `STARTING BY string`: specifies a prefix written at the beginning of each line. By default, no prefix is used.
- `TERMINATED BY string`: specifies the terminator for each line. By default, the newline character is used. For example, `... LINES TERMINATED BY '\n' ...` uses a newline character as the end-of-line marker.
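As a sketch combining the field and line options above (the table name `tbl1` and the output path are placeholders), the following statement produces a comma-separated file in which string values are double-quoted and each row ends with a newline:

```sql
SELECT * INTO OUTFILE '/home/admin/test_data/tbl1.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
FROM tbl1;
```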
file_option
- `SINGLE [=] {TRUE | FALSE}`: specifies whether to export data to a single file or multiple files.
  - `SINGLE [=] TRUE`: the default value. Data is exported to a single file.
  - `SINGLE [=] FALSE`: data can be exported to multiple files.

  **Notice**: When the degree of parallelism is greater than 1 and `SINGLE = FALSE`, data can be exported to multiple files, enabling parallel reads and writes and improving the export speed.

- `MAX_FILE_SIZE [=] {int | string}`: specifies the maximum size of a single file during export. This option takes effect only when `SINGLE = FALSE`.
- `BUFFER_SIZE [=] {int | string}`: specifies the size of the memory allocated to each thread for each partition during export (the table is treated as a single partition if no partitioning is specified). The default value is 1 MB.

  **Note**:
  - `BUFFER_SIZE` is used for export performance tuning. If the machine has sufficient memory and you want to improve export efficiency, set a larger value (for example, 4 MB). If the machine has insufficient memory, set a smaller value (for example, 4 KB). If set to 0, all partitions in a single thread share one memory block.
  - For OceanBase Database V4.3.2, the `BUFFER_SIZE` parameter is supported starting from V4.3.2 BP1.
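For example, the following sketch (the table name `tbl1`, the output path, and the chosen sizes are placeholders) exports to multiple files of at most 256 MB each, with a 4 MB per-thread buffer:

```sql
SELECT /*+parallel(4)*/ * INTO OUTFILE '/home/admin/test_data/out'
  SINGLE = FALSE MAX_FILE_SIZE = '256MB' BUFFER_SIZE = '4MB'
FROM tbl1;
```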
external_file_format_list
`TYPE = 'CSV'` supports the following fields:

- `LINE_DELIMITER`: optional. The row delimiter of the CSV file. The default is `LINE_DELIMITER = '\n'` for a MySQL tenant and `LINE_DELIMITER = chr(10)` for an Oracle tenant.
- `FIELD_DELIMITER`: optional. The column delimiter of the CSV file. The default is `FIELD_DELIMITER = '\t'`.
- `ESCAPE`: optional. The escape character of the CSV file; it must be 1 byte. The default value is `ESCAPE = '\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. The character that encloses field values in the CSV file. The default value is an empty string. This option encloses only the values of fields of certain types, such as CHAR, VARCHAR, TEXT, and JSON.
- `FIELD_ENCLOSED_BY`: optional. The character that encloses field values in the CSV file. The default value is empty, indicating that no enclosing character is used. This option applies the enclosing character to fields of all types.
- `ENCODING`: optional. The character set encoding of the file. The default value is `UTF8MB4`.
- `COMPRESSION`: optional. The file compression format. Valid values: `NONE` (the default, indicating no compression), `GZIP`, `DEFLATE`, and `ZSTD`.
- `FILE_EXTENSION`: optional. A user-defined file extension, which is used only during multi-file export and is supported only for the CSV format. If not specified, the file extension is determined by the format type; the default extension for the CSV format is `.csv`.
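The CSV options above can be combined in a single `FORMAT` clause. The following sketch (the table name `tbl1` and the output path are placeholders) writes multiple comma-separated, GZIP-compressed files with double-quoted string values:

```sql
SELECT * INTO OUTFILE '/home/admin/test_data/out'
  FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',',
            FIELD_OPTIONALLY_ENCLOSED_BY = '"',
            ENCODING = 'utf8mb4', COMPRESSION = GZIP)
  SINGLE = FALSE
FROM tbl1;
```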
`TYPE = 'PARQUET'` supports the following fields:

- `COMPRESSION`: specifies the compression format of the Parquet file. The default value is `UNCOMPRESSED`, and the exported file extension is `.parquet`. Valid values are `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `BROTLI`, `ZSTD`, `LZ4`, and `LZ4_HADOOP`.
- `ROW_GROUP_SIZE`: specifies the row group size of the Parquet file, in bytes. You can specify a numeric value or a string such as `'64MB'`. The default value is `256MB`. We recommend that you use the default value.
`TYPE = 'ORC'` supports the following fields:

- `COMPRESSION`: specifies the compression format of the ORC file; the exported file extension is `.orc`. Valid values are `UNCOMPRESSED` (indicating no compression), `SNAPPY`, `ZLIB`, `LZ4`, and `ZSTD`. The default value is `UNCOMPRESSED`.
- `COMPRESSION_BLOCK_SIZE`: specifies the size of the compressed data blocks, in bytes. You can specify a numeric value or a string such as `'64KB'`. The default value is `256KB`. We recommend that you use the default value.
- `STRIPE_SIZE`: specifies the stripe size of the ORC file, in bytes. You can specify a numeric value or a string such as `'64MB'`. The default value is `64MB`. We recommend that you use the default value.
- `ROW_INDEX_STRIDE`: specifies how often an index entry is recorded, in rows. The default value is `10000`. We recommend that you use the default value.
Considerations
- When multiple export tasks export to the same path at the same time, errors or partial data exports may occur. You can avoid this by setting distinct export paths. For example, consider the following two tasks:

  ```sql
  SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;
  SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t2;
  ```

  When these tasks run at the same time, errors may occur because the export file names are the same. We recommend that you set the export paths to `test/data1` and `test/data2`.

- When `SINGLE = FALSE`, if an export fails for a reason such as `file already exist`, you can delete all files in the export directory that share the prefix of the export target, or delete and recreate the export directory, and then retry the export. For example, consider the following task:

  ```sql
  SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;
  ```

  After a failure, delete all files with the `data` prefix in the `test` directory, or delete the `test` directory and recreate it, and then retry the export.

- Files in the PARQUET and ORC formats are compressed at the data block level within the file, not at the file level. Therefore, the compression suffix applies only to CSV files, not to PARQUET or ORC files.

- When you export compressed CSV files, the naming conventions for single-file and multi-file exports differ slightly:

  - For multi-file exports (`SINGLE = FALSE`), if a compression algorithm is specified, the corresponding compression suffix is appended to the file name. For example, if the export file name is `data_1_0_1` and the compression algorithm is `gzip`, the file name becomes `data_1_0_1.gz`.
  - For single-file exports (`SINGLE = TRUE`), the output file name is identical to the user-specified file name. If a compression suffix is specified in the file name, it is included; otherwise, it is omitted.

- The logic of the `FILE_EXTENSION` field when exporting CSV files is as follows:

  - If both `COMPRESSION` and `FILE_EXTENSION` are specified, the suffix order is `.file_extension.compression`. For example, if the export file name is `data_0_0_1`, `FILE_EXTENSION` is `xls`, and `COMPRESSION` is `gzip`, the multi-file export file name becomes `data_0_0_1.xls.gz`.
  - Regardless of whether `FILE_EXTENSION` is specified with a leading `.`, the output file extension contains only one `.`. For example, if the export file name is `data` and `FILE_EXTENSION` is `xls` or `.xls`, the file name is `data.xls`.
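The suffix rules above can be illustrated with a sketch (the table name `tbl1` and the output path are placeholders):

```sql
SELECT * INTO OUTFILE '/home/admin/test_data/data'
  FORMAT = (TYPE = 'CSV', COMPRESSION = GZIP, FILE_EXTENSION = 'xls')
  SINGLE = FALSE
FROM tbl1;
```

With these options, a multi-file export produces file names such as `data_0_0_1.xls.gz`: the user-defined extension comes first, followed by the compression suffix.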
Example
Export data files to a local disk
Example 1: Use the SELECT INTO OUTFILE statement to export a CSV file
MySQL mode

1. Log in to the host where the OBServer node to be connected is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the system variable `secure_file_priv` to configure the paths accessible during import and export operations.

   **Notice**: To prevent database access issues, you must connect through a local Unix socket to update the `secure_file_priv` system variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `mysql001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data.

   - Write data to a single file in serial mode. For example, write data to a file named `test_tbl1.csv`, with the character set `utf8mb4` and the compression format `gzip`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file smaller than 4 MB, the character set `utf8mb4`, and the compression format `gzip`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, a maximum file size of 4 MB, the character set `utf8mb4`, and the compression format `gzip`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the server and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.csv.gz  dd2024_0.csv.gz  test_tbl1.csv
   ```

   In this example, `test_tbl1.csv` is the file from the serial single-file export; `data_0.csv.gz` is the file from the parallel multi-file export without a specified file name, compressed with `gzip`; and `dd2024_0.csv.gz` is the file from the parallel multi-file export with the prefix `dd2024`, compressed with `gzip`.
Oracle mode

1. Log in to the host where the OBServer node to be connected is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the system variable `secure_file_priv` to configure the paths accessible during import and export operations.

   **Notice**: To ensure security, you must connect through a local Unix socket to update the `secure_file_priv` system variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `oracle001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data.

   - Write data to a single file in serial mode. Specify the file name as `test_tbl1.csv`, the character set as `utf8mb4`, and the compression format as `gzip`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file no larger than 4 MB, the character set `utf8mb4`, and the compression format `gzip`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, a maximum file size of 4 MB, the character set `utf8mb4`, and the compression format `gzip`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the server and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.csv.gz  dd2024_0.csv.gz  test_tbl1.csv
   ```

   In this example, `test_tbl1.csv` is the file from the serial single-file export; `data_0.csv.gz` is the file from the parallel multi-file export without a specified file name, compressed with `gzip`; and `dd2024_0.csv.gz` is the file from the parallel multi-file export with the prefix `dd2024`, compressed with `gzip`.
Example 2: Export data as a PARQUET file using the SELECT INTO OUTFILE statement
MySQL mode

1. Log in to the host where the OBServer node to be connected is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the system variable `secure_file_priv` to configure the paths accessible during import and export operations.

   **Notice**: For security reasons, you must connect through a local Unix socket to execute the SQL statement that modifies the `secure_file_priv` global variable. For more information, see secure_file_priv.

   1. Log in to the host where the OBServer node to be connected is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `mysql001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. Data types in MySQL mode are converted to PARQUET-supported data types according to the mapping table. For more information about the data types supported by PARQUET, see Data type mapping.

   - Write data to a single file named `test_tbl1.parquet` in serial mode, with the `SNAPPY` compression format and a row group size of `128MB`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.parquet' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file no larger than 4 MB, the `SNAPPY` compression format, and a row group size of `128MB`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, with each file no larger than 4 MB, the `SNAPPY` compression format, and a row group size of `128MB`:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the host and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.parquet  dd2024_0.parquet  test_tbl1.parquet
   ```

   In this example, `test_tbl1.parquet` is the file from the serial single-file export; `data_0.parquet` is the file from the parallel multi-file export without a specified file name; and `dd2024_0.parquet` is the file from the parallel multi-file export with the prefix `dd2024`.
Oracle mode

1. Log in to the host where the OBServer node to be connected is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the system variable `secure_file_priv` to configure the paths accessible during import and export operations.

   **Notice**: For security reasons, you must connect through a local Unix socket to execute the SQL statement that modifies the `secure_file_priv` global variable. For more information, see secure_file_priv.

   1. Log in to the host where the OBServer node to be connected is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `oracle001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. Data types in Oracle mode are converted to PARQUET-supported data types according to the mapping table. For more information about the data types supported by PARQUET, see Data type mapping.

   - Write data to a single file named `test_tbl1.parquet` in serial mode, with the `SNAPPY` compression format and a row group size of `128MB`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.parquet' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file no larger than 4 MB, the `SNAPPY` compression format, and a row group size of `128MB`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, with each file no larger than 4 MB, the `SNAPPY` compression format, and a row group size of `128MB`:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the host and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.parquet  dd2024_0.parquet  test_tbl1.parquet
   ```

   In this example, `test_tbl1.parquet` is the file from the serial single-file export; `data_0.parquet` is the file from the parallel multi-file export without a specified file name; and `dd2024_0.parquet` is the file from the parallel multi-file export with the prefix `dd2024`.
Example 3: Export data to an ORC file by using the SELECT INTO OUTFILE statement
MySQL mode

1. Log in to the machine where the OBServer node is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the `secure_file_priv` system variable to specify the paths that can be accessed when importing or exporting files.

   **Notice**: For security reasons, you can modify the `secure_file_priv` global variable only through a local Unix socket connection. For more information, see secure_file_priv.

   1. Log in to the machine where the OBServer node is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `mysql001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. Data types in MySQL mode are converted to ORC-supported data types according to the mapping table. For more information about the data types supported by ORC, see Data type mapping.

   - Write data to a single file named `test_tbl1.orc` in serial mode, with the `SNAPPY` compression format, a compression block size of `256 KB`, a stripe size of `64 MB`, and an index entry recorded every `10000` rows:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.orc' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file no larger than 4 MB, the `SNAPPY` compression format, a compression block size of `256 KB`, a stripe size of `64 MB`, and an index entry recorded every `10000` rows:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, with each file no larger than 4 MB, the `SNAPPY` compression format, a compression block size of `256 KB`, a stripe size of `64 MB`, and an index entry recorded every `10000` rows:

     ```sql
     obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the machine and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.orc  dd2024_0.orc  test_tbl1.orc
   ```

   In this example, `test_tbl1.orc` is the file from the serial single-file export; `data_0.orc` is the file from the parallel multi-file export without a specified file name; and `dd2024_0.orc` is the file from the parallel multi-file export with the prefix `dd2024`.
Oracle mode

1. Log in to the machine where the OBServer node is located.

   ```shell
   [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
   ```

2. Set the export file path.

   Set the `secure_file_priv` system variable to specify the paths that can be accessed when importing or exporting files.

   **Notice**: For security reasons, you can modify the `secure_file_priv` global variable only through a local Unix socket connection. For more information, see secure_file_priv.

   1. Log in to the machine where the OBServer node is located.

      ```shell
      [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx
      ```

   2. Run the following command to connect to the `oracle001` tenant through the local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******
      ```

   3. Set the export path to `/home/admin/test_data`:

      ```sql
      obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
      Query OK, 0 rows affected
      ```

3. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. Data types in Oracle mode are converted to ORC-supported data types according to the mapping table. For more information about the data types supported by ORC, see Data type mapping.

   - Write data to a single file named `test_tbl1.orc` in serial mode, with the `SNAPPY` compression format, a compression block size of `256 KB`, a stripe size of `64 MB`, and an index entry recorded every `10000` rows:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.orc' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel without specifying a file name, with each file no larger than 4 MB, the `SNAPPY` compression format, a compression block size of `256 KB`, a stripe size of `64 MB`, and an index entry recorded every `10000` rows:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

   - Write multiple files in parallel with the file name prefix `dd2024`, with each file no larger than 4 MB, and the same compression, block size, stripe size, and index stride settings:

     ```sql
     obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;
     Query OK, 9 rows affected
     ```

4. Log in to the server and view the exported files in the `/home/admin/test_data` directory on the OBServer node.

   ```shell
   [xxx@xxx /home/admin/test_data]# ls
   data_0.orc  dd2024_0.orc  test_tbl1.orc
   ```

   In this example, `test_tbl1.orc` is the file from the serial single-file export; `data_0.orc` is the file from the parallel multi-file export without a specified file name; and `dd2024_0.orc` is the file from the parallel multi-file export with the prefix `dd2024`.
Export data files to OSS
Use the `SELECT INTO OUTFILE` statement to export data from the `test_tbl2` table to the specified OSS storage location by partition. The partitioning is based on the combination of the `col1` and `col2` columns: rows with the same values in these columns are grouped into the same partition and exported to the same directory.

```sql
SELECT /*+parallel(3)*/ * FROM test_tbl2
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB';
```

The storage location is specified by the `$DATA_FOLDER_NAME` variable. You must also provide the OSS endpoint address (`$OSS_HOST`), the AccessKey ID (`$OSS_ACCESS_ID`), and the AccessKey secret (`$OSS_ACCESS_KEY`).
More information
You can use the LOAD DATA statement to import a file exported by the SELECT INTO OUTFILE statement. For more information, see Use the LOAD DATA statement to import data.
