SELECT INTO

2025-11-14 07:33:32  Updated

Purpose

You can use this statement to store query results in a variable, a file, or a table. Here are the details:

  • SELECT ... INTO OUTFILE stores the result set in an external file, and you can specify the output format.

    Note

    When you use SELECT ... INTO OUTFILE to output data, you can use \N to represent NULL.

  • SELECT ... INTO DUMPFILE writes an unformatted single line to an external file.

  • SELECT ... INTO var_list stores the result set in variables.

Privilege requirements

To execute the SELECT INTO statement, you must have the FILE privilege and the SELECT privilege on the corresponding table. For more information about OceanBase Database privileges, see Privilege types in MySQL-compatible mode.

Here is an example:

To grant the FILE privilege to a user, use the following command format:

GRANT FILE ON *.* TO user_name;

In this command, user_name specifies the user who needs to execute the SELECT INTO statement.

Syntax

select_stmt 
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
     | OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
     | DUMPFILE 'file_name'
     | into_var_list};

field_opt:
    {COLUMNS | FIELDS} field_term_list

field_term_list:
    field_term [, field_term ...]

field_term:
    {[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string

line_opt:
    LINES line_term_list

line_term_list:
    line_term [line_term]

line_term:
    {STARTING | TERMINATED} BY string

file_opt:
    file_option [, file_option ...]

file_option:
    SINGLE [=] {TRUE | FALSE}
    | MAX_FILE_SIZE [=] {int | string}
    | BUFFER_SIZE [=] {int | string}

external_file_format_list:
    type_csv_option
    | type_parquet_option
    | type_orc_option

type_csv_option:
    TYPE = 'CSV'
    LINE_DELIMITER = '<string>' | <expr>
    FIELD_DELIMITER = '<string>' | <expr>
    ESCAPE = '<character>' | <expr>
    FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
    FIELD_ENCLOSED_BY = '<character>' | <expr>
    ENCODING = 'charset'
    COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
    FILE_EXTENSION = ['<string>']

type_parquet_option:
    TYPE = 'PARQUET',
    COMPRESSION = '<string>'
    ROW_GROUP_SIZE = '<string>' | <int>

type_orc_option:
    TYPE = 'ORC'
    COMPRESSION = '<string>'
    COMPRESSION_BLOCK_SIZE = '<string>' | <int>
    STRIPE_SIZE = '<string>' | <int>
    ROW_INDEX_STRIDE = <int>

Parameters

Parameter Description
select_stmt The query statement to be executed. select_stmt must return a result set. For more information about the structure and options of the query statement, see SELECT.

Note

You can also specify the INTO clause before the FROM keyword in the query statement. For example, SELECT * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FROM test_tbl1 WHERE id > 5;.

file_name The path and name of the exported file. For more information, see file_name.
PARTITION BY part_expr

Note

For OceanBase Database V4.3.2, the partitioning method for exported data is controllable since V4.3.2 BP1.

Optional that specifies the partitioning method for exported data. The value of part_expr serves as part of the export path. For each row of data, the system calculates the value of part_expr. Rows with the same value of part_expr are considered to be in the same partition and exported to the same directory.

Note

  • When exporting data by partition, require that SINGLE = FALSE, which means that exporting to multiple files is allowed.
  • Currently, exporting data by partition only supports importing to Alibaba Cloud OSS.

FORMAT = (external_file_format_list) Optional. The FORMAT clause specifies attributes related to the export file format. Use TYPE to specify the exported file format, which can be CSV, PARQUET, or ORC. TYPE cannot be left empty. For more information, see external_file_format_list.
CHARSET | CHARACTER SET charset_name Optional. Specifies the character set to export data to an external file. charset_name indicates the name of the character set.
field_opt Optional. Specifies the format of fields. You can specify it by using the FIELDS or COLUMNS clause. For more information, see field_term.
line_opt Optional. Specifies the start and end characters of data rows. You can specify it by using the LINES clause. For more information, see line_term.
file_opt Optional. Specifies whether to export data to multiple files and the size of each file when data is exported to multiple files. For more information, see file_option.

file_name

The file_name parameter is in the following format:

  • When you export a file to an OBServer node, use the format of \$PATH/\$FILENAME. The parameters are described as follows:

    • \$PATH: specifies the path to save the exported file, which is the path of the exported file on the OBServer node.
    • \$FILENAME: the name of the file to be exported. When SINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the \$FILENAME parameter, the system sets the prefix of the exported file to the default value data. The system automatically generates the file extension based on the export type.
  • When you export a file to OSS, the save path is in the format of oss://\$PATH/\$FILENAME/?host=\$HOST&access_id=\$ACCESS_ID&access_key=\$ACCESSKEY. The parameters are described as follows:

    • \$PATH: the path to save the exported file, which specifies the file path in the bucket.
    • \$FILENAME: the name of the file to be exported. When SINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the name, the system uses the default prefix data. The system automatically generates the suffix.
    • \$HOST: specifies the endpoint of the OSS service or the CDN domain name for accelerated access to the OSS service.
    • \$ACCESS_ID: specifies the Access Key ID required for accessing OSS service, which is used for authentication.
    • \$ACCESSKEY: specifies the AccessKey Secret required for accessing the OSS service.

    Note

    • For OceanBase Database V4.3.5, SELECT INTO can be used to export data to objects stored on S3 and objects stored on services that comply with the S3 protocol, starting from V4.3.5 BP2.
    • Alibaba Cloud's Object Storage Service (OSS) has a file size limit. If you want to export a file larger than 5 GB to OSS, the file will be split into multiple files, each smaller than 5 GB.

    field_term

    • [OPTIONALLY] ENCLOSED BY string: specifies the characters that enclose field values. By default, values are not enclosed. For example, ENCLOSED BY '"' indicates that character values are enclosed in double quotation marks. If the OPTIONALLY keyword is used, the specified characters are used to enclose only string values.
    • TERMINATED BY string: specifies the symbol that separates field values. For example, TERMINATED BY ',' uses a comma to separate two field values.
    • ESCAPED BY string: specifies the escape character to process special characters or parse data in a special format. The default escape character is the backslash (\).

    line_term

    • STARTING BY string: specifies the starting character of each row.
    • TERMINATED BY string: specifies the ending character of each line. By default, a line is terminated by a line break. For example, ... LINES TERMINATED BY '\n' ... indicates that each line is terminated by a line break.

    file_option

    • SINGLE [=] {TRUE | FALSE}: specifies whether to export data to a single file or multiple files.

      • SINGLE [=] TRUE: specifies to export data to a single file. This is the default value.

      • SINGLE [=] FALSE: specifies to export data to multiple files.

        Notice

        When the degree of parallelism is greater than 1 and SINGLE = FALSE, data can be exported to multiple files to achieve parallel read and write and improve the export speed.

    • MAX_FILE_SIZE [=] {int | string}: specifies the maximum size of a single file when data is exported to multiple files. This parameter is effective only when SINGLE = FALSE.

    • BUFFER_SIZE [=] {int | string}: specifies the size of memory that is separately applied for each partition by each thread when data is exported. The value of this parameter represents the size of memory applied for a single partition when data is exported without partitioning. The default value is 1 MB.

      Note

      • BUFFER_SIZE is used for export performance tuning. If sufficient physical memory is available and high export efficiency is desired, you can set this parameter to a large value (for example, 4 MB). If insufficient physical memory is available, you can set this parameter to a small value (for example, 4 KB). If you set this parameter to 0, all partitions of a single thread share a public memory.
      • Starting from OceanBase Database V4.3.2 BP1, the BUFFER_SIZE parameter is supported.

    external_file_format_list

    • When TYPE = 'CSV', the file contains the following columns:

      • LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is '\n'.
      • FIELD_DELIMITER: specifies the field delimiter of the CSV file. The default value is '\t'.
      • ESCAPE: specifies the escape character of the CSV file. It must be 1 byte in length. The default value is '\'.
      • FIELD_OPTIONALLY_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string. This option allows you to add enclosing characters only to specific types of fields, such as CHAR, VARCHAR, TEXT, and JSON.
      • FIELD_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string, which specifies not to add enclosing characters. This option adds enclosing characters to all types of fields.
      • ENCODING: specifies the character set encoding format of the file. If this parameter is not specified, the default value UTF8MB4 takes effect.
      • COMPRESSION: specifies the compression format of the file. This parameter is optional. The supported compression formats are NONE, GZIP, DEFLATE, and ZSTD.
        • GZIP/DEFLATE: specifies that the file is a GZIP-compressed file.
        • ZSTD: specifies that the file is a ZSTD-compressed file.
        • NONE: specifies that the file is not compressed. This is the default value.
      • FILE_EXTENSION: specifies the user-defined file extension. This parameter is required only when multiple files are exported. It is supported only for CSV files. If this parameter is not specified, the file extension is determined by the file format. The default file extension for CSV files is .csv.
    • When TYPE = 'PARQUET', the following fields are included:

      • COMPRESSION: specifies the compression format of the PARQUET file. The default value is .parquet. The supported compression formats are UNCOMPRESSED (which specifies that the file is not compressed), SNAPPY, GZIP, BROTLI, ZSTD, LZ4, and LZ4_HADOOP. If this parameter is not specified, the default value UNCOMPRESSED takes effect.
      • ROW_GROUP_SIZE: specifies the size of the ROW GROUP in the PARQUET file. The unit is bytes. You can specify a numeric value or a string with a suffix of B, K, M, or G to indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is 256MB. We recommend that you use the default value.
    • When TYPE = 'ORC', it contains the following fields:

      • COMPRESSION: specifies the compression format of the ORC file. The default value is .orc. The supported compression formats are UNCOMPRESSED (which specifies that the file is not compressed), SNAPPY, ZLIB, LZ4, and ZSTD. If this parameter is not specified, the default value UNCOMPRESSED takes effect.
      • COMPRESSION_BLOCK_SIZE: specifies the size of the block into which the data is compressed. The unit is bytes. You can specify a numeric value or a string with a suffix of B, K, M, or G to indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is 256KB. We recommend that you use the default value.
      • STRIPE_SIZE: specifies the size of a Stripe in the ORC file. The unit is bytes. You can specify a numeric value or a string with a suffix of B, K, M, or G to indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is 64MB. We recommend that you use the default value.
      • ROW_INDEX_STRIDE: specifies the frequency of index records. It defines the number of rows between two consecutive index records. If this parameter is not specified, the default value 10000 takes effect. We recommend that you use the default value.

    Examples

    Export data to a local file

    1. Set the export file path.

      To export a file, you must set the system variable secure_file_priv to a path that allows access to the export file.

      Notice

      For security reasons, when you set the system variable secure_file_priv, you can only connect to the database using a local Unix socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.

      1. Log in to the OBServer node where you want to connect to OceanBase Database.

        ssh admin@xxx.xxx.xxx.xxx
        
      2. Connect to the mysql001 tenant using the local Unix socket.

        obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
        
      3. Set the export path to /home/admin/test_data.

        SET GLOBAL secure_file_priv = "/home/admin/test_data";
        
      4. Log out.

    2. Reconnect to the database, and then use the SELECT INTO OUTFILE statement to export data. Use commas to separate values in different fields; wrap string values with " characters; and use line breaks as the end marker.

      • Export data to a single file in serial mode. Specify the file name as test_tbl1.csv.

        SELECT /*+parallel(2)*/ * FROM test_tbl1
        INTO OUTFILE '/home/admin/test_data/test_tbl1.csv'
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
          LINES TERMINATED BY '\n';
        

        The return result is as follows:

        Query OK, 9 rows affected
        
      • Write data to multiple files in parallel. Do not specify the file name (that is, use the default prefix data), and ensure that the size of each file does not exceed 4 MB.

        SELECT /*+parallel(2)*/ * FROM test_tbl1
          INTO OUTFILE '/home/admin/test_data/'
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
          LINES TERMINATED BY '\n'
          SINGLE = FALSE MAX_FILE_SIZE = '4MB';
        

        The return result is as follows:

        Query OK, 9 rows affected
        
      • Write data to multiple files in parallel. Specify the file name prefix as dd2024, and ensure that the size of each file does not exceed 4 MB.

        SELECT /*+parallel(2)*/ * FROM test_tbl1
          INTO OUTFILE '/home/admin/test_data/dd2024'
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
          LINES TERMINATED BY '\n'
          SINGLE = FALSE MAX_FILE_SIZE = '4MB';
        

        The return result is as follows:

        Query OK, 9 rows affected
        

      Note

      • When multiple export tasks are exporting data to the same path at the same time, errors may occur or only a part of data may be exported. You can avoid such issues by properly setting the export path. For example, SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE; and SELECT /*+parallel(2)*/ * FROM t2 INTO OUTFILE 'test/data' SINGLE = FALSE; may both be executed at the same time, but they may fail because they have the same file name. In this case, you can set the export path to test/data1 and test/data2 respectively to avoid such issues.
      • When SINGLE = FALSE, if the export fails for reasons such as file already exist, you can delete all files with the same prefix as the export target in the export directory, or delete the export directory and recreate it, and then perform the export operation again. For example, if SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE; fails, you can delete all files with the data prefix in the test directory, or directly delete the test directory and recreate it, and then try to perform the export operation again.

    3. Log in to the server and view the exported files in the /home/admin/test_data directory on the OBServer node.

    [xxx@xxx /home/admin/test_data]# ls
    

    The return result is as follows:

    data_0_0_0  data_0_1_0  dd2024_0_0_0  dd2024_0_1_0  test_tbl1.csv
    

    In this example, test_tbl1.csv is the file name when data is exported to a single file in serial mode; data_0_0_0 and data_0_1_0 are the file names when data is written to multiple files in parallel without specifying the file name; dd2024_0_0_0 and dd2024_0_1_0 are the file names when data is written to multiple files in parallel with the file name prefix set to dd2024.

    Export data to OSS

    You can execute the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to a specified OSS storage location based on the partitions. The partitions are determined by the combination of the col1 and col2 columns. Rows with the same values in these two columns are grouped into the same partition and exported to the same directory.

    SELECT /*+parallel(3)*/ * FROM test_tbl2
      INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
        PARTITION BY CONCAT(col1,'/',col2)
        SINGLE = FALSE BUFFER_SIZE = '2MB';
    

    The storage location is specified by the $DATA_FOLDER_NAME variable. You also need to provide the host address, access ID, and access key of the OSS.

    References

    Export data by using the OUTFILE statement

Contact Us