Overview
This statement is used to import data from an external source. OceanBase Database provides two forms of the LOAD DATA statement, each with a different execution path and use case:
- LOAD DATA INFILE: This form is compatible with MySQL syntax and supports only the CSV format. The file source can be specified with `INFILE`, `LOCAL`, or `REMOTE_OSS` to indicate whether the file is located on the server, on the client, or in Object Storage Service (OSS).
- LOAD DATA FROM: This form is recommended. It reads data from a URL table and supports multiple formats, including CSV, PARQUET, ORC, and ODPS. Its execution path differs from the INFILE form, and it is better suited for batch import scenarios such as AP workloads.
The syntax, parameters, and diagnostic methods differ significantly between the two forms. Please refer to the corresponding sections below based on the actual form you are using.
General considerations
- You cannot use the `LOAD DATA` statement for tables that have triggers.
- To import data from an external file, you must have the `FILE` privilege and the following settings:
  - When loading server-side files, you must set the system variable `secure_file_priv` to specify the path for importing or exporting files.
  - When loading client-side local files, you must add the `--local-infile[=1]` option when starting the MySQL/OBClient client to enable loading data from the local file system.
- When you use direct load for a partitioned table, the target table cannot be a replica table, and it cannot contain auto-increment columns, identity columns, or global indexes.
- To improve import efficiency, we recommend that you create the base table first and create indexes only after the import is complete. If global indexes are involved, make sure to create them after the import is complete; otherwise, an error such as `not support` may occur. A minimal sketch of this workflow is shown below.
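The following is a minimal sketch of this workflow, under the assumption of a hypothetical table `orders` and data file `/data/orders.csv` (not from the original text): the base table is created first, the data is imported with a direct load hint, and the global index is created only afterwards.

```sql
-- Hypothetical sketch: create the base table without a global index.
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  c1 INT,
  c2 VARCHAR(64)
);

-- Direct load from a server-side CSV file
-- (assumes secure_file_priv allows access to /data).
LOAD DATA /*+ PARALLEL(4) direct(true, 0) */ INFILE '/data/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',';

-- Create the global index only after the import has completed.
CREATE INDEX idx_orders_c1 ON orders(c1) GLOBAL;
```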
1. LOAD DATA INFILE
This form is compatible with the LOAD DATA syntax of MySQL. It only supports CSV format. It can load files from the server (INFILE), client local files (LOCAL), or OSS files (REMOTE_OSS). The import process involves parsing the file, distributing the data, and inserting it.
Note
- When you execute `LOAD DATA LOCAL INFILE`, the system automatically adds the `IGNORE` option.
- If the file to be imported is located on the local disk of an OBServer node, execute this statement directly on the OBServer node. Do not execute it through a proxy.
Syntax of LOAD DATA INFILE
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
[CHARACTER SET 'charset_name']
[COMPRESSION [=] {AUTO|NONE|GZIP|DEFLATE|ZSTD}]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
[SET col_name = {expr | DEFAULT}
[, col_name = {expr | DEFAULT}] ...]
load_mode:
'full'
| 'inc_replace'
PARTITION_OPTION:
partition_option_list
| subpartition_option_list
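To illustrate the hint options defined in the syntax above, here is a hedged sketch (the table and file names are hypothetical) that enables direct load with the `inc_replace` load mode and a parallelism of 4:

```sql
-- Hypothetical sketch of the direct(bool, int, [load_mode]) hint.
-- 'inc_replace' is one of the two load_mode values ('full' | 'inc_replace').
LOAD DATA /*+ PARALLEL(4) direct(true, 0, 'inc_replace') */
INFILE '/home/admin/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```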
Support for Location objects in LOAD DATA INFILE and SELECT INTO OUTFILE
Overview
LOAD DATA INFILE and SELECT INTO OUTFILE support specifying the file path by using Location objects. If the AccessKey (AK) and SecretKey (SK) are not explicitly specified, the system automatically matches the Location object based on the longest path matching principle.
LOAD DATA INFILE
Automatic matching mechanism
If the import path is OSS and the AK and SK are not explicitly specified, the system automatically matches the path with the Location object that the current user has permissions for, and selects the corresponding AK and SK based on the longest path matching principle.
Note
The type of the Location object must be an OSS path.
Syntax example
Specify the path by using the @location_name 'relative_path' format, where the part after @ is the name of the Location object, and the string is the subpath relative to the object:
LOAD DATA REMOTE_OSS INFILE @oss_t 'load_data_test.csv'
INTO TABLE load_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
In this example, @oss_t is the name of the Location object, and 'load_data_test.csv' is the subpath relative to the object.
SELECT INTO OUTFILE
Automatic matching mechanism
If the export path is OSS or HDFS and the AK and SK are not explicitly specified, the system automatically matches the path with the Location object that the current user has permissions for, and selects the corresponding AK and SK based on the longest path matching principle.
Note
The type of the Location object can be OSS, HDFS, or another supported storage type.
Syntax example
Specify the path by using the @location_name 'relative_path' format, where the part after @ is the name of the Location object, and the string is the subpath relative to the object:
SELECT *
INTO OUTFILE @oss_t 'select_into_test.csv'
FORMAT = (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
LINE_DELIMITER = '\n'
)
FROM export_table;
In this example, @oss_t is the name of the Location object, and 'select_into_test.csv' is the subpath relative to the object.
Parameters (applicable only to the INFILE form)
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. The default value is 4. We recommend that you set the value to a number within the range [0, maximum number of CPU cores of the tenant]. |
| load_batch_size(M) | The batch size for each insert. The default value of M is 100. We recommend that you set the value to a number within the range [100, 1000]. |
| APPEND | direct() | NO_DIRECT | Use the hint to enable direct load.
Notice: Do not perform an upgrade of OceanBase Database during a direct load task, because this may cause the direct load task to fail.
|
| REMOTE_OSS | LOCAL | Optional. `REMOTE_OSS` indicates that the file is located in Object Storage Service (OSS), and `LOCAL` indicates that the file is located on the client. If neither is specified, the file is read from the server.
|
| file_name | The path and name of the input file. file_name can be in the following formats:
Note: When you import a file from an object storage service, make sure that the following conditions are met:
|
| REPLACE | IGNORE | If a unique key conflict occurs, REPLACE indicates that the conflicting row will be overwritten, and IGNORE indicates that the conflicting row will be ignored. LOAD DATA determines whether data is duplicated based on the table's primary key. If the table does not have a primary key, the REPLACE and IGNORE options have no effect. By default, when duplicate data is encountered, LOAD DATA logs the error data to the log file.
Notice
|
| table_name | The name of the table to import data into.
|
| CHARACTER SET 'charset_name' | Optional. Specifies the character set used for the source file. If not specified, the system defaults to UTF-8. |
| PARTITION_OPTION | Specifies the partition name for direct load.
Note: Specifying partitions is only supported for direct load. It is not supported for regular LOAD DATA. That is, if you do not add a direct load hint or set a direct load configuration, specifying partitions during LOAD DATA has no effect. |
| COMPRESSION | Specifies the compression format of the file.
|
| FIELDS | COLUMNS | Specifies the format of the fields.
|
| LINES STARTING BY | Specifies the starting character for a line. |
| LINES TERMINATED BY | Specifies the ending character for a line. |
| IGNORE number { LINES | ROWS } | Specifies the number of rows to ignore at the beginning of the file. LINES indicates the number of lines at the beginning of the file, and ROWS indicates the number of rows at the beginning of the file as delimited by the field delimiter. By default, the system maps each field in the input file to the corresponding column in the table. If the input file does not contain all the columns, the missing columns are filled with default values based on the following rules:
Note: For multiple files, the behavior is the same as for a single file. |
| column_name_var | Optional. Specifies the name of the imported column. To represent a null value in the file, use \N. |
| SET col_name = {expr | DEFAULT} | Optional. Specifies an expression or default value for a column not read from the file. For example, SET dates=CURRENT_TIMESTAMP sets the dates column to the current timestamp. |
Supported format: the INFILE form supports only the CSV format.
Bypass mode and non-bypass mode: you can use a hint to enable bypass import (direct load) to improve performance. For a combined example of the parameters above, see the sketch below.
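The following hedged sketch (the file path, table, and column names are hypothetical) combines several of the parameters described in the table above: a character set, field and line formats, skipping a header line, a column list, and a `SET` expression.

```sql
-- Hypothetical sketch combining several INFILE parameters.
-- Assumes secure_file_priv allows access to /data and that table t_user
-- exists with columns (id, name, created_at).
LOAD DATA /*+ PARALLEL(4) */ INFILE '/data/user.csv'
IGNORE INTO TABLE t_user
CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES            -- skip the header line
(id, name)                -- columns read from the file
SET created_at = CURRENT_TIMESTAMP;
```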
Wildcard rules for multi-file direct load (INFILE only)
To facilitate multi-file imports, wildcard support has been introduced for both server-side and OSS file imports, but not for client-side imports.
Server-side wildcard usage
- Matching rules: file name matching, for example:
  load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
  Directory matching is also supported and works similarly to file name matching.
- Considerations: At least one matching file must exist; otherwise, error code 4027 is returned. Only POSIX-compliant GLOB functions are supported.

Wildcard usage in Alibaba Cloud Object Storage Service (OSS)
- Matching rules: file name matching, for example:
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  Directory matching is not supported. Only `*` and `?` are supported as file name wildcards.
2. LOAD DATA FROM (Recommended)
This form reads data from a URL table and supports various formats such as CSV, PARQUET, ORC, and ODPS. Its execution path differs from the INFILE form, which makes it suitable for batch import scenarios such as AP workloads, and it is the currently recommended form. When executed, LOAD DATA FROM FILES(...) INTO TABLE table_name is equivalent to INSERT INTO table_name SELECT * FROM FILES(...), and its hints are the same as those of the INSERT statement, as illustrated in the sketch below.
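As a hedged illustration of this equivalence (the directory, file, and table names are hypothetical), the following sketch shows a LOAD DATA FROM statement and the INSERT INTO ... SELECT statement it corresponds to:

```sql
-- Hypothetical sketch: load a CSV file through the FILES() URL table function.
LOAD DATA /*+ direct(true, 0) parallel(2) */
FROM FILES(
  LOCATION = '/data/csv/',
  FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ','),
  PATTERN = 'orders.csv')
INTO TABLE orders;

-- Equivalent INSERT form; the same hints apply.
INSERT /*+ direct(true, 0) parallel(2) */ INTO orders
SELECT * FROM FILES(
  LOCATION = '/data/csv/',
  FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ','),
  PATTERN = 'orders.csv');
```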
LOAD DATA FROM syntax
Note
When you execute this form, it is equivalent to INSERT INTO table_name SELECT ... FROM url_table_function_expr. Therefore, the hints in this statement are consistent with those in the INSERT statement (such as direct and parallel). The load_batch_size parameter, which is specific to the INFILE form, is no longer supported.
LOAD DATA
[/*+ insert_hint */]
[REPLACE | IGNORE]
FROM { url_table_function_expr |
( SELECT expression_list FROM url_table_function_expr ) }
INTO TABLE table_name
[PARTITION(PARTITION_OPTION)]
[(column_name_var [, column_name_var] ...)]
[LOG ERRORS
[INTO 'logfile_string']
[REJECT LIMIT {integer | UNLIMITED}]
[BADFILE 'badfile_string']]
url_table_function_expr:
FILES (
LOCATION = '<string>',
{
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
PARSE_HEADER = { TRUE | FALSE },
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
},
[PATTERN = '<regex_pattern>']
)
| SOURCE (
TYPE = 'ODPS',
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
TUNNEL_ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>'
)
PARTITION_OPTION:
partition_option_list
| subpartition_option_list
Parameters (only for LOAD DATA FROM form)
| Parameter | Description |
|---|---|
| Hint | Consistent with the hints in the INSERT statement (this form is equivalent to the INSERT INTO ... SELECT ... statement). |
| url_table_function_expr | Specifies the data source and format by using FILES or SOURCE. |
| table_name, PARTITION_OPTION | Same as in the INFILE form. |
| column_name_var | Optional. Specifies the name of the column to be imported. |
| LOG ERRORS | Optional. Enables error diagnostics during the import. For more information, see log_errors. |
Supported formats: CSV, PARQUET, and ORC (by using FILES); and ODPS (by using SOURCE).
Bypass and non-bypass: You can enable bypass import by using a hint.
Diagnostics: Use LOG ERRORS, REJECT LIMIT, and BADFILE to record failed rows. Use READ_ERROR_LOG to view the error log. For more information, see log_errors.
FILES
The FILES keyword consists of the LOCATION, FORMAT, and PATTERN clauses.
The `LOCATION` clause specifies the path where the external table files are stored. Typically, the data files of the external table are stored in a separate directory, which can contain subdirectories. When a table is created, the external table automatically collects all files in this directory.
- For a local location, the format is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file path that the OBServer node has permission to access, and `local_file_path` must be a subpath of the `secure_file_priv` path.
- For a remote location, the format is as follows:
  - `LOCATION = '{oss|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required to access OSS and S3, and `s3_region` is the region selected when using S3. This sensitive access information is stored in the system tables of the database in encrypted form.
  - `LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH'`, where `${hdfs_namenode_address}` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the directory path in HDFS.
  - For Kerberos authentication: `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`, where:
    - `principal`: the user for login authentication.
    - `keytab`: the path to the user's authentication key file.
    - `krb5conf`: the path to the Kerberos environment description file.
    - `configs`: additional HDFS configuration items. This is empty by default, but in a Kerberos environment it usually needs to be set, for example, `dfs.data.transfer.protection=authentication,privacy`, which sets the data transmission protection level to `authentication` and `privacy`.
Notice
When using an object storage path, the parameters of the object storage path are separated by the `&` symbol. Ensure that the parameter values you enter contain only uppercase and lowercase letters, digits, the characters `\/-_$+=`, and wildcards. If you enter other characters, the settings may fail.
The `FORMAT` clause specifies the properties related to the file reading format and supports three file formats: CSV, PARQUET, and ORC.
When TYPE = 'CSV', the following fields are supported:
- `LINE_DELIMITER`: specifies the line delimiter of the CSV file. The default value is `LINE_DELIMITER='\n'`.
- `FIELD_DELIMITER`: optional. Specifies the column delimiter of the CSV file. The default value is `FIELD_DELIMITER='\t'`.
- `PARSE_HEADER`: optional. Specifies whether the first line of the CSV file contains the column names. The default value is `FALSE`, indicating that the first line is not treated as column names.
- `ESCAPE`: specifies the escape character of the CSV file. It must be one byte. The default value is `ESCAPE='\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. Specifies the symbol used to enclose field values in the CSV file. The default value is an empty string. Using this option indicates that only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON) are enclosed.
- `ENCODING`: specifies the character set encoding format of the file. If not specified, the default value is `UTF8MB4`.
- `NULL_IF`: specifies the strings that are treated as `NULL`. The default value is an empty string.
- `SKIP_HEADER`: skips the file header by specifying the number of lines to skip.
- `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
- `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
- `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.
When TYPE = 'PARQUET' or TYPE = 'ORC', there are no additional fields.
The `PATTERN` clause specifies a regular expression pattern that is used to filter files in the `LOCATION` directory. For each file path in the `LOCATION` directory, if it matches the pattern, the external table accesses the file; otherwise, the file is skipped. If this parameter is not specified, the external table can access all files in the `LOCATION` directory by default.
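As a hedged illustration of the FORMAT and PATTERN clauses (the directory and table names are hypothetical), the following sketch applies several of the CSV options described above and uses a regular expression so that only files ending in .csv are read:

```sql
-- Hypothetical sketch: CSV FORMAT options combined with a PATTERN filter.
-- SKIP_HEADER = 1 skips one header line; NULL_IF maps the listed strings to NULL;
-- PATTERN is a regular expression, so only files ending in ".csv" are read.
LOAD DATA /*+ direct(true, 0) parallel(2) */
FROM FILES(
  LOCATION = '/data/csv/',
  FORMAT = (
    TYPE = 'CSV',
    FIELD_DELIMITER = ',',
    LINE_DELIMITER = '\n',
    ENCODING = 'UTF8MB4',
    NULL_IF = ('NULL', 'N/A'),
    SKIP_HEADER = 1,
    TRIM_SPACE = TRUE,
    EMPTY_FIELD_AS_NULL = TRUE
  ),
  PATTERN = '.*\.csv')
INTO TABLE t1;
```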
SOURCE
The SOURCE keyword does not include any other clauses. In this case, TYPE = 'ODPS' and the following fields are included:
- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: optional. Specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: optional. Specifies whether to use the specified quota.
- `COMPRESSION_CODE`: optional. Specifies the compression format of the data source. Supported compression formats include `ZLIB`, `ZSTD`, `LZ4`, and `ODPS_LZ4`. If not specified, compression is not enabled.
log_errors
- `LOG ERRORS`: Enables error diagnostics during the import process, allowing failed rows to be recorded instead of terminating the entire operation at the first error. When used with the `REJECT LIMIT` clause, it controls the maximum number of rows that can be rejected.
- `INTO 'logfile_string'`: Optional. Specifies the file in the target directory where error information is written. If `INTO 'logfile_string'` is not specified, error information is only recorded in the warning buffer, which can be viewed by using `SHOW WARNINGS`. `logfile_string` specifies the directory for storing error information, in the following format:
  Note: The `INTO 'logfile_string'` parameter is supported starting from V4.4.0.
  - When error information is stored locally, `logfile_string` is in the format `[file://] local_file_path`, where `local_file_path` can be either a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. `secure_file_priv` specifies the file paths that OBServer nodes have permission to access, and `local_file_path` must be a subpath of `secure_file_priv`.
  - When error information is stored remotely (refer to the Location section in the syntax for creating external tables), `logfile_string` is in one of the following formats:
    - `{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access credentials required for accessing Alibaba Cloud OSS, AWS S3, or object storage compatible with the S3 protocol, and `s3_region` specifies the region selected when using S3. These sensitive access credentials are stored in the system tables of the database in encrypted form.
    - `hdfs://localhost:port/PATH`, where `localhost` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the directory path in HDFS. For Kerberos authentication, the address is: `hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx`.
OceanBase Database allows you to set tenant-level parameters to configure the compression algorithm and maximum size of a single diagnostic log file. For more information, see load_data_diagnosis_log_compression and load_data_diagnosis_log_max_size.
- `REJECT LIMIT`: Optional. Specifies the maximum number of rows that can be rejected:
  - Default value: 0. No rows can be rejected, and the operation fails at the first error.
  - `integer`: The maximum number of rows that can be rejected on a single server. For example, 10 means that up to 10 rows can be rejected on a single server.
  - `UNLIMITED`: No limit on the number of rejected rows.
- `BADFILE 'badfile_string'`: Specifies the path where error data files are stored. The format of `badfile_string` is the same as that of `logfile_string`.
  Note: The `BADFILE 'badfile_string'` parameter is supported starting from V4.4.0.
Notice
- If the `LOG ERRORS` clause is not specified, the default behavior is to import data normally, which means that the operation fails at the first error.
- If the `LOG ERRORS` clause is specified but the `REJECT LIMIT` clause is not, it is equivalent to setting the limit to 0. The operation fails at the first error, but the first error is recorded, and the error code is a diagnostics-related error, such as "reject limit reached".

View error logs
OceanBase Database allows you to execute the following SQL statement to view the error logs of an import operation:
SELECT * FROM READ_ERROR_LOG('diagnosis_log_path');

In this statement, `diagnosis_log_path` specifies the path of the error log. When executed, this statement is equivalent to the following URL external table query:

SELECT * FROM FILES (
  LOCATION = 'diagnosis_log_path/'
  FORMAT(
    TYPE = 'csv'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\',
    PARSE_HEADER = true )
  [, PATTERN = 'filename'] );

Here is an example:
- The log path is specified as a file name (without a trailing /).

  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/filename');

  The corresponding URL external table query is as follows (the file name is used as a pattern to filter files):

  SELECT * FROM FILES (
    LOCATION = 'diagnosis/log/path/',
    FORMAT (
      TYPE = 'csv'
      FIELD_DELIMITER = ','
      FIELD_OPTIONALLY_ENCLOSED_BY = '\',
      PARSE_HEADER = true ),
    PATTERN = 'filename' );

- The log path is specified as a folder (with a trailing /).

  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/');

  The corresponding URL external table query is as follows (no pattern is used):

  SELECT * FROM FILES (
    LOCATION = 'diagnosis/log/path/',
    FORMAT (
      TYPE = 'csv'
      FIELD_DELIMITER = ','
      FIELD_OPTIONALLY_ENCLOSED_BY = '\',
      PARSE_HEADER = true ) );
Examples
Note
When you use `LOAD DATA` to load data, you can use `\N` to represent `NULL`.

1. Example of the LOAD DATA INFILE statement
Import data from a server-side file
Example 1: Import data from a server-side file.
Set the global secure file path.
obclient> SET GLOBAL secure_file_priv = "/";
Query OK, 0 rows affected

obclient> \q
Bye

Note
Because `secure_file_priv` is a `GLOBAL` variable, you must execute `\q` to exit and then reconnect for the setting to take effect.

After reconnecting to the database, import data from the external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1;
Query OK, 0 rows affected
Example 2: Use the `APPEND` hint to enable direct load.

LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/a.csv' INTO TABLE t;

Example 3: Specify the source file character set.
If the source file is encoded in GBK or another non-UTF-8 encoding, you must use `CHARACTER SET` to specify the source file character set. Otherwise, an error may occur when Chinese characters are imported (such as ERROR 1366 (HY000): Incorrect string value).

obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'gbk' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected

If the source file is encoded in UTF-8, you can specify `CHARACTER SET 'utf8mb4'`.

obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected

2. Example of the LOAD DATA FROM statement
Example 4: Import a CSV file (FROM FILES).
- Import all columns from the test1.csv file.

  load data /*+ direct(true,0) parallel(2)*/ from files( location = "data/csv", format = ( type = 'csv', field_delimiter = ',', parse_header = true, skip_blank_lines = true ), pattern = 'test1.csv') into table t1;

- Read the c1 and c2 columns from the test1.csv file in the data/csv directory and import them into the col1 and col2 columns of the t1 table.

  load data /*+ direct(true,0) parallel(2)*/ from ( select c1, c2 from files( location = 'data/csv', format = ( type = 'csv', field_delimiter = ',', parse_header = true, skip_blank_lines = true ), pattern = 'test1.csv')) into table t1 (col1, col2);
Example 5: Import a PARQUET file.
load data /*+ direct(true,0) parallel(2)*/ from files( location = "data/parquet", format = ( type = 'PARQUET'), pattern = 'test1.parquet') into table t1;Example 6: Import an ORC file.
load data /*+ direct(true,0) parallel(2)*/ from files( location = "data/orc", format = ( type = 'ORC'), pattern = 'test1.orc') into table t1;Example 7: Import an ODPS file.
load data /*+ direct(true,0) parallel(2)*/ from source ( type = 'ODPS', accessid = '$ODPS_ACCESSID', accesskey = '******', endpoint= '$ODPS_ENDPOINT', project_name = 'example_project', schema_name = '', table_name = 'example_table', quota_name = '', compression_code = '') into table t1;Import data from a client (local) file
Example 1: Import data from a local file to a table in OceanBase Database.
Open a terminal or command prompt window and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -Dtest

The return result is as follows:
Welcome to the OceanBase.  Commands end with ; or \g.
Your OceanBase connection id is 3221719526
Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [test]>

Notice
To use the
`LOAD DATA LOCAL INFILE` feature, you must use OBClient V2.2.4 or later. If you do not have a qualified version of OBClient, you can also use a MySQL client to connect to the database.

In the client, execute the
`LOAD DATA LOCAL INFILE` statement to load the local data file.

obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';

The return result is as follows:
Query OK, 3 rows affected
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
Example 2: Directly import a compressed file by setting COMPRESSION.
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz' INTO TABLE lineitem COMPRESSION GZIP FIELDS TERMINATED BY '|';

Example 3: Specify a partition for direct import by using PARTITION.
- Specify a partition for direct import
load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0, p1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';- Specify a subpartition for direct import
load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0sp0, p1sp1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';Import data from an OSS file
Example 1: Enable direct import by using the
`direct(bool, int)` hint. The file to be imported can be located in OSS.

load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';

Import data from a server-side file as a URL external table (with LOG ERRORS diagnostics)
Notice
The IP addresses in the example commands have been anonymized. When you verify the examples, use the actual IP address of your machine.
The following example demonstrates how to import data from an external file located on the server (OBServer node) and in OceanBase Database in MySQL mode. The steps are as follows:
Create directories on the OBServer node. The
/home/admin/test_csv directory stores the external data, the /home/admin/test_into directory stores the error logs, and the /home/admin/test_badfile directory stores the error data files.

[admin@xxx /home/admin]# mkdir -p /home/admin/{test_csv,test_into,test_badfile}

Prepare the external file. In the
/home/admin/test_csv directory, create a file named type_cast.csv.

[admin@xxx /home/admin/test_csv]# vi type_cast.csv

The content of the file is as follows:
1,2,3
2,4,af
3,4,5
ds,6,32
4,5,6
5,2,3
6,v4,af
7,4,5
kj,a6,32
8,5,6

Set the import file path.
Notice
For security reasons, when setting the
`secure_file_priv` system variable, you can modify the global variable only by executing an SQL statement through a local socket connection to the database. For more information, see secure_file_priv.

Run the following command to log in to the server where the OBServer node is located.
ssh admin@10.10.10.1

Run the following command to connect to the
mysql001 tenant by using a local Unix socket.
SET GLOBAL secure_file_priv = "/";
Reconnect to the
mysql001 tenant. Here is an example:
obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -Ddb_test

Create a table named
test_tbl1.

CREATE TABLE test_tbl1(col1 INT, col2 INT, col3 INT);

Set the compression algorithm used for diagnostic logs to
AUTO.

ALTER SYSTEM SET load_data_diagnosis_log_compression = 'AUTO';

For more information about how to set the compression algorithm used for diagnostic logs, see load_data_diagnosis_log_compression.
Set the maximum size of a single diagnostic log file to 1 KB. When the size of the exported log exceeds 1 KB, a second file will be automatically generated and the export will continue.
ALTER SYSTEM SET load_data_diagnosis_log_max_size = '1K';

For more information about how to set the size of a single diagnostic log file, see load_data_diagnosis_log_max_size.
Use the
`LOAD DATA` statement to import data into the test_tbl1 table from a URL external table, specifying error diagnostics, with the error log stored in the /home/admin/test_into/ directory and the error data file stored in the /home/admin/test_badfile/ directory.

LOAD DATA FROM FILES(
    LOCATION = '/home/admin/test_csv/',
    FORMAT = ( TYPE = 'csv', FIELD_DELIMITER = ','),
    PATTERN = 'type_cast.csv')
INTO TABLE test_tbl1
LOG ERRORS INTO '/home/admin/test_into/' REJECT LIMIT UNLIMITED BADFILE '/home/admin/test_badfile/';

The return result is as follows:
Query OK, 6 rows affected, 4 warnings
Records: 6  Deleted: 0  Skipped: 0  Warnings: 4

Run the
read_error_log statement to view the error log content.

SELECT * FROM READ_ERROR_LOG('/home/admin/test_into/');

The return result is as follows:
+------------+---------------+-------------+--------------------------------------------------------------------------------------------------------------------+
| ERROR CODE | FILE NAME     | LINE NUMBER | ERROR MESSAGE                                                                                                      |
+------------+---------------+-------------+--------------------------------------------------------------------------------------------------------------------+
|      -4226 | type_cast.csv |           4 | fail to scan file type_cast.csv at line 4 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
|      -4226 | type_cast.csv |           9 | fail to scan file type_cast.csv at line 9 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
|      -4226 | type_cast.csv |           7 | fail to scan file type_cast.csv at line 7 for column "db_test"."test_tbl1"."col2", error: Incorrect integer value |
|      -4226 | type_cast.csv |           2 | fail to scan file type_cast.csv at line 2 for column "db_test"."test_tbl1"."col3", error: Incorrect integer value |
+------------+---------------+-------------+--------------------------------------------------------------------------------------------------------------------+
4 rows in set

View the data in the
test_tbl1 table.

SELECT * FROM test_tbl1;

The return result is as follows:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    1 |    2 |    3 |
|    3 |    4 |    5 |
|    4 |    5 |    6 |
|    5 |    2 |    3 |
|    7 |    4 |    5 |
|    8 |    5 |    6 |
+------+------+------+
6 rows in set

View the content of the bad file by using the URL external table.
SELECT * FROM FILES ( LOCATION = '/home/admin/test_badfile/', FORMAT ( TYPE = 'csv', FIELD_DELIMITER = ','), PATTERN = 'data.bad');

The return result is as follows:
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| ds   | 6    | 32   |
| kj   | a6   | 32   |
| 6    | v4   | af   |
| 2    | 4    | af   |
+------+------+------+
4 rows in set
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about how to use the
`LOAD DATA` statement, see Import data by using the LOAD DATA statement.
