Purpose
This statement imports data from an external source.

OceanBase Database supports the following input file types for the `LOAD DATA` statement:

- Server-side files (on OBServer nodes): files stored on the OBServer nodes of OceanBase Database. You can use the `LOAD DATA INFILE` or `LOAD DATA FROM URL` statement to load data from these server-side files into a database table.
- Client-side files (local files): files stored on the local file system of the client. You can use the `LOAD DATA LOCAL INFILE` or `LOAD DATA FROM URL` statement to load data from these client-side files into a database table.

  Note: When OceanBase Database executes the `LOAD DATA LOCAL INFILE` command, the system automatically adds the `IGNORE` option.

- OSS files: files stored in the Object Storage Service (OSS) file system. You can use the `LOAD DATA REMOTE_OSS INFILE` statement to load data from these OSS files into a database table.

The `LOAD DATA` statement can currently import text files in CSV format. The import process consists of the following steps:

1. File parsing: OceanBase Database reads the data from the specified file and parses it in parallel or sequentially based on the specified degree of parallelism.
2. Data distribution: because OceanBase Database is a distributed database, data from different partitions may be stored on different OBServer nodes. The `LOAD DATA` statement calculates the destination of each row based on the parsed data.
3. Data insertion: after receiving the data, the target OBServer node executes the `INSERT` operation to insert the data into the corresponding partition.
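The three steps above can be sketched with a minimal server-side import (the table name, file path, and delimiters here are hypothetical):

```sql
-- Hypothetical example: parse /tmp/orders.csv with 4-way parallelism,
-- route each parsed row to the OBServer node that owns its partition,
-- and insert the rows into the orders table.
LOAD DATA /*+ PARALLEL(4) */ INFILE '/tmp/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```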
Considerations
- You cannot use the `LOAD DATA` statement on a table that contains a trigger.
- To import data from an external file, you must have the `FILE` privilege and complete the following settings:
  - When loading a server-side file, set the system variable `secure_file_priv` to specify the paths that can be accessed during import or export.
  - When loading a local client-side file, add the `--local-infile[=1]` option when starting the MySQL or OBClient client to enable loading data from the local file system.
- When using partitioned table bypass (direct) load, the target table cannot be a replicated table, and it cannot contain auto-increment columns, identity columns, or global indexes.
Syntax
```sql
-- Load data from a file
LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REMOTE_OSS | LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
    [COMPRESSION [=] {AUTO | NONE | GZIP | DEFLATE | ZSTD}]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(column_name_var
        [, column_name_var] ...)]

load_mode:
    'full'
    | 'inc_replace'

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list
```
```sql
-- Load data from a URL
LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REPLACE | IGNORE]
    FROM { url_table_function_expr |
           ( SELECT expression_list FROM url_table_function_expr ) }
    INTO TABLE table_name
    [PARTITION(PARTITION_OPTION)]
    [(column_name_var [, column_name_var] ...)]
    [LOG ERRORS
        [INTO 'logfile_string']
        [REJECT LIMIT {integer | UNLIMITED}]
        [BADFILE 'badfile_string']]

load_mode:
    'full'
    | 'inc_replace'

url_table_function_expr:
    FILES (
        LOCATION = '<string>',
        {
            FORMAT = (
                TYPE = 'CSV',
                LINE_DELIMITER = '<string>' | <expr>,
                FIELD_DELIMITER = '<string>' | <expr>,
                PARSE_HEADER = { TRUE | FALSE },
                ESCAPE = '<character>' | <expr>,
                FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
                ENCODING = 'charset',
                NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
                SKIP_HEADER = <int>,
                SKIP_BLANK_LINES = { TRUE | FALSE },
                TRIM_SPACE = { TRUE | FALSE },
                EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
            )
            | FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
        },
        [PATTERN = '<regex_pattern>']
    )
    | SOURCE (
        TYPE = 'ODPS',
        ACCESSID = '<string>',
        ACCESSKEY = '<string>',
        ENDPOINT = '<string>',
        TUNNEL_ENDPOINT = '<string>',
        PROJECT_NAME = '<string>',
        SCHEMA_NAME = '<string>',
        TABLE_NAME = '<string>',
        QUOTA_NAME = '<string>',
        COMPRESSION_CODE = '<string>'
    )

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list
```
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. The default value is 4. |
| load_batch_size(M) | The size of each batch to insert. The default value of M is 100. We recommend that you set the value to a number in the range [100, 1000]. |
| APPEND \| direct() \| NO_DIRECT | The hint for enabling direct load. <br>**Notice**: We recommend that you do not upgrade OceanBase Database during a direct load task, because the upgrade may cause the task to fail. |
| REMOTE_OSS \| LOCAL | Optional. |
| file_name | The path and name of the input file. <br>**Note**: When you import a file from OSS, make sure that the required conditions are met. |
| REPLACE \| IGNORE | If a unique key conflict occurs, REPLACE overwrites the conflicting row, and IGNORE skips it. LOAD DATA determines whether data is duplicated based on the table's primary key. If the table does not have a primary key, the REPLACE and IGNORE options have no effect. By default, when duplicate data is encountered, LOAD DATA records the error data in the log file. |
| url_table_function_expr | Optional. Specifies whether to read data from the file system or a data source by using the FILES or SOURCE keyword. |
| table_name | The name of the table to which data is imported. |
| PARTITION_OPTION | Specifies the partition name for direct load. <br>**Note**: Partition specification is supported only for direct load, not for regular LOAD DATA. In other words, if you do not add a direct load hint or set a direct load configuration item, specifying a partition in a LOAD DATA statement does not take effect. |
| COMPRESSION | Specifies the compression format of the file. |
| FIELDS \| COLUMNS | Specifies the format of the fields. |
| LINES STARTING BY | Specifies the starting delimiter of a line. |
| LINES TERMINATED BY | Specifies the ending delimiter of a line. |
| IGNORE number { LINES \| ROWS } | Specifies the number of rows to ignore. LINES ignores the specified number of lines at the beginning of the file, and ROWS ignores the specified number of rows based on the field delimiter. By default, the system maps each field in the input file to a column in the table; if the input file does not contain all the columns, the missing columns are filled in based on default rules. <br>**Note**: When you import data from multiple files, the behavior is the same as when you import data from a single file. |
| column_name_var | Optional. Specifies the name of the imported column. |
| LOG ERRORS | Optional. Specifies whether to enable error diagnostics during the import of a URL table. For more information, see log_errors. |
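As an illustration of the `REPLACE` and `IGNORE` options and the `IGNORE number LINES` clause (the table and file names below are hypothetical), consider a table `t1` with a primary key:

```sql
-- Overwrite rows whose primary key conflicts with existing rows.
LOAD DATA INFILE '/tmp/t1.csv' REPLACE INTO TABLE t1
    FIELDS TERMINATED BY ',';

-- Skip conflicting rows, and also skip the first line of the file
-- (for example, a header line).
LOAD DATA INFILE '/tmp/t1.csv' IGNORE INTO TABLE t1
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES;
```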
FILES
The FILES keyword is composed of the LOCATION, FORMAT, and PATTERN clauses.
The `LOCATION` clause specifies the path where the external table's data files are stored. Typically, the data files of an external table are stored in a separate directory, which can contain subdirectories. When the external table is created, it automatically collects all files in this directory.

- The format for a local `LOCATION` is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative or absolute path. If a relative path is provided, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file paths that OBServer nodes have permission to access. `local_file_path` must be a subpath of the `secure_file_priv` path.
- The formats for a remote `LOCATION` are as follows:
  - `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing OSS and S3, and `s3_region` is the region selected when using S3. This sensitive access information is stored in encrypted form in the system tables of the database.
  - `LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH'`, where `port` is the port number of HDFS, and `PATH` is the directory path in HDFS.
  - For Kerberos authentication: `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`, where:
    - `principal`: the user for login authentication.
    - `keytab`: the path to the user authentication key file.
    - `krb5conf`: the path to the Kerberos environment description file.
    - `configs`: additional HDFS configuration items. Empty by default, but in a Kerberos environment this item usually needs a value, for example, `dfs.data.transfer.protection=authentication,privacy`, which sets the data transmission protection levels to `authentication` and `privacy`.
Notice
When using an object storage path, the parameters in the path are separated by the `&` symbol. Ensure that your parameter values contain only uppercase and lowercase letters, digits, the characters `\/-_$+=`, and wildcards. If you input other characters, the settings may fail.
The `FORMAT` clause specifies the properties related to the file reading format. Three file formats are supported: CSV, PARQUET, and ORC.

When `TYPE = 'CSV'`, the following fields are supported:

- `LINE_DELIMITER`: specifies the line delimiter for CSV files. The default value is `LINE_DELIMITER='\n'`.
- `FIELD_DELIMITER`: optional. Specifies the column delimiter for CSV files. The default value is `FIELD_DELIMITER='\t'`.
- `PARSE_HEADER`: optional. Specifies whether the first row of the CSV file contains the column names. The default value is `FALSE`, indicating that the first row is not treated as column names.
- `ESCAPE`: specifies the escape character for CSV files. It can only be one byte. The default value is `ESCAPE='\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. Specifies the symbol used to enclose field values in the CSV file. The default value is empty. Using this option indicates that only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON) are enclosed.
- `ENCODING`: specifies the character set encoding format of the file. If not specified, the default value is `UTF8MB4`.
- `NULL_IF`: specifies the strings that are treated as `NULL`. The default value is empty.
- `SKIP_HEADER`: skips the file header; specifies the number of rows to skip.
- `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
- `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
- `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.

When `TYPE = 'PARQUET'` or `TYPE = 'ORC'`, there are no additional fields.

The `PATTERN` clause specifies a regular expression pattern for filtering files in the `LOCATION` directory. For each file path in the `LOCATION` directory, if it matches the pattern, the external table accesses the file; otherwise, it skips the file. If this parameter is not specified, the external table can access all files in the `LOCATION` directory by default.
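Putting the `LOCATION`, `FORMAT`, and `PATTERN` clauses together, a sketch of a FILES expression that reads only the CSV files in a server-side directory might look as follows (the directory and table names are hypothetical, and the directory must be under the `secure_file_priv` path):

```sql
LOAD DATA FROM FILES (
    LOCATION = '/data/exports/',      -- hypothetical server-side directory
    FORMAT = (
        TYPE = 'CSV',
        FIELD_DELIMITER = ',',
        PARSE_HEADER = TRUE,          -- first row holds the column names
        SKIP_BLANK_LINES = TRUE,
        NULL_IF = ('NULL')            -- treat the string NULL as NULL
    ),
    PATTERN = '.*[.]csv'              -- regular expression, not a glob
) INTO TABLE t1;
```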
SOURCE
Unlike FILES, the SOURCE keyword is not composed of other clauses. In this case, `TYPE = 'ODPS'` and the following fields are included:

- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: optional. Specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: optional. Specifies the quota to use.
- `COMPRESSION_CODE`: optional. Specifies the compression format of the data source. Four compression formats are supported: `ZLIB`, `ZSTD`, `LZ4`, and `ODPS_LZ4`. If not specified, compression is not enabled.
log_errors
- `LOG ERRORS`: enables error diagnostics during the import process, allowing failed rows to be recorded instead of terminating the entire operation at the first error. When used with the `REJECT LIMIT` clause, it controls the number of error rows that can be accepted.
- `INTO 'logfile_string'`: optional. Specifies the file in the target directory where error information is written. If `INTO 'logfile_string'` is not specified, error information is only recorded in the warning buffer, which can be viewed by using `SHOW WARNINGS`. `logfile_string` indicates the directory for storing error information, in the following formats:

  Note: The `INTO 'logfile_string'` parameter is supported starting from V4.4.0.

  - When error information is stored locally, `logfile_string` has the format `[file://] local_file_path`, where `local_file_path` can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. `secure_file_priv` configures the file paths that OBServer nodes have permission to access. `local_file_path` must be a subpath of the `secure_file_priv` path.
  - When error information is stored remotely (refer to the Location section in the syntax for creating external tables), `logfile_string` has the following formats:
    - `{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing Alibaba Cloud OSS, AWS S3, or S3-compatible object storage, and `s3_region` is the region selected when using S3. This sensitive access information is stored in encrypted form in the system tables of the database.
    - `hdfs://localhost:port/PATH`, where `localhost` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the directory path in HDFS. For Kerberos authentication, the address is `hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx`.

  OceanBase Database allows you to set tenant-level parameters that configure the compression algorithm for diagnostic logs and the maximum size of a single diagnostic log file. For more information, see load_data_diagnosis_log_compression and load_data_diagnosis_log_max_size.

- `REJECT LIMIT`: optional. Specifies the maximum number of error rows allowed:
  - The default value is 0, which means no error rows are allowed and the operation fails at the first error.
  - `integer`: the maximum number of error rows allowed on a single server. For example, 10 means that up to 10 error rows can be tolerated on a single server.
  - `UNLIMITED`: allows an unlimited number of error rows.
- `BADFILE 'badfile_string'`: specifies the path for storing error data files. The format of `badfile_string` is the same as that of `logfile_string`.

  Note: The `BADFILE 'badfile_string'` parameter is supported starting from V4.4.0.
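Combining these clauses, a hedged sketch of an import that tolerates up to 10 error rows per server might be written as follows (the directories and table name are hypothetical, and the `INTO` and `BADFILE` clauses require V4.4.0 or later):

```sql
LOAD DATA FROM FILES (
    LOCATION = '/data/exports/',      -- hypothetical source directory
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ','),
    PATTERN = 'orders.csv')
INTO TABLE orders
LOG ERRORS
    INTO '/data/errlog/'              -- directory for error logs
    REJECT LIMIT 10                   -- allow up to 10 error rows per server
    BADFILE '/data/bad/';             -- directory for rejected rows
```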
Notice

- If the `LOG ERRORS` clause is not specified, the default behavior is a normal import, and the operation fails at the first error.
- If the `LOG ERRORS` clause is specified but the `REJECT LIMIT` clause is not, it is equivalent to setting the limit to 0. The operation fails at the first error, but that error is recorded, and the error code is a diagnostics-related error, that is, "reject limit reached".

View error logs
OceanBase Database supports the following SQL statement for viewing the error logs of an import:

```sql
SELECT * FROM READ_ERROR_LOG('diagnosis_log_path');
```

In this statement, `diagnosis_log_path` specifies the path of the error log. When executed, this SQL statement is equivalent to the following URL external table statement:

```sql
SELECT * FROM FILES (
    LOCATION = 'diagnosis_log_path/'
    FORMAT (
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '\',
        PARSE_HEADER = true
    )
    [, PATTERN = 'filename']
);
```

Here are some examples:

- The specified log path is a file name (not ending with `/`).

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/filename');
  ```

  The corresponding URL external table statement is as follows (the file name is used as a pattern to filter files):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '\',
          PARSE_HEADER = true
      ),
      PATTERN = 'filename'
  );
  ```

- The specified log path is a folder (ending with `/`).

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/');
  ```

  The corresponding URL external table statement is as follows (no pattern is used):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '\',
          PARSE_HEADER = true
      )
  );
  ```
Rules for using wildcards in multi-file direct load
To facilitate multi-file import, the wildcard feature is introduced for both server-side and OSS file imports, but it is not applicable for client-side file imports.
Wildcard usage on the server side
Matching rules:
- Match a file:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
  ```

- Match a directory:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';
  ```

- Match both a directory and a file name:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
  ```

Considerations:

- At least one matching file must exist. Otherwise, error code 4027 is returned.
- For the input `load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';`, the string `/xxx/test.1*.csv,/xxx/test.6*.csv` is treated as a single match target. If no file matches it, error code 4027 is returned.
- Only POSIX-compliant wildcards supported by the GLOB function can be used. Extended patterns such as `test.6*(6|0).csv` and `test.6*({0.csv,6.csv}|.csv)` can be expanded by the `ls` command, but the GLOB function cannot match them, so error code 4027 is returned.
Wildcard usage in Cloud Object Storage Service (OSS)
Matching rules:
- Match a file:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  ```

Considerations:

- Directory matching is not supported. For example, `load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';` returns `OB_NOT_SUPPORTED`.
- Only the `*` and `?` wildcards are supported in file names. Other wildcards are accepted as input but cannot match any results.
Examples
Note
When you use `LOAD DATA` to load data, you can use `\N` to represent `NULL`.

Import data from a server-side file
Example 1: Import data from a server-side file.

1. Set the global secure file path.

   ```sql
   obclient> SET GLOBAL secure_file_priv = "/";
   Query OK, 0 rows affected

   obclient> \q
   Bye
   ```

   Note: Because `secure_file_priv` is a `GLOBAL` variable, you must execute `\q` to exit the client for the setting to take effect.

2. After reconnecting to the database, import data from the external file.

   ```sql
   obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1;
   Query OK, 0 rows affected
   ```
Example 2: Enable direct load by using the `APPEND` hint.

```sql
LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/a.csv' INTO TABLE t;
```

Example 3: Import a CSV file.

- Import all columns from the `test1.csv` file.

  ```sql
  load data /*+ direct(true,0) parallel(2) */
  from files(
      location = "data/csv",
      format = (
          type = 'csv',
          field_delimiter = ',',
          parse_header = true,
          skip_blank_lines = true
      ),
      pattern = 'test1.csv')
  into table t1;
  ```

- Read the `c1` and `c2` columns from the `test1.csv` file in the `data/csv` directory and import them into the `col1` and `col2` columns of the `t1` table.

  ```sql
  load data /*+ direct(true,0) parallel(2) */
  from (
      select c1, c2 from files(
          location = 'data/csv',
          format = (
              type = 'csv',
              field_delimiter = ',',
              parse_header = true,
              skip_blank_lines = true
          ),
          pattern = 'test1.csv'))
  into table t1 (col1, col2);
  ```

Example 4: Import a PARQUET file.

```sql
load data /*+ direct(true,0) parallel(2) */
from files(
    location = "data/parquet",
    format = ( type = 'PARQUET'),
    pattern = 'test1.parquet')
into table t1;
```

Example 5: Import an ORC file.

```sql
load data /*+ direct(true,0) parallel(2) */
from files(
    location = "data/orc",
    format = ( type = 'ORC'),
    pattern = 'test1.orc')
into table t1;
```

Example 6: Import data from an ODPS data source.

```sql
load data /*+ direct(true,0) parallel(2) */
from source (
    type = 'ODPS',
    accessid = '$ODPS_ACCESSID',
    accesskey = '******',
    endpoint = '$ODPS_ENDPOINT',
    project_name = 'example_project',
    schema_name = '',
    table_name = 'example_table',
    quota_name = '',
    compression_code = '')
into table t1;
```

Import data from a client-side file
Example 1: Import data from a local file to a table in OceanBase Database.

1. Open a terminal or command prompt window and enter the following command to start the client.

   ```shell
   obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -Dtest
   ```

   The return result is as follows:

   ```shell
   Welcome to the OceanBase. Commands end with ; or \g.
   Your OceanBase connection id is 3221719526
   Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18)

   Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

   Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

   obclient [test]>
   ```

   Notice: To use the `LOAD DATA LOCAL INFILE` feature, you must use OBClient V2.2.4 or later. If you do not have an OBClient client of the required version, you can also use the MySQL client to connect to the database.

2. In the client, execute the `LOAD DATA LOCAL INFILE` statement to load the local data file.

   ```sql
   obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
   ```

   The return result is as follows:

   ```shell
   Query OK, 3 rows affected
   Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
   ```

Example 2: Directly import a compressed file by setting the COMPRESSION parameter.

```sql
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz' INTO TABLE lineitem COMPRESSION GZIP FIELDS TERMINATED BY '|';
```

Example 3: Specify a partition for direct load by using the PARTITION clause.

- Specify partitions for direct load:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0, p1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

- Specify subpartitions for direct load:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0sp0, p1sp1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

Import data from an OSS file
Example 1: Enable direct load by using the `direct(bool, int)` hint. The file to be loaded is stored in OSS.

```sql
load data /*+ parallel(1) direct(false,0) */ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
```

Import data from a server-side file as a URL external table
Notice
The IP addresses in the examples have been desensitized. When verifying, replace them with the actual IP addresses of your machines.

The following example describes how to import data from a file located on a server (OBServer node) in the MySQL-compatible mode of OceanBase Database. The steps are as follows:

1. Create directories on the OBServer node. The `/home/admin/test_csv` directory stores external data, the `/home/admin/test_into` directory stores error messages, and the `/home/admin/test_badfile` directory stores error data files.

   ```shell
   [admin@xxx /home/admin]# mkdir -p /home/admin/{test_csv,test_into,test_badfile}
   ```

2. Prepare an external file. In the `/home/admin/test_csv` directory, create a file named `type_cast.csv`.

   ```shell
   [admin@xxx /home/admin/test_csv]# vi type_cast.csv
   ```

   The content of the file is as follows:

   ```shell
   1,2,3
   2,4,af
   3,4,5
   ds,6,32
   4,5,6
   5,2,3
   6,v4,af
   7,4,5
   kj,a6,32
   8,5,6
   ```

3. Set the import file path.
Notice
   For security reasons, when you set the `secure_file_priv` system variable, you can connect to the database only through a local socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Run the following command to log in to the machine where the OBServer node is located.

      ```shell
      ssh admin@10.10.10.1
      ```

   2. Run the following command to connect to the `mysql001` tenant through a local Unix socket.

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Run the following SQL statement to remove the restriction on import and export paths.

      ```sql
      SET GLOBAL secure_file_priv = "/";
      ```
4. Reconnect to the `mysql001` tenant.

   Here is an example:

   ```shell
   obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -Ddb_test
   ```

5. Create a table named `test_tbl1`.

   ```sql
   CREATE TABLE test_tbl1(col1 INT, col2 INT, col3 INT);
   ```

6. Set the compression algorithm used for diagnostic logs to `AUTO`.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_compression = 'AUTO';
   ```

   For more information about setting the compression algorithm used for diagnostic logs, see load_data_diagnosis_log_compression.

7. Set the maximum size of a single diagnostic log file to 1 KB. If the size of the exported log exceeds 1 KB, a second file is automatically generated and the export continues.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_max_size = '1K';
   ```

   For more information about setting the size of a single diagnostic log file, see load_data_diagnosis_log_max_size.
8. Use the `LOAD DATA` statement with the URL external table syntax to import data into the `test_tbl1` table, and specify error diagnostics. Set the directory for storing error logs to `/home/admin/test_into/` and the directory for storing error data files to `/home/admin/test_badfile/`.

   ```sql
   LOAD DATA FROM FILES(
       LOCATION = '/home/admin/test_csv/',
       FORMAT = (
           TYPE = 'csv',
           FIELD_DELIMITER = ','),
       PATTERN = 'type_cast.csv')
   INTO TABLE test_tbl1
   LOG ERRORS INTO '/home/admin/test_into/'
   REJECT LIMIT UNLIMITED
   BADFILE '/home/admin/test_badfile/';
   ```

   The returned result is as follows:

   ```shell
   Query OK, 6 rows affected, 4 warnings
   Records: 6  Deleted: 0  Skipped: 0  Warnings: 4
   ```

9. Use the `READ_ERROR_LOG` function to view the content of the error log.

   ```sql
   SELECT * FROM READ_ERROR_LOG('/home/admin/test_into/');
   ```

   The returned result is as follows:

   ```shell
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   | ERROR CODE | FILE NAME     | LINE NUMBER | ERROR MESSAGE                                                                                                     |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   |      -4226 | type_cast.csv |           4 | fail to scan file type_cast.csv at line 4 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           9 | fail to scan file type_cast.csv at line 9 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           7 | fail to scan file type_cast.csv at line 7 for column "db_test"."test_tbl1"."col2", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           2 | fail to scan file type_cast.csv at line 2 for column "db_test"."test_tbl1"."col3", error: Incorrect integer value |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   4 rows in set
   ```

10. View the data in the `test_tbl1` table.

    ```sql
    SELECT * FROM test_tbl1;
    ```

    The returned result is as follows:

    ```shell
    +------+------+------+
    | col1 | col2 | col3 |
    +------+------+------+
    |    1 |    2 |    3 |
    |    3 |    4 |    5 |
    |    4 |    5 |    6 |
    |    5 |    2 |    3 |
    |    7 |    4 |    5 |
    |    8 |    5 |    6 |
    +------+------+------+
    6 rows in set
    ```

11. View the content of the bad file through the URL external table.

    ```sql
    SELECT * FROM FILES (
        LOCATION = '/home/admin/test_badfile/',
        FORMAT (
            TYPE = 'csv',
            FIELD_DELIMITER = ','),
        PATTERN = 'data.bad');
    ```

    The returned result is as follows:

    ```shell
    +------+------+------+
    | c1   | c2   | c3   |
    +------+------+------+
    | ds   | 6    | 32   |
    | kj   | a6   | 32   |
    | 6    | v4   | af   |
    | 2    | 4    | af   |
    +------+------+------+
    4 rows in set
    ```
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about how to use the `LOAD DATA` statement, see Import data by using the LOAD DATA statement.
- For more information about how to use the `LOAD DATA` statement for direct load, see Import data by using the LOAD DATA statement for direct load.