Import data by using the LOAD DATA statement|V4.1.0| docs|Distributed Database

Import data by using the LOAD DATA statement

Last Updated：2023-08-28 09:28:41 Updated

OceanBase Database allows you to use the LOAD DATA statement to import data from external files to database tables.

Scenarios

You can use the LOAD DATA statement to import a CSV text file in the following process:

Notice

The CSV file must be located on the current OBServer node. The current OceanBase Database version does not support loading data from a remote client.

Parse the file.

OceanBase Database reads data from a file based on the file name that you entered and determines whether to perform parallel or serial parsing of data from the input file based on the degree of parallelism (DOP) that you specified.
Distribute data.

OceanBase Database is a distributed database system. Data of each partition may be distributed across different OBServer nodes. The LOAD DATA statement is used to process the parsed data and determine the OBServer node to which the data is sent.
Insert data.

After the destination OBServer node receives the data, it executes the INSERT statement to insert the data into the corresponding partition.

LOAD DATA

Obtain the privilege to execute the LOAD DATA statement

You need to perform the following steps to grant the access to a file on the server:

Set the directory where the security file resides to the directory where the file is located or to null (indicating that no check is required).

Notice

For security reasons, the following SQL statement can only be executed locally but cannot be executed on a remote OBClient. For more information, see secure_file_priv.
```
SET GLOBAL SECURE_FILE_PRIV = "";
```
Grant privileges to the user.

Execute the following statement to grant the FILE privilege to the user specified by USER_NAME, which is the user who needs to execute the LOAD DATA statement.
```
GRANT FILE ON *.* TO user_name;
```

Syntax of LOAD DATA

LOAD DATA
    [/*+ parallel(N)*/]
    INFILE 'file_name'
    [REPLACE | IGNORE] // The REPLACE and IGNORE options are applicable to MySQL tenants only.
    INTO TABLE tbl_name
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']+
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_var
        [, col_name_var] ...)]

Parameter	Required	Description	Example
[/+ parallel(N)/]	No	The DOP for loading data. We recommend that you set this option to a value within the range of [0, Maximum number of CPU cores of the tenant].	/+ parallel(4) /
INFILE 'file_name'	Yes	The path and file name of the input file.	INFILE '/home/admin/a.csv'
[REPLACE \| IGNORE]	No	Specifies how to process duplicate data. The `LOAD DATA` statement uses the primary key of a table to identify duplicate data. If the table does not have a primary key, the `LOAD DATA` statement cannot identify duplicate data, and the `REPLACE` and `IGNORE` options have the same effect. `REPLACE`: specifies to replace the original data in the table with data in the input file. `IGNORE`: specifies to ignore duplicate data in the input file. Note If you do not specify this option, the `LOAD DATA` statement records the wrong data into the log file when duplicate data exists.	REPLACE
INTO TABLE tbl_name	Yes	The target table name.	INTO TABLE tbl_name
[{FIELDS \| COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ]	No	Specifies the field separator in the input file. `TERMINATED BY`: specifies the field separator. `ENCLOSED BY`: specifies whether the start and end of each field contain specific characters. `ESCAPED BY`: specifies the wildcard in fields. Note If this clause is not specified, the default value is `TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\'`.	FIELDS TERMINATED BY ','
[LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]	No	Specifies the characters to enclose each line in the input file in the Lines clause. `STARTING BY`: specifies the character to start each line. `TERMINATED BY`: specifies the character to end each line. Note If this clause not specified, the default value is `STARTING BY '' TERMINATED BY '\n'`.	LINES TERMINATED BY '\n'
[IGNORE number {LINES \| ROWS}]	No	Specifies to ignore the first N rows in the input file, where N is specified by the `number` parameter.	IGNORE 1 LINES
[(col_name_var [, col_name_var] ...) ]	No	Specifies the relationship between the columns in the target table and the fields in the input file. By default, fields in the input file are mapped to columns in the destination table in a one-to-one correspondence. If you use `col_name_or_user_var` to specify the mapping relationship between fields in the input file and columns in the destination table, the LOAD DATA statement maps the columns in the destination table based on the specified column names. Columns that are not specified are set to NULL. If the input file does not contain all the columns, the missing columns are filled based on the following mappings: character data type: empty string; numeric data type: 0; date data type: 0000-00-00. To add a null value, use "\N" in the input file.	(id, names)

Log files

If an error occurs during the import, the INSERT statement is rolled back, and the LOAD DATA statement generates a log file named obloaddata.log <XXXXXX> in the \log subdirectory of the installation directory of the observer process. The following example shows the content of a log file. The log contains the basic information about the task generated by the LOAD DATA statement, including the tenant name, input file name, destination table name, DOP, LOAD DATA command used, and error details in lines.

Tenant name:    mysql
File name:  /home/admin/a.csv
Into table: `test`.`t`
Parallel:   1
Batch size: 1000
SQL trace:  YD7A20BA65670-0005AADAAA3C****
Start time: 2020-07-29 21:08:13.073741
Load query:
load data infile '/home/admin/test.csv' into table t fields terminated by ',' lines terminated by '\n'
Row ErrCode ErrMsg
1   1062    Duplicated primary key
2   1062    Duplicated primary key

Examples

[root@***** /home/admin]
$cat student.sql
1,"lin",98
2,"hei",90
3,"ali",95

[root@***** /home/admin]
$obclient -h10.0.0.0 -uroot@MySQL -P2881 -p******

obclient> create table student_1 (ID int primary key,name varchar(128),score int);

obclient>  LOAD DATA INFILE '/home/admin/student.sql' INTO TABLE student_1
             FIELDS TERMINATED BY ',' ENCLOSED BY '"'  
             LINES TERMINATED BY '\n'
             (ID,name,score);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

obclient> select * from student_1
+----+------+-------+
| ID | name | score |
+----+------+-------+
|  1 | lin  |    98 |
|  2 | hei  |    90 |
|  3 | ali  |    95 |
+----+------+-------+
3 rows in set (0.00 sec)

Import data by using the LOAD DATA statement

Scenarios

Notice

LOAD DATA

Obtain the privilege to execute the LOAD DATA statement

Notice

Syntax of LOAD DATA

Note

Note

Note

Log files

Examples