OceanBase Database allows you to use the LOAD DATA statement to import data from external files to database tables.
Scenarios
You can use the LOAD DATA statement to import a CSV text file in the following process:
Notice
The CSV file must be located on the current OBServer node. The current OceanBase Database version does not support loading data from a remote client.
Parse the file.
OceanBase Database reads data from a file based on the file name that you entered and determines whether to perform parallel or serial parsing of data from the input file based on the degree of parallelism (DOP) that you specified.
Distribute data.
OceanBase Database is a distributed database system. Data of each partition may be distributed across different OBServer nodes. The LOAD DATA statement is used to process the parsed data and determine the OBServer node to which the data is sent.
Insert data.
After the destination OBServer node receives the data, it executes the INSERT statement to insert the data into the corresponding partition.
LOAD DATA
Obtain the privilege to execute the LOAD DATA statement
You need to perform the following steps to grant the access to a file on the server:
Set the directory where the security file resides to the directory where the file is located or to null (indicating that no check is required).
Notice
For security reasons, the following SQL statement can only be executed locally but cannot be executed on a remote OBClient. For more information, see secure_file_priv.
SET GLOBAL SECURE_FILE_PRIV = "";Grant privileges to the user.
Execute the following statement to grant the FILE privilege to the user specified by
USER_NAME, which is the user who needs to execute the LOAD DATA statement.GRANT FILE ON *.* TO user_name;
Syntax of LOAD DATA
LOAD DATA
[/*+ parallel(N)*/]
INFILE 'file_name'
[REPLACE | IGNORE] // The REPLACE and IGNORE options are applicable to MySQL tenants only.
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']+
]
[IGNORE number {LINES | ROWS}]
[(col_name_var
[, col_name_var] ...)]
| Parameter | Required | Description | Example |
|---|---|---|---|
| [/*+ parallel(N)*/] | No | The DOP for loading data. We recommend that you set this option to a value within the range of [0, Maximum number of CPU cores of the tenant]. | /*+ parallel(4) */ |
| INFILE 'file_name' | Yes | The path and file name of the input file. | INFILE '/home/admin/a.csv' |
| [REPLACE | IGNORE] | No | Specifies how to process duplicate data. The LOAD DATA statement uses the primary key of a table to identify duplicate data. If the table does not have a primary key, the LOAD DATA statement cannot identify duplicate data, and the REPLACE and IGNORE options have the same effect.
|
REPLACE |
| INTO TABLE tbl_name | Yes | The target table name. | INTO TABLE tbl_name |
| [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] | No | Specifies the field separator in the input file.
|
FIELDS TERMINATED BY ',' |
| [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ] | No | Specifies the characters to enclose each line in the input file in the Lines clause.
|
LINES TERMINATED BY '\n' |
| [IGNORE number {LINES | ROWS}] | No | Specifies to ignore the first N rows in the input file, where N is specified by the number parameter. |
IGNORE 1 LINES |
| [(col_name_var [, col_name_var] ...) ] | No | Specifies the relationship between the columns in the target table and the fields in the input file. By default, fields in the input file are mapped to columns in the destination table in a one-to-one correspondence. If you use col_name_or_user_var to specify the mapping relationship between fields in the input file and columns in the destination table, the LOAD DATA statement maps the columns in the destination table based on the specified column names. Columns that are not specified are set to NULL. If the input file does not contain all the columns, the missing columns are filled based on the following mappings: character data type: empty string; numeric data type: 0; date data type: 0000-00-00. To add a null value, use "\N" in the input file. |
(id, names) |
Log files
If an error occurs during the import, the INSERT statement is rolled back, and the LOAD DATA statement generates a log file named obloaddata.log <XXXXXX> in the \log subdirectory of the installation directory of the observer process. The following example shows the content of a log file. The log contains the basic information about the task generated by the LOAD DATA statement, including the tenant name, input file name, destination table name, DOP, LOAD DATA command used, and error details in lines.
Tenant name: mysql
File name: /home/admin/a.csv
Into table: `test`.`t`
Parallel: 1
Batch size: 1000
SQL trace: YD7A20BA65670-0005AADAAA3C****
Start time: 2020-07-29 21:08:13.073741
Load query:
load data infile '/home/admin/test.csv' into table t fields terminated by ',' lines terminated by '\n'
Row ErrCode ErrMsg
1 1062 Duplicated primary key
2 1062 Duplicated primary key
Examples
[root@***** /home/admin]
$cat student.sql
1,"lin",98
2,"hei",90
3,"ali",95
[root@***** /home/admin]
$obclient -h10.0.0.0 -uroot@MySQL -P2881 -p******
obclient> create table student_1 (ID int primary key,name varchar(128),score int);
obclient> LOAD DATA INFILE '/home/admin/student.sql' INTO TABLE student_1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(ID,name,score);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
obclient> select * from student_1
+----+------+-------+
| ID | name | score |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set (0.00 sec)