The LOAD DATA statement in OceanBase Database in MySQL mode is equivalent to the LOAD DATA statement in MySQL. The t1.csv file imported into OceanBase Database in this topic is the file exported in the previous topic. The .csv file must be placed on an OBServer node, because OceanBase Database cannot yet load files that reside on the local client; support for this feature is still under development.
Syntax of the LOAD DATA statement
LOAD DATA
[/*+ parallel(N) load_batch_size(M)*/]
INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(col_name_var
[, col_name_var] ...)]
[SET col_name={expr | DEFAULT}
[, col_name={expr | DEFAULT}] ...]
Principles of the LOAD DATA statement
When you use the LOAD DATA statement to import a CSV file, the import goes through the following steps:
Parse the file: OceanBase Database reads data from the file based on the file name you specify and, based on the degree of parallelism (DOP) you specify, decides whether to parse the input file in parallel or serially.
Distribute the data: OceanBase Database is a distributed database, and the data of each partition may reside on different OBServer nodes. The LOAD DATA statement processes the parsed data and determines the OBServer node to which each row is to be sent.
Insert the data: After the destination OBServer node receives the data, it executes INSERT statements to insert the data into the corresponding partitions.
To improve the performance of the LOAD DATA statement, you can specify a DOP for loading data. The LOAD DATA statement can then use multiple threads for file parsing, partition calculation, and data distribution.
To avoid the performance impact of distributed transactions, the LOAD DATA statement groups the data by partition and sends it to the corresponding OBServer nodes in multiple write operations, opening an independent transaction for each write operation that writes one group of data. If an error occurs during the execution of the LOAD DATA statement, you must manually delete the data that has already been loaded. If a large data file is imported, inserting data into each node may take a long time; in this case, increase the value of the ob_query_timeout variable as needed.
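For example, if a large import exceeds the statement timeout, you can raise ob_query_timeout for the current session before running LOAD DATA. The following is only a sketch; the timeout value, file path, and table name are illustrative, and ob_query_timeout is specified in microseconds.
set session ob_query_timeout = 36000000000;  -- about 10 hours; adjust to your workload
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t;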
The LOAD DATA statement provides many options to fit your needs, including:
DOP
The /*+ parallel(N) */ option specifies the DOP for data loading. The default value of N is 4. Recommended value range: [0, maximum number of CPU cores in the tenant].
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t;
Batch size
The /*+ load_batch_size(M) */ option specifies the batch size of each insertion. The default value of M is 1000. You can adjust the value of M based on the total length of the data rows to be imported. Recommended value range: [100, 1000].
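The two hints can also be combined in a single statement. The following is only a sketch; the hint values, file path, and table name are illustrative.
load data /*+ parallel(8) load_batch_size(500) */ infile '/home/admin/a.csv' into table t;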
Input file
The INFILE 'file_name' keyword specifies the directory and name of the input file. In the current version of OceanBase Database, the LOAD DATA statement can load only files located on an OBServer node. Therefore, before you import a file, you must copy it to the server where the target OBServer node runs and then execute the LOAD DATA statement.
Index of the destination table
To improve import efficiency, we recommend that you create the base table first and create indexes on it after the data import is completed. In particular, create global indexes only after the import is completed; otherwise, a "not support" error may be reported.
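The recommended order is sketched below; the table, column, and index names are illustrative.
create table t (id int primary key, name varchar(50));
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t fields terminated by ',' lines terminated by '\n';
create index idx_t_name on t (name);  -- create indexes only after the import completes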
Execution privileges
You need to perform the following steps to grant access to a file on the server:
Modify the path specified by secure_file_priv.
Set secure_file_priv to an empty value to skip the security check. Then exit the session and log on again for the setting to take effect. Run the following command:
set global secure_file_priv = "";
Grant the FILE privilege to a user.
You can run the following command to grant the file privilege to a user in MySQL mode.
`grant file on *.* to USER_NAME;`
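To confirm the setup, you can check the variable and the user's privileges, as in the sketch below; the user name test_user is illustrative.
show variables like 'secure_file_priv';  -- should return an empty value after the change
show grants for test_user;               -- should list the FILE privilege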
Duplicate data processing
REPLACE specifies to replace the existing rows in the table with the rows from the input file. IGNORE specifies to ignore the duplicate rows in the input file. The LOAD DATA statement uses the primary key of the table to identify duplicate data. If the table does not have a primary key, the LOAD DATA statement cannot identify duplicate data, and the REPLACE and IGNORE options do not take effect.
If you specify neither option, the LOAD DATA statement records duplicate rows as errors in the log file.
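For example, when re-importing a file into a table that has a primary key, you can choose either behavior. The statements below are only a sketch; the file path and table name are illustrative.
-- overwrite existing rows with the rows from the file
load data infile '/home/admin/a.csv' replace into table t fields terminated by ',' lines terminated by '\n';
-- keep existing rows and skip duplicate rows from the file
load data infile '/home/admin/a.csv' ignore into table t fields terminated by ',' lines terminated by '\n';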
Destination table option
The INTO TABLE tbl_name keyword specifies the name of the destination table. The LOAD DATA statement supports both partitioned and non-partitioned tables.
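For example, the destination can be a partitioned table. The following sketch uses an illustrative hash-partitioned table and file path.
create table t_part (id int primary key, val varchar(50)) partition by hash(id) partitions 4;
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t_part fields terminated by ',' lines terminated by '\n';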
Field format
The FIELDS | COLUMNS clause specifies the delimiter options for the fields in the input file. Notes:
TERMINATED BY specifies the terminator of each field.
[OPTIONALLY] ENCLOSED BY specifies a character that encloses each field.
ESCAPED BY specifies the escape character for the fields in the input file.
Line format
The LINES clause specifies the delimiter options for each line in the input file. Notes:
STARTING BY specifies the start character of each line.
TERMINATED BY specifies the terminator of each line.
The IGNORE number {LINES | ROWS} clause ignores the specified number of lines at the beginning of the input file.
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t fields terminated by ',' lines terminated by '\n';
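A fuller sketch that also sets an enclosing character and an escape character and skips a header line might look like the following; the file path and table name are illustrative.
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t
fields terminated by ',' optionally enclosed by '"' escaped by '\\'
lines terminated by '\n'
ignore 1 lines;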
Log file
If an error occurs during the import, the corresponding INSERT statement is rolled back by design, and the LOAD DATA statement generates a log file named obloaddata.log in the log subdirectory of the installation directory of the OBServer node. The following shows a sample log file.
Tenant name: mysql
File name: /home/admin/a.csv
Into table: `test`.`t`
Parallel: 1
Batch size: 1000
SQL trace: YD7A20BA65670-0005AADAAA3CAB52
Start time: 2020-07-29 21:08:13.073741
Load query:
load data infile '/home/admin/test.csv' into table t fields terminated by ',' lines terminated by '\n'
Row ErrCode ErrMsg
1 1062 Duplicated primary key
2 1062 Duplicated primary key
The log contains the basic information about the task generated by the LOAD DATA statement, including the tenant name, input file name, destination table name, DOP, the LOAD DATA statement that was executed, and details about the erroneous rows.
Example
Import the t1.csv file exported from MySQL in the preceding topic into OceanBase Database.
$ mysql -h127.1 -u****@obmysql#obdemo -P2881 -p123456 -c -A test -Ns
MySQL [test]> select * from t1;
MySQL [test]> load data infile '/tmp/t1.csv' into table t1 fields terminated by ',' enclosed by '"' lines terminated by '\n' ;
MySQL [test]> select * from t1;
1 a b a b
2 a,b "a b"
3 a\nb a \nb
4 a\\b "a\\b"
MySQL [test]>