OceanBase Database enables you to use the LOAD DATA command to import data from external text files to database tables.
Syntax for the LOAD DATA command:
LOAD DATA
[/*+ parallel(N)*/]
INFILE 'file_name'
[IGNORE]
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(col_name_var
[, col_name_var] ...)]
Notice
The file to be loaded must be located on the OBServer where the leader of the table resides. The current OceanBase Database version does not support loading data from a remote client.
You can use the LOAD DATA command to import a text file in the CSV format in the following process:
Parse the file: OceanBase Database reads data from the file based on the file name you entered and determines whether to perform parallel or serial parsing of data from the input file based on the degree of parallelism you specified.
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServers. The LOAD DATA command is used to calculate the parsed data and determine to which OBServer the data is sent.
Insert the data: After the destination OBServer receives the data, it executes the INSERT statement to insert the data into the corresponding partition.
The LOAD DATA command provides many options to fit your needs, including:
- Parallelism: The /*+ parallel(N)*/ specifies the parallelism for loading data. We recommend that you set this option to a value within the range of [0, Maximum number of CPUs of the tenant]. For example:
load data /*+ parallel(4) */infile '/home/admin/a.csv' into table t
Input file: The INFILE 'file_name' keyword specifies the directory and name of the input file. In the current OceanBase Database version, the LOAD DATA command only supports loading a file located on an OBServer. Therefore, before you import a file, you need to copy the file to an OBServer and connect to the OBServer to run the LOAD DATA command.
Execution permission: You need to perform the following steps to grant the access to a file on the server:
Duplicate data processing: This option allows you to specify how to process duplicate data.
Replaceindicates to replace the original data in the table with the data in the input file.Ignoreindicates to ignore duplicate data in the input file. The LOAD DATA statement identifies duplicate data based on the primary key of a table. If the table does not have a primary key, the LOAD DATA statement cannot identify duplicate data, and theReplaceandIgnoreoptions make no difference. If you do not specify this option, the LOAD DATA statement records the wrong data into the log file when duplicate data exists.
- Destination table: You can use the
INTO TABLE tbl_namekeyword to specify the name of the destination table. The LOAD DATA statement supports partitioned and non-partitioned tables.
- Field format: This option describes how to use the
Fields | Columnsclause to specify the delimiter for fields in the input file. TheTerminated Bykeyword specifies the field delimiters. TheEnclosed Bykeyword specifies whether a field is enclosed by specific characters. TheEscaped Bykeyword specifies the wildcard characters in a field.
- Line format: This option describes how to use the
Linesclause to specify the characters to enclose each row in the input file. TheStarting Bykeyword specifies the start-of-line character. TheTerminated Bykeyword specifies the end-of-line character. TheIGNORE number {LINES | ROWS}clause indicates to ignore a specifiednumberof rows from the beginning of the input file.
load data /*+ parallel(4) */infile '/home/admin/a.csv' into table t fields terminated by ',' lines terminated by '\n';
- Column mapping relationship: This option describes how to use the
(col_name_var [, col_name_var] ...)keyword to specify the relationship between the columns in the destination table and the fields in the input file. By default, fields in the input file are mapped to columns in the destination table one by one. If you use thecol_name_or_user_varkeyword to specify the mapping relationship between fields in the input file and columns in the destination table, the LOAD DATA statement maps the columns in the destination table based on the specified column name. If no column is specified, the relationship is set to NULL. Example:
load data infile '/home/admin/a.csv' into table t (id, names) set dates=CURRENT_TIMESTAMP;
If the input file does not contain all the columns, the missing columns are filled based on the following rules:
If you need to add a null value, use '\N' in the input file.
- Log file: If an error occurs during the import, the INSERT statement is rolled back, and the LOAD DATA statement generates a log file named
obloaddata.log.<XXXXXX>in the \log subdirectory of the installation directory of the OBServer. Example:
Tenant name: mysql
File name: /home/admin/a.csv
Into table: `test`.`t`
Parallel: 1
Batch size: 1000
SQL trace: YD7A20BA65670-0005AADAAA3C****
Start time: 2020-07-29 21:08:13.073741
Load query:
load data infile '/home/admin/test.csv' into table t fields terminated by ',' lines terminated by '\n'
Row ErrCode ErrMsg
1 1062 Duplicated primary key
2 1062 Duplicated primary key
The log contains the basic information about the tasks generated by the LOAD DATA statement, including the tenant name, input file name, destination table name, parallelism, LOAD DATA command used, and error details in lines.
Example: Import a .csv file to the ware2 table by using the LOAD DATA command
[admin@h07g12092.sqa.eu95 /home/admin/csvdata]
$more ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c
2,1200,.0862,L6xwRsbDk,xEdT1jkENtbLwoI1Zb0,NT0j4RCQ4OqrS,vlwzndw2FPrO,XR,063311111
1,1200,.1868,n1P4zYo8OH,jTNkXKWXOdh,lf9QXTXXGoF04IZBkCP7,srRq15uvxe5,GQ,506811111
$obclient -h10.0.0.0 -utpcc@obbmsql -P2881 -p**1***
obclient> load data infile '/home/admin/csvdata/ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c' into table ware2 fields terminated by ',' lines terminated by '\n';
Query OK, 2 rows affected (0.02 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
obclient> select * from ware2\G
*************************** 1. row ***************************
W_ID: 2
W_YTD: 1200
W_TAX: .0862
W_NAME: L6xwRsbDk
W_STREET_1: xEdT1jkENtbLwoI1Zb0
W_STREET_2: NT0j4RCQ4OqrS
W_CITY: vlwzndw2FPrO
W_STATE: XR
W_ZIP: 063311111
*************************** 2. row ***************************
W_ID: 1
W_YTD: 1200
W_TAX: .1868
W_NAME: n1P4zYo8OH
W_STREET_1: jTNkXKWXOdh
W_STREET_2: lf9QXTXXGoF04IZBkCP7
W_CITY: srRq15uvxe5
W_STATE: GQ
W_ZIP: 506811111
2 rows in set (0.00 sec)