OceanBase data loading methods |V2.2.77|OceanBase Database| docs|Distributed Database

OceanBase data loading methods

Last Updated：2023-08-18 09:26:34 Updated

OceanBase enables you to use the LOAD DATA command to import data from external text files to database tables.

Syntax for the LOAD DATA command:

LOAD DATA
        [/*+ parallel(N)*/]
    INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_var
        [, col_name_var] ...)]

Where, the REPLACE option applies only to MySQL tenants. Notice

The file to be loaded must be located on the OBServer where the leader of the table resides. The current version does not support loading data from a remote client.

You can use the LOAD DATA command to import a CSV file. Procedure:

Parse the file: OceanBase Database reads data from a file based on the file name that you enter and determines whether to perform parallel or serial parsing of data from the input file based on the degree of parallelism you specify.
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServers. The LOAD DATA command is used to process the parsed data and determine to which Observer the data is sent.
Insert the data: After the destination OBServer receives the data, it executes the INSERT statement to insert the data into the corresponding partition.

The LOAD DATA command provides many options to fit your needs, including:

Degree of parallelism: The /* + parallel(N)*/ option specifies the degree of parallelism for data loading. The recommended value range is 0 to the maximum number of CPU cores in a tenant. Example:

load data /*+ parallel(4) */infile '/home/admin/a.csv' into table t

Input file: The INFILE 'file_name' keyword specifies the directory and name of the input file. In OceanBase Database V2.2.76, the LOAD DATA command only supports loading a file located on an OBServer. Therefore, you need to copy the file to an OBServer before importing it and connect to this OBServer to run the LOAD DATA statement.

Execution permission: You need to perform the following steps to grant the permission to access a file on the server to a user:
Duplicate data processing: This section specifies how to process duplicate data. The REPLACE option is used to replace the original data in the table with data in the input file. The IGNORE option is used to ignore duplicate data in the input file. The LOAD DATA statement uses the primary key of a table to identify duplicate data. If the table does not have a primary key, the LOAD DATA statement cannot identify duplicate data, and the REPLACE and IGNORE options make no difference. If you do not specify these options, the LOAD DATA statement records the wrong data into the log file when duplicate data exists.

Destination table: The INTO TABLE tbl_name keyword is used to specify the name of the destination table. The LOAD DATA statement supports partitioned and non-partitioned tables.

Field format: This section describes how to use the Fields | Columns clause to specify the delimiter for fields in the input file. The TERMINATED BY keyword specifies the field delimiters. The ENCLOSED BY keyword specifies whether a field is enclosed by specific characters. The ESCAPED BY keyword specifies the wildcard characters in a field.

Line format: This section describes how to use the LINES clause to specify the characters to enclose each line in the input file. The STARTING BY option is used to specify the start-of-line character. The TERMINATED BY option is used to specify the end-of-line character. The IGNORE number {LINES | ROWS} clause ignores a specified number of rows from the beginning of the input file.

load data /*+ parallel(4) */infile '/home/admin/a.csv' into table t fields terminated by ',' lines terminated by '\n';

Column mapping relationship: This section describes how to use the (col_name_var [, col_name_var] ...) keyword to specify the relationship between the columns in the destination table and the fields in the input file. By default, fields in the input file are mapped to columns in the destination table one by one. If you use the col_name_or_user_var keyword to specify the mapping relationship between fields in the input file and columns in the destination table, the LOAD DATA statement maps the columns in the destination table based on the specified column name. If no column is specified, the relationship is set NULL. Example:

load data infile '/home/admin/a.csv' into table t (id, names) set dates=CURRENT_TIMESTAMP;

If the input file does not contain all the columns, the missing columns are filled based on the following rules:

To add a null value, use "\N" in the input file.

Log file: If an error occurs during the import, the INSERT statement is rolled back, and the LOAD DATA statement generates a log file named obloaddata.log <XXXXXX> in the \log subdirectory of the installation directory of the OBServer.Example:

Tenant name:    mysql
File name:  /home/admin/a.csv
Into table: `test`.`t`
Parallel:   1
Batch size: 1000
SQL trace:  YD7A20BA65670-0005AADAAA3C****
Start time: 2020-07-29 21:08:13.073741
Load query:
load data infile '/home/admin/test.csv' into table t fields terminated by ',' lines terminated by '\n'
Row ErrCode ErrMsg
1   1062    Duplicated primary key
2   1062    Duplicated primary key

The log contains the basic information about the tasks generated by the LOAD DATA command, including the tenant name, input file name, destination table name, degree of parallelism, LOAD DATA command used, and error details in lines.

Example: Using the LOAD DATA statement to import a CSV file to the ware2 table

[admin@h07g12092.sqa.eu95 /home/admin/csvdata]
$more ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c
2,1200,.0862,L6xwRsbDk,xEdT1jkENtbLwoI1Zb0,NT0j4RCQ4OqrS,vlwzndw2FPrO,XR,063311111
1,1200,.1868,n1P4zYo8OH,jTNkXKWXOdh,lf9QXTXXGoF04IZBkCP7,srRq15uvxe5,GQ,506811111

$obclient  -h10.0.0.0 -utpcc@obbmsql -P2881 -p**1***

obclient> load data infile '/home/admin/csvdata/ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c' into table ware2 fields terminated by ',' lines terminated by '\n';
Query OK, 2 rows affected (0.02 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

obclient> select * from ware2\G
*************************** 1. row ***************************
      W_ID: 2
     W_YTD: 1200
     W_TAX: .0862
    W_NAME: L6xwRsbDk
W_STREET_1: xEdT1jkENtbLwoI1Zb0
W_STREET_2: NT0j4RCQ4OqrS
    W_CITY: vlwzndw2FPrO
   W_STATE: XR
     W_ZIP: 063311111
*************************** 2. row ***************************
      W_ID: 1
     W_YTD: 1200
     W_TAX: .1868
    W_NAME: n1P4zYo8OH
W_STREET_1: jTNkXKWXOdh
W_STREET_2: lf9QXTXXGoF04IZBkCP7
    W_CITY: srRq15uvxe5
   W_STATE: GQ
     W_ZIP: 506811111
2 rows in set (0.00 sec)
obclient>