OBLOADER overview|V4.2.4|OBLOADER&OBDUMPER| docs|Distributed Database

OBLOADER overview

Last updated: 2023-08-14 08:41:50

Features

OBLOADER mainly provides the following features:

Allows you to import database object definitions and table data from local disks, Apache Hadoop, Alibaba Cloud Object Storage Service (OSS), and Amazon Simple Storage Service (S3).
Allows you to import SQL files exported by mysqldump.
Allows you to import data files in the standard CSV, INSERT SQL, ORC, or Parquet format.
Allows you to set data preprocessing rules and configure field mappings between files and tables.
Supports features such as import throttling, memory exhaustion prevention, resumption after an interruption, and automatic retries.
Allows you to specify a custom log directory to store bad data and conflicting data during import.
Automatically splits large files without consuming additional storage space.
Supports encryption of sensitive parameters specified in commands, such as the accounts and passwords for the database and for cloud storage.

Considerations

For more information about the standard CSV format, see the RFC 4180 specification. We recommend that you import data in strict compliance with RFC 4180.
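As a rough illustration of what RFC 4180-style content looks like, the following Python sketch serializes rows with the standard library csv module. Fields that contain commas, quotation marks, or line breaks are quoted, embedded quotation marks are doubled, and each record ends with CRLF. The function name to_rfc4180 and the sample rows are hypothetical and are not part of OBLOADER.

```python
import csv
import io


def to_rfc4180(rows):
    """Serialize rows as CSV text in the style described by RFC 4180."""
    buf = io.StringIO(newline="")  # keep line endings exactly as written
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar='"',
        doublequote=True,           # an embedded " is written as ""
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        lineterminator="\r\n",      # records end with CRLF
    )
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data, used only to show the quoting behavior.
    sample = [["id", "note"], [1, 'He said "hi", then left']]
    print(repr(to_rfc4180(sample)))
```

Generating files this way before import is one possible way to avoid ambiguous quoting. Whether a given file is accepted still depends on the options passed to OBLOADER.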
To improve performance when you import a large amount of data, you can adjust the Java virtual machine (JVM) memory parameters in the startup script.
The object names, data file names, and rule file names specified by command-line options must use the same letter case. By default, uppercase letters are used in Oracle mode, and lowercase letters are used in MySQL mode. If table names are case-sensitive, enclose them in brackets ([ ]). For example, --table '[test]' specifies the table named test, whose data files are named in the test.group.sequence.suffix format, and --table '[TEST]' specifies the table named TEST, whose data files are named in the TEST.group.sequence.suffix format.
All imported data files are named in the table.group.sequence.suffix format.
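The case rules and the table.group.sequence.suffix naming format can be sketched in Python as follows. This is only an approximation under stated assumptions: the helper names table_token and matches_table are hypothetical, a name wrapped in brackets is assumed to keep its case, an unwrapped name is assumed to be folded to the default case of the mode, and a file name is assumed to have exactly four dot-separated parts.

```python
def table_token(table_arg, mode):
    """Return the table name as it is expected to appear in data file names.

    Assumption: a value wrapped in [ ] keeps its case; otherwise the name is
    folded to uppercase in Oracle mode and to lowercase in MySQL mode.
    """
    if table_arg.startswith("[") and table_arg.endswith("]"):
        return table_arg[1:-1]
    mode = mode.lower()
    if mode == "oracle":
        return table_arg.upper()
    if mode == "mysql":
        return table_arg.lower()
    raise ValueError(f"unknown mode: {mode!r}")


def matches_table(file_name, table_arg, mode):
    """Check that file_name looks like table.group.sequence.suffix for the table."""
    parts = file_name.split(".")
    # Assumption: exactly four dot-separated parts; the first is compared
    # case-sensitively with the expected table name.
    return len(parts) == 4 and parts[0] == table_token(table_arg, mode)


if __name__ == "__main__":
    print(matches_table("TEST.1.0.csv", "test", "oracle"))
    print(matches_table("test.1.0.csv", "[TEST]", "mysql"))
```

A check like this can be run over a data directory before an import to spot files whose case does not match the --table value.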
If objects in the database depend on one another, object definitions and data may not be imported in strict dependency order.
When you resolve primary key conflicts in OceanBase Database V1.4.79 in MySQL mode, using the INSERT ... WHERE NOT EXISTS statement may cause cross-partition insertion errors.
In OceanBase Database V1.4.x in MySQL mode, the metadata of the RANGE COLUMNS-KEY composite partitioned table is defective in the virtual routing view.
Tables without primary keys do not support import resumption after an interruption or data substitution.
When you specify the --cut option on the OBLOADER command line, do not use the --trail-delimiter option if no field separator or separator string exists at the end of the data line in the file. Otherwise, data cannot be correctly imported to the database.
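Before adding --trail-delimiter to a --cut import, it may help to confirm that the data lines really end with the separator string. The following Python sketch shows one way to do that. The helper name all_lines_end_with and the separator |@| are hypothetical, and the check ignores empty lines.

```python
def all_lines_end_with(lines, separator):
    """Return True if every non-empty data line ends with the separator string."""
    # Strip line terminators first so that they do not hide the separator.
    data_lines = [line.rstrip("\r\n") for line in lines]
    data_lines = [line for line in data_lines if line]
    return bool(data_lines) and all(
        line.endswith(separator) for line in data_lines
    )


if __name__ == "__main__":
    # Hypothetical sample lines that use "|@|" as the separator string.
    with_trailing = ["1|@|alice|@|\n", "2|@|bob|@|\n"]
    without_trailing = ["1|@|alice\n", "2|@|bob\n"]
    print(all_lines_end_with(with_trailing, "|@|"))
    print(all_lines_end_with(without_trailing, "|@|"))
```

If the check returns False, the file has no trailing separator on at least one line, which is the case in which --trail-delimiter should be left out.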
Before you use OBLOADER to import data to OceanBase Database V3.2.4 or later, set the system parameter open_cursors to a large value, for example, ALTER SYSTEM SET open_cursors = 65535;. Otherwise, an error may occur during the import. After the data is imported, reset the system parameter to its initial value.
When you import DDL statements, use the --mix option instead of the --ddl option if -f specifies a non-standard directory structure (a directory structure not generated by OBDUMPER). When you specify the --sql option, ensure that each statement in the data file inserts only one record. Otherwise, specify --mix instead of --sql to import the data.
OBLOADER supports the following file formats:
- DDL: A file in the DDL format contains only DDL statements but no table data.
- CSV: A file in the standard CSV format contains content that complies with the RFC 4180 specification.
- SQL: A file in the SQL format contains only INSERT SQL statements. Each statement occupies one line without line breaks.
- ORC: A file in the ORC format contains standard Apache ORC content. The default compression algorithm is zstd.
- Parquet: A file in the Parquet format contains standard Apache Parquet content. The default compression algorithm is zstd.
- MIX: A file in the MIX format contains mixed types of standard SQL statements, such as DDL and DML statements.
- POS: A file in the POS format contains data at a fixed byte length. Currently, the length cannot be specified.
- CUT: A file in the CUT format contains data separated by a string. In contrast, data in a CSV file is separated by a single character.
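The difference between the CSV and CUT formats in the list above can be sketched in Python as follows. This is a simplified illustration, not OBLOADER's parser: the function names and the |@| separator are hypothetical, and the CUT-style split assumes that the separator string never appears inside a field.

```python
import csv


def parse_csv_line(line, delimiter=","):
    """Parse one CSV record; the delimiter must be a single character."""
    return next(csv.reader([line], delimiter=delimiter))


def parse_cut_line(line, separator):
    """Split one CUT-style record on a separator string of any length."""
    return line.split(separator)


if __name__ == "__main__":
    # The CSV record needs quotation marks to protect the embedded comma.
    print(parse_csv_line('1,"a,b",c'))
    # The CUT-style record needs no quoting because "|@|" is unlikely in data.
    print(parse_cut_line("1|@|a,b|@|c", "|@|"))
```

Both calls yield the same three fields. A multi-character separator avoids the need for quoting at the cost of a less standard file layout.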
