OBLOADER uses command-line options to specify the information required for import. For more information about options and examples, see Options and Examples.
File matching rules
When you import data, OBLOADER searches for files that match the specified command-line parameters. By default, OBLOADER searches for data files that match the pattern ${table_name}.${seq}.${file_type}, where ${seq} is optional. For example, if you specify --csv --table 'table1', OBLOADER recursively searches for files that match the pattern table1.${seq}.csv.
You can also use wildcards to specify the files to be imported. For example, if you specify --csv --table '*', OBLOADER will search for all CSV files and split the filenames according to the ${table_name}.${seq}.${file_type} pattern to extract the table names, thereby enabling multi-table import.
If the filenames of the files to be imported are unrelated to the table names, you can use the --file-regular-expression option to specify the matching rule. For example, if the filenames look like part-00123-bd461e82-c911-48ce-bfa0-ec1****.c000.snappy.orc, you can use the --orc --table 'table2' --file-regular-expression '.*\.orc' command to match all files that end with .orc and import them into the table named table2.
OBLOADER V4.3.2 and later versions support importing multiple tables in a database by specifying matching rules. You can use the named groups schema and table to capture the database name and table names from the filenames. If only the schema group is specified, the --table option must specify a single table, or the database must contain only one table; otherwise, OBLOADER reports an error and exits. The following rules are supported (see the example after the list).
- ${schema}.${table}.csv: ^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)csv$
- ${table}.csv: ^(?<table>[^.]+)(\.)csv$
- ${schema}.${table}.${seq}.csv: ^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)([^.]+\.)csv$
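For example, under the first rule, a directory containing files named db1.t1.csv and db1.t2.csv (db1, t1, and t2 are hypothetical placeholder names) could be imported into the corresponding tables in one run. This invocation is a sketch; the exact combination of options depends on your deployment:

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** --csv --table '*' --file-regular-expression '^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)csv$' -f /output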
OBLOADER locates the files to be imported based on the correspondence between table names and filenames. For example, if the target table name is t1 and the specified format is --csv, the default file suffix is .csv. You can use the --file-suffix option to specify a custom file suffix. OBLOADER traverses the path specified by -f; files that match the table include t1.csv, t1.0.csv, and t1.1.2.csv.
If the filenames follow other rules, or no naming rule at all, you can use the --file-regular-expression option to specify a regular expression that matches the files to be imported. For example, --file-regular-expression=".*\.csv" matches all filenames that end with .csv, and --file-regular-expression=".*" matches all files in the specified directory.
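For example, a hypothetical run where CSV data files were exported with a .txt extension (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' --file-suffix '.txt' -f /output

This would match files such as t1.txt and t1.0.txt under /output.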
Option introduction
Option style
OBLOADER provides parameters in the Unix and GNU styles.
- In the Unix style, a single hyphen (-) precedes the option, and the option is a single character, for example, ps -e. In this style, the space between the option and its value can be omitted, for example, -p******.
- In the GNU style, a double hyphen (--) precedes the option, and the option is a single character or a string, for example, ps --version. In this style, a space must separate the option and its value, for example, --table 'test'.
Option categories
OBLOADER options are classified into basic options and advanced options.
- Basic options: commonly used OBLOADER options, including connection options, feature options, and other options.
- Advanced options: feature options, performance options, and other options.
Required options
To use OBLOADER to import data, you must specify the connection options, format options, database object type options, and storage path options.
Example:
$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table '*' -f /output
Here, -h, -P, -u, -p, and -D are connection options; --csv is a file format option; --table is a database object type option; and -f specifies the storage path.
Basic options
Connection options
OBLOADER needs to connect to OceanBase Database to read and write data. You can specify the following options to connect to OceanBase Database.
Import types
| Import type | Related command-line options |
|---|---|
| OceanBase Database | The standard connection options: -h, -P, -u, -p, and -D. |
| OceanBase Cloud | --public-cloud<br>Specifies that database objects or table data are imported into an OceanBase cluster deployed by using OceanBase Cloud. If this option is specified, you do not need to specify the tenant name and cluster name in the connection option (`-u <user>@<tenant>#<cluster>`), and the --no-sys option is enabled by default. For more information, see the description of the --no-sys option. Using --public-cloud or --no-sys may affect the features, performance, and stability of the import. OceanBase Database V2.2.30 and later versions support the server throttling feature. To ensure import stability when you use --public-cloud or --no-sys, you can run the following commands to set the server throttling thresholds:<br>alter system set freeze_trigger_percentage=50;<br>alter system set minor_merge_concurrence=64;<br>alter system set writing_throttling_trigger_percentage=80 tenant='xxx'; |
| ODP (Sharding) logical database | --logical-database<br>Specifies to import into an ODP (Sharding) logical database. |
Direct load
OBLOADER V4.2.6 and later support the direct load mode. This feature supports OceanBase Database V4.2.1.7 and later in the V4.2.1.x series, V4.2.4.x to V4.3.0.x, and V4.3.0.1 and later in the V4.3.x series.
--rpc-port=rpc_port_num

Specifies the RPC port of the OBServer node. This option is used in conjunction with --direct and --parallel to connect to the RPC port of the OBServer node for data import in the direct load mode.

Note

- This option applies only to OceanBase Database V4.2.0 RC2 and later.
- You can obtain the RPC port in the following ways:
  - When connecting to ODP:
    - In the cloud-based OceanBase environment, the default ODP RPC port is 3307.
    - In the OceanBase Database environment, the default port is 2885. To use a custom port, specify the -s option when starting ODP.
  - When connecting to an OBServer node, query the DBA_OB_SERVERS view in the sys tenant to obtain the RPC port. The default port is 2882.
--direct

Enables the direct load mode. This option is used in conjunction with --rpc-port and --parallel. --direct uses different direct load modes depending on whether the target table contains data:

- When the target table is empty, --direct uses the full direct load mode. It can also be used with --replace-data, which specifies that data conflicts be resolved by replace.
- When the target table is not empty, --direct uses the incremental direct load mode. If the target table has no indexes or LOB fields, --replace-data can be used; if the target table has indexes or LOB fields, --replace-data cannot be used.
Note
- Binary data types are not supported in the OBLOADER direct load mode.
- The OBLOADER direct load mode supports connecting to OBServer and ODP. The versions must meet the following requirements:
- When connecting to OBServer, OBServer must be V4.2.1.7 or later of the V4.2.1.x series, or between V4.2.4.0 and V4.3.0, or V4.3.0.1 or later of the V4.3.x series.
- When connecting to ODP, ODP must be V4.3.0 or later, and OBServer must be V4.2.1 or later.
--parallel=parallel_num

Specifies the degree of parallelism for loading data in the direct load mode. This option is used in conjunction with --rpc-port and --direct.
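For example, a hypothetical full direct load into an empty table (connection values, paths, and the parallelism are placeholders; the RPC port must match your deployment, as described above):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --direct --rpc-port 2885 --parallel 8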
Feature options
File formats
| Format (command-line option) | Related command-line options | Scenarios |
|---|---|---|
| --csv<br>Specifies to import a CSV file. | - --skip-header: specifies whether to skip the header row of a CSV file. The value is BOOLEAN.<br>- --column-delimiter 'column_delimiter_char': specifies the string delimiter.<br>- --line-separator 'line_separator_string': specifies the line separator.<br>- --escape-character 'escape_char': specifies the escape character.<br>- --column-separator 'column_separator_char': specifies the column separator in the CSV format. The default value is a comma (,).<br>- --with-trim: deletes leading and trailing spaces.<br>- --ignore-unhex: indicates whether to ignore the decoding of hexadecimal strings. Applies only to binary data types.<br>- --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --null-string 'null_replacer_string': replaces a specified string with NULL.<br>- --empty-string 'empty_replacer_string': replaces a specified string with an empty string (' '). The default value is \E.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding. | Applicable to almost all business scenarios. |
| --cut<br>Specifies to import a CUT file. | - --skip-header: specifies whether to skip the header row of a CUT file. The value is BOOLEAN.<br>- --skip-footer: specifies whether to skip the last row of a CUT file.<br>- --column-splitter 'split_string': specifies the column separator in the CUT format.<br>- --trail-delimiter: indicates whether a delimiter is appended at the end of each data row.<br>- --line-separator 'line_separator_string': specifies the line separator.<br>- --with-trim: deletes leading and trailing spaces.<br>- --ignore-escape: specifies whether to ignore escape characters when importing a CUT file. The default value is false.<br>- --ignore-unhex: indicates whether to ignore the decoding of hexadecimal strings. Applies only to binary data types.<br>- --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --null-string 'null_replacer_string': replaces a specified string with NULL.<br>- --escape-character 'escape_char': specifies the escape character.<br>- --empty-string 'empty_replacer_string': replaces a specified string with an empty string (' '). The default value is \E.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding. | Similar to CSV. Often used to import flexible data exchange formats exported by some older platforms. |
| --pos<br>Specifies to import a POS file. | Must be used with a control file. | For interaction with data unload platforms that support only fixed-length formats. |
| --sql<br>Specifies to import an SQL file. | - --line-separator 'line_separator_string': specifies the line separator.<br>- --ignore-unhex: indicates whether to ignore the decoding of hexadecimal strings. Applies only to binary data types.<br>- --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --null-string 'null_replacer_string': replaces a specified string with NULL.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding. | Applicable to scenarios where only data, not schemas, is imported. Supports only one SQL syntax but offers better performance. |
| --mix<br>Specifies to import a MIX file. | - --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding.<br>- --compat-mode: performs MySQL-compatible table schema definition imports. | Applicable to SQL files in any format. Offers better compatibility but significantly lower performance than --sql.<br>The current implementation reads all text into memory before parsing, making it unsuitable for handling large files. |
| --par<br>Specifies to import a Parquet file. | - --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --ignore-unhex: indicates whether to ignore the decoding of hexadecimal strings. Applies only to binary data types.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding. | Data migration from Hive, Spark, or other standard Parquet data sources. |
| --orc<br>Specifies to import an ORC file. | - --character-set 'character_set_string': specifies the character set for creating a database connection.<br>- --ignore-unhex: indicates whether to ignore the decoding of hexadecimal strings. Applies only to binary data types.<br>- --file-suffix 'suffix_name': specifies the file suffix for data files during import.<br>- --file-encoding 'encode_name': specifies the file encoding used to read the data file, which is not the same as the database encoding. | Data migration from Hive, Spark, or other standard ORC data sources. |
| --avro<br>Specifies to import an Avro file. | --file-suffix 'suffix_name': specifies the file suffix for data files during import. | Often used in big data scenarios, such as offline migration for AP business. |

Note

For more information about command-line options, see the Options section in this topic.
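For example, a hypothetical CSV import combining several of the format options above (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' --skip-header --column-separator ',' --null-string '\N' --file-encoding 'UTF-8' -f /output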
Import a file compressed and exported by using OBDUMPER
OBLOADER V4.3.0 and later support the import of files compressed and exported by using OBDUMPER. Supported file formats are CSV, CUT, POS, and SQL. You can configure the related settings through the following command-line options.
--compress
A boolean value that indicates whether the file is a compressed file exported by using OBDUMPER.
--compression-algo
A string that indicates the compression algorithm used when a file is exported by using OBDUMPER. Valid values: zstd, zlib, gzip, and snappy. Default value: zstd.
--compression-level
An integer that indicates the compression level used when a file is exported by using OBDUMPER. The supported compression levels vary by algorithm:

- zstd: 1 to 22. Default value: 3.
- zlib: -1 to 9. Default value: -1.
- gzip and snappy: compression levels are not supported.
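For example, a hypothetical import of zstd-compressed CSV files previously exported by OBDUMPER (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --compress --compression-algo 'zstd'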
Database object types
--all
Specifies to import all supported database objects and table data. When used with the --ddl option, it imports all database object definitions. When used with a data format option such as --csv, --sql, --cut, or --pos, it imports all data files in the corresponding format. To import all database object definitions and table data, specify --all together with --ddl and a data format option.
Note
The --all option is mutually exclusive with the individual database object options. If you specify both the --all option and a database object option, the --all option takes precedence.
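For example, a hypothetical run that imports all database object definitions and all CSV data files (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --all --ddl --csv -f /output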
--table-group 'table_group_name [,table_group_name...]' | --table-group '*'

Specifies to import table group definitions. It does not support data import. For other descriptions, refer to the --table option.

--table 'table_name [,table_name...]' | --table '*'

Specifies to import table definitions or data. When used with the --ddl option, it imports only table definitions. When used with a data format option, it imports only table data. To import multiple tables, separate the table names with commas (,). By default, table names imported into an Oracle compatible tenant of OceanBase Database are in uppercase, and those imported into a MySQL compatible tenant are in lowercase. For example, in the Oracle compatible mode, --table 'test' and --table 'TEST' both refer to the TEST table; in the MySQL compatible mode, --table 'test' and --table 'TEST' both refer to the test table. To distinguish between uppercase and lowercase, enclose the table name in square brackets ([ ]): --table '[test]' refers to the test table, and --table '[TEST]' refers to the TEST table. If an asterisk (*) is specified, all table definitions or data are imported.

Note

When a control file is used, if the table name specified by the --table option is in a different case from that in the database, the control file does not take effect.

--view 'view_name [, view_name...]' | --view '*'

Specifies to import view definitions. It does not support data import. For other descriptions, refer to the --table option.

--trigger 'trigger_name [, trigger_name...]' | --trigger '*'

Specifies to import trigger definitions. It does not support data import. For other descriptions, refer to the --table option.

--obj-user 'user_name [, user_name...]' | --obj-user '*'

Specifies to import user definitions. It does not support data import. For other descriptions, refer to the --table option.

User requirement: the login user must be an administrator or a user with the dba_users privilege in the Oracle compatible mode of OceanBase Database, or an administrator or a user with the mysql.user table privilege in the MySQL compatible mode of OceanBase Database.

By default, the following users are ignored: ROOT, SYS, ORAAUDITOR, LBACSYS, proxyro, and standbyro.

--role 'role_name [, role_name...]' | --role '*'

Specifies to import role definitions. It does not support data import. For other descriptions, refer to the --table option.

User requirement: at present, this option applies only to the Oracle compatible mode of OceanBase Database. The login user must be an administrator or a user with the dba_users privilege.

By default, the following roles are ignored: CONNECT, RESOURCE, DBA, PUBLIC, and STANDBY_REPLICATION.

--sequence 'sequence_name [, sequence_name...]' | --sequence '*'

Specifies to import sequence definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--synonym 'synonym_name [, synonym_name...]' | --synonym '*'

Specifies to import synonym definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--type 'type_name [, type_name...]' | --type '*'

Specifies to import type definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--type-body 'typebody_name [, typebody_name...]' | --type-body '*'

Specifies to import type body definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--package 'package_name [, package_name...]' | --package '*'

Specifies to import package definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--package-body 'packagebody_name [, packagebody_name...]' | --package-body '*'

Specifies to import package body definitions. It does not support data import. For other descriptions, refer to the --table option. It is currently applicable only to OceanBase Database in Oracle compatible mode.

--function 'function_name [, function_name...]' | --function '*'

Specifies to import function definitions. It does not support data import. For other descriptions, refer to the --table option.

--procedure 'procedure_name [, procedure_name...]' | --procedure '*'

Specifies to import stored procedure definitions. It does not support data import. For other descriptions, refer to the --table option.

--ddl

Specifies to import DDL files. DDL files can be imported in parallel in OceanBase Database V4.2.1 and later. Related options:

- --character-set 'character_set_string': specifies the character set for creating a database connection.
- --ignore-unhex: ignores the decoding of hexadecimal strings. This option applies only to binary data types.
- --file-suffix 'suffix_name': specifies the file extension for data files during import.
- --file-encoding 'encode_name': specifies the file encoding used to read data files, which is not the same as the database encoding.

--source-type hive

Specifies to import partition information from the Hive storage path. During import, directory names containing = are parsed into the key=value format, where key represents the column name and value represents the column value. Supported file formats include ORC and Parquet. For an example of how to use this parameter, see Command-line application example.

--auto-column-mapping

Specifies to enable automatic column mapping. In this mode, table data from the source file is imported into the target table column by column. The number of columns in the source file can differ from that in the target table. Supported file formats include CSV, ORC, and Parquet. For an example of how to use this parameter, see Command-line application example.

Note

You cannot specify both the --ctl-path and --auto-column-mapping options.
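For example, a hypothetical import with automatic column mapping enabled (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --auto-column-mapping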
Storage path
-f 'file_path', --file-path='file_path'

Specifies the absolute path for storing data files. If the files are stored in a non-local file system, you can use --tmp-path to specify a temporary directory.

OBLOADER allows you to import database object definitions and table data from local disks, Alibaba Cloud Object Storage Service (OSS), Amazon Simple Storage Service (S3), Apache Hadoop, Tencent Cloud Object Storage (COS), Huawei Cloud Object Storage Service (OBS), Azure Blob Storage, and Google Cloud Storage (GCS). The syntax is as follows:

[scheme://host]path[?parameters]

parameters: key[=value]&...

| Component | Description |
|---|---|
| scheme | The storage scheme. Supported storage schemes are Alibaba Cloud OSS, Amazon S3, Apache Hadoop, COS, and OBS. If the specified scheme is not one of those listed, an error is returned. |
| host | The name of the storage space.<br>- When you import data from OSS, S3, COS, or OBS, the host parameter specifies the bucket. For more information, see OSS Bucket.<br>- When you import data from Apache Hadoop, the host parameter specifies the Hadoop node, in the `<ip>:<port>` or `<cluster_name>` format. |
| path | The path for importing data from the storage space. The path must start with a /. |
| parameters | The parameters required for the request. A parameter can be a single key or a key-value pair. |

The following table describes the parameters.

| Parameter | Required | Description | Supported storage type | Supported from |
|---|---|---|---|---|
| endpoint | Yes | The endpoint of the region where the host is located. You can access the OSS, S3, COS, and OBS services via the domain name endpoint. Example: oss://ssmp-xxxx-xxxx/test?endpoint=oss-cn-shenzhen-internal.aliyuncs.com. | OSS, S3, COS, and OBS | OSS: V4.2.0<br>S3: V4.2.5<br>COS and OBS: V4.3.0 |
| region | Yes | The physical location of the bucket. | OSS, S3, COS, and OBS | OSS and S3: V4.2.0<br>COS and OBS: V4.3.0 |
| storage-class | Yes | The Amazon S3 storage class. | S3 | V4.2.0 |
| access-key | Yes | The access account for the storage. | OSS, S3, COS, and OBS | OSS and S3: V4.2.0<br>COS and OBS: V4.3.0 |
| secret-key | Yes | The access key for the storage. | OSS, S3, COS, and OBS | OSS and S3: V4.2.0<br>COS and OBS: V4.3.0 |
| hdfs-site-file | Yes | The configuration file that contains the configuration information of Apache Hadoop, such as the block size and the number of replicas. | Hadoop | V4.2.1 |
| core-site-file | Yes | The configuration file that contains the core configuration information of the Hadoop cluster, such as the URI of the Apache Hadoop file system and the default file system. | Hadoop | V4.2.1 |
| principal | Yes | The authentication identifier in Kerberos. | Hadoop | V4.2.1 |
| keytab-file | Yes | The absolute path of the keytab file, which is used to authorize users or services to access system resources. | Hadoop | V4.2.1 |
| krb5-conf-file | Yes | The path of the Kerberos configuration file. | Hadoop | V4.2.1 |

Here are some examples:

- Import data from a local disk.

  -f '/home/admin/foo/bar'

- Import data from Amazon S3.

  -f 's3://bucket/path?region={region}&access-key={accessKey}&secret-key={secretKey}'

- Import data from Alibaba Cloud OSS.

  -f 'oss://mybucket/foo/bar?endpoint=myendpoint&access-key=myak&secret-key=mysk'

- Import data from Hadoop.

  -f 'hdfs://***.*.*.*:9000/chang/parquet?hdfs-site-file=/data/0/zeyang/hdfs-site.xml&core-site-file=/data/0/zeyang/core-site.xml'

- Import data from Tencent Cloud Object Storage.

  -f 'cos://mybucket/foo?region=ap-shanghai&access-key=myak&secret-key=mysk'

  -f 'cos://mybucket/foo?endpoint=yourendpoint&access-key=myak&secret-key=mysk'

- Import data from Huawei Cloud Object Storage Service.

  -f 'obs://mybucket/foo?region=cn-north-1&access-key=myak&secret-key=mysk'

  -f 'obs://mybucket/foo?endpoint=yourendpoint&access-key=myak&secret-key=mysk'

- Import data from Azure Blob Storage.

  Azure Blob Storage supports the wasb:// and wasbs:// (SSL-encrypted access) protocols. You can import data in the following ways:

  - Append the SAS token to the URI of the wasb protocol as the value of the --file-path option:

    wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?SAS

    Here is an example:

    wasbs://test-data@loader.blob.core.chinxxxpi.cn/linxxx.csv?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupitfx&se=2024-08-13T14:49:32Z&st=2024-08-13T06:49:32Z&spr=https&sig=m2AXglpESQPYrVvrhxxxx

  - Append the connection string to the URI of the wasb protocol as the value of the --file-path option. Except for account-key, the other parameters are optional:

    wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?account-key=xxxx

    wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?default-endpoints-protocol=xxx&account-key=xxxx&account-name=xxx&endpoint-suffix=xxx

    Here is an example:

    wasbs://test-data@loader.blob.core.chixxxx.cn/lineitem1.csv?account-key=xxxx

- Import data from Google Cloud Storage.

  -f 's3://bucket/path?region={region}&access-key={accessKey}&secret-key={secretKey}'
--ctl-path 'control_path'

Specifies the absolute path for storing the control file on the local disk. The control file stores the built-in processing functions; data is preprocessed based on the configured functions before it is imported, for example, case conversion and null value checks. For more information about the control file, see the Data processing section. Specify the value in single quotation marks, for example, --ctl-path '/home/controls/'.

Notice

You cannot use the --ctl-path and --auto-column-mapping options at the same time.

--log-path 'log_path'

Specifies the output directory for OBLOADER operation logs. If you do not specify this option, the logs are output to the directory specified by the -f option. In most cases, you do not need to redirect the logs.
Other options
-H, --help
View the help message of the CLI tool.
-V, --version
View the version number of the current tool.
Advanced options
Feature options
Timestamp formats
--nls-date-format 'date-format-string'
Specifies the date format for a database connection in the Oracle compatible mode of OceanBase Database. Default value: YYYY-MM-DD HH24:MI:SS.
--nls-timestamp-format 'timestamp-format-string'
Specifies the timestamp format for a database connection in the Oracle compatible mode of OceanBase Database. Default value: YYYY-MM-DD HH24:MI:SS:FF9.
--nls-timestamp-tz-format 'timestamp-tz-format-string'
Specifies the timestamp with time zone format for a database connection in the Oracle compatible mode of OceanBase Database. Default value: YYYY-MM-DD HH24:MI:SS:FF9 TZR.
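For example, a hypothetical import into an Oracle compatible tenant that overrides the session date format (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@oracle#cluster_a -p ****** -D USERA --csv --table 'T1' -f /output --nls-date-format 'YYYY-MM-DD HH24:MI:SS'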
Allowlist and blocklist filtering
Table allowlists and blocklists
--exclude-table 'table_name [, table_name...]'
Specifies the table names to be excluded when you import table definitions or data. Table names can be specified by using wildcard characters.
Example:

--exclude-table 'test1,test*,*test,te*st'

In this example, the following tables are excluded when you import table definitions or data:

- test1
- All tables whose names start with test
- All tables whose names end with test
- All tables whose names start with te and end with st
--exclude-data-types 'datatype [, datatype...]'
Specifies the data types to be excluded when you import data.
--file-regular-expression

Allows you to specify a regular expression to selectively import files in single-database single-table or single-database multi-table imports. This option is effective only for single-database data import. Examples:

- --file-regular-expression ".*" matches all files in the directory; .* matches zero or more arbitrary characters.
- --file-regular-expression ".*\.csv" matches files whose names end with .csv; \. matches the literal period character (.).
Column allowlists and blocklists
--include-column-names 'column_name [, column_name...]'

Allows you to specify the column names to include in the import.

--exclude-column-names 'column_name [, column_name...]'

Allows you to specify the column names to exclude from the import.

Notice

- The specified column names must be in the same case as the column names in the table schema.
- The number of data columns in the data file must equal the number of columns remaining after the allowlist or blocklist is applied, and in the same order. If the data file is in the CSV, ORC, or Parquet format, we recommend that you use this together with the --auto-column-mapping option.
- The control file cannot be used together with the --exclude-column-names option; the functionality of --exclude-column-names is included in the control file.
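For example, a hypothetical import that loads only two columns of the target table (c1 and c2 are placeholder column names and must match the case in the table schema, as noted above):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --include-column-names 'c1,c2'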
Error handling
--max-discards int_num
Specifies the maximum number of duplicates allowed for each table. If the number of duplicates in any table exceeds this limit, the data import for that table is stopped, and the log records the failure of importing that table. This does not affect the data import of other tables. The default value is -1, which indicates to continue importing data regardless of duplicates.
Note
This option takes effect only when a table has a primary key or unique key and contains duplicate data.
--retry

Specifies to resume data import from the interrupted point. We recommend that you use this option only when more than 80% of the data has been imported; this avoids re-importing data and reduces the risk of importing a small number of duplicate records. Otherwise, clearing the table and starting the import again is more efficient.

Note

- The load.ckpt file is a savepoint file generated by the tool during its operation. It is located in the directory specified by -f. If the load.ckpt file does not exist, this option cannot be used.
- This option is not applicable to direct load.
--max-errors int_num
Specifies the maximum number of errors allowed for each table during data import. If the number of errors in importing data of any table exceeds this limit, the data import for that table is stopped, and the log records the failure of importing that table. Valid values of this option are 0, -1, and a positive integer N. If you set this option to -1, errors are ignored and data import continues. The default value is 1000.
--strict='strict_string'

Controls the impact of dirty data on the exit status of the process during data import. The default value is true: if the imported data contains bad records or discarded records, the program exits in failure (System exit 1). If you set the value to false, bad records and discarded records do not affect the exit status, and the program exits in success (System exit 0).

Note

You can use this option in combination with the --max-discards or --max-errors option. In this way, when the number of duplicates or errors is within the specified range, the system skips the errors and continues the process. For more information, see Error handling.

--replace-data

Specifies to replace the duplicate data in the table. This option applies only to tables with defined primary keys or unique keys (on non-null fields). If the file contains a large amount of duplicate data, exceeding 30% of the total, we recommend that you clear the table and start the import again, because data replacement is less efficient than importing into an empty table. This option takes effect only when used together with the --csv, --sql, --cut, --par, or --orc option, not with the --ddl option.

Note
- If the file and the table contain duplicate data, the data in the table will be replaced with that in the file.
- If a table does not have a primary key or unique key, this option will append data to the table.
- Do not specify this option in the command line if you do not want to replace the duplicate data, to avoid impact on your business.
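For example, a hypothetical run combining the error-handling options above, tolerating dirty data without failing the process (connection values and paths are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --max-errors -1 --strict 'false'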
Performance options
--rw float_num
Specifies the ratio of the number of file parsing threads to the total number of threads. Default value: 1.
This option is used in combination with the --thread option: the number of file parsing threads is calculated as the value of --thread multiplied by the value of --rw.
Specifies the threshold for the tool to enter the slow load mode. When the memory usage of OceanBase Database reaches 75%, OBLOADER automatically reduces the speed to prevent the rapid increase of database memory usage. Default value: 0.75.
--pause float_num
Specifies the threshold for the tool to enter the pause load mode. When the memory usage of OceanBase Database reaches 85%, OBLOADER automatically stops the import to prevent high database memory water level from causing memory issues. Default value: 0.85.
--batch int_num
Specifies the transaction size for batch writing. We recommend that you set the value of this option to a smaller one in proportion to the width of the table. The value of this option must not cause an overflow of database memory. Default value: 200.
Note
- OBLOADER V4.2.0 and later versions support adaptive --batch default values based on the memory of the Java Virtual Machine (JVM).
- If you use OBLOADER V4.2.8.1 or V4.2.8.2, the actual number of parameters in a prepared statement may exceed the limit, which can cause an ODP core dump. If this issue occurs, manually set this option to a reasonable value.
--thread int_num
Specifies the number of concurrent threads, which directly corresponds to the number of write threads. When used in combination with the --rw option, the number of file parsing threads is calculated as the value of --thread multiplied by the value of --rw. The default value is 2 × the number of CPU cores; if the number of cores is greater than 16, the default value is capped at 32.

Note

OceanBase Database V4.2.1 and later versions support parallel import of DDL. When you import database object definitions into versions earlier than OceanBase Database V4.2.1, you do not need to specify this option.
--block-size

Specifies the threshold for splitting file blocks. The value is of the LONG data type. For example, --block-size 64 indicates that a single file block does not exceed 64 MB. You do not need to specify a unit; the default unit is MB. The default value is 64: OBLOADER automatically splits large files into multiple logical subfiles in 64 MB blocks without occupying additional storage space.

--max-tps int_num

Specifies the maximum TPS, which helps ensure a stable import rate.
--max-wait-timeout int_num
Specifies the maximum time to wait for OceanBase Database to complete major compactions. You do not need to specify a unit; the default unit is hours. When OceanBase Database is performing a major compaction, OBLOADER enters the pause load mode and waits at most the time specified by this option. The default value is 3.
--mem
Supported in OBLOADER V4.3.2 and later versions, this option specifies the memory size for the JVM. The supported units are K, M, G, and T. The default value is 4G.
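For example, a hypothetical import tuned for a wide table on a 16-core host (connection values and paths are placeholders; the tuning values are illustrative, not recommendations):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --thread 16 --rw 0.5 --batch 100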
Other options
--truncate-table
Specifies to truncate the tables in the target database that have data files before importing data. This option can be used only with one of the data format options. If it is used with --all or --table '*', all tables in the target database that have data files are truncated. You can explicitly specify --table 'test1,test2,...' to truncate only specific tables.

If you use this option, the system asks you to confirm the operation. You must enter Y to confirm.
Notice
Do not use this option to truncate the target database or tables in it. We recommend that you manually truncate tables based on your business needs to avoid impacting your business.
--delete-from-table
Specifies to delete the data in tables in the target database that have data files before importing data. This option can be used only with one of the data format options. If it is used with --all or --table '*', the data of all tables in the target database that have data files is deleted. You can explicitly specify --table 'test1,test2,...' to delete data from only specific tables.

If you use this option, the system asks you to confirm the operation. You must enter Y to confirm.
Notice
- If you use the --all or --table '*' option together with the --delete-from-table option, the program deletes the data of all tables that contain data in the target database, regardless of whether the directory specified by the -f option contains data files for those tables.
- Do not use this option to delete data in the target database or its tables, especially large tables. We recommend that you manually delete data based on your business needs to avoid impacting your business.
--yes
When you use the --truncate-table or --delete-from-table option, the system asks you to confirm the operation to remind you of the risk. If you do not want to confirm interactively, specify the --yes option to skip the confirmation prompt.

--replace-object

Specifies to replace existing database object definitions with those defined in the files when you import database objects. Objects such as tables and synonyms are dropped and then re-created, and objects such as functions and stored procedures are replaced by using the CREATE OR REPLACE statement. This option can be used only with the --ddl or --mix option; it does not take effect with other data format options such as --csv or --sql.

Notice
- Objects in the target database will be forcibly replaced with those defined in the files.
- Do not use this option if you do not want to replace database objects, to avoid impacting your business.
--session-config
Specifies the connection configuration file. The configuration file directory provides a default configuration file: <root directory of the tool>/conf/session.config.json. The file takes effect without additional configuration. We recommend that you specify this option only when you need to load multiple connection configurations from the same configuration file directory.

--default-date

The alternative value for invalid dates. No default value is provided. The date value is parsed based on the --date-format option. If parsing fails, this alternative value is used.

Notice

This parameter takes effect only on the DATE and DATETIME data types in the MySQL compatible mode of OceanBase Database.

--date-format

The datetime format of the original data. The default value is yyyy-MM-dd HH:mm:ss. It is used to verify whether the data in the file is valid datetime data. If the datetime data is invalid, the value specified by --default-date is used as the default.

Notice

This parameter takes effect only on the DATE and DATETIME data types in the MySQL compatible mode of OceanBase Database.
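For example, a hypothetical import into a MySQL compatible tenant that replaces invalid dates with a fixed fallback value (connection values, paths, and the fallback date are placeholders):

$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table 't1' -f /output --date-format 'yyyy-MM-dd HH:mm:ss' --default-date '1970-01-01 00:00:00'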
Options
| Option | Required | Description | Introduced in | Deprecated in |
|---|---|---|---|---|
| -h (--host) | Yes | The IP address of the host, for connecting to ODP or an OceanBase physical node.<br>If the IP address is an IPv6 address, enclose it in square brackets ([ ]). Example: [2001:0db8:85a3:xxxx:xxxx:8a2e:0370:7334]. | | |
| -P (--port) | Yes | The port for connecting to ODP or an OceanBase physical node. | | |
| --rpc-port | No | The port for connecting to the RPC service of the OBServer node. | V4.2.5 | |
| --compat-mode | No | Imports a table schema defined in MySQL into the MySQL compatible mode of OceanBase Database. When importing, OBLOADER converts the native MySQL table creation statement into an equivalent statement supported by the MySQL compatible mode of OceanBase Database.<br>Notice<br>- Only MySQL table schemas can be imported.<br>- Only MySQL Database V5.6, V5.7, and V8.0 are supported. | V4.2.7 | |
| --direct | No | Specifies the direct load mode. | V4.2.5 | |
| --parallel | No | The degree of parallelism for loading data in direct load mode. | V4.2.6 | |
| -u (--user) | Yes | The username, tenant name, and cluster name. Format: `<user>@<tenant>#<cluster>`. | | |
| -p (--password) | No | The database password. | | |
| -D (--database) | No | The database name. | | |
| -f (--file-path) | Yes | The directory where the data files are stored, or the absolute path of a data file. | | |
| --sys-user | No | The username under the sys tenant. | | |
| --sys-password | No | The password of the specified user in the sys tenant. | | |
| -t | No | The tenant name for connecting to OceanBase Database.<br>Notice<br>When performing direct load in the cloud, you must use the --public-cloud -t option. | | |
| --public-cloud | No | The option for importing database objects or table data into an OceanBase cluster deployed in the cloud.<br>Notice<br>When performing direct load in the cloud, you must use the --public-cloud -t option. | | |
| --file-suffix | No | The file extension.<br>Generally, the file extension is associated with the file format; for example, the extension of a CSV file is usually .csv. If you do not follow this naming convention, you can name a CSV file with any extension, such as .txt, in which case OBLOADER cannot identify the target file based on the format. This option is optional, and each data format has a default value: .csv for CSV, .sql for SQL, .dat for CUT, and .dat for POS. When specified on the command line, enclose the value in single quotation marks. Example: --file-suffix '.txt'. | | |
| --file-encoding | No | The file encoding (different from the database encoding). When specified on the command line, enclose the value in single quotation marks. Example: --file-encoding 'GBK'. Default value: UTF-8. | | |
| --ctl-path | No | The directory where the control files are stored. | | |
| --log-path | No | The output directory for log files. | | |
| --ddl | No | The option for importing DDL files. Starting from OceanBase Database V4.2.1, OBLOADER supports parallel import of DDL files.<br>A DDL file is a database object definition file, and its naming convention is object name-schema.sql. When this option is specified, OBLOADER imports only the database object definitions and does not import any table data.<br>Notice<br>Do not include comments or switches in the file. If the database objects are dependent on each other, the import may fail and manual intervention will be required. | | |
| --csv | No | The option for importing CSV files. (Recommended)<br>A CSV file is a data file stored in the standard CSV format, and its naming convention is table name.csv. For more information about the CSV format, see the definition in RFC 4180. The most common error in the CSV format is incorrect delimiters: single or double quotation marks are commonly used as delimiters, and if the data contains delimiters, escape characters must be specified; otherwise, OBLOADER cannot parse the data. We recommend that you use the CSV format and the --csv option in combination with the --table option. If you use it with the --all option, OBLOADER imports only the data files corresponding to the tables and does not import any database object definitions. | | |
| --sql | No | The option for importing SQL files. (Different from DDL files)<br>An SQL file is a data file stored as INSERT statements, and its naming convention is table name.sql. The file consists of an executable INSERT statement for each row of table data. OBLOADER V4.3.1 and later support importing SQL text files in the batch insert format. The content of an SQL file is clearly different from that of a DDL file. We recommend that you use the --sql option in combination with the --table option. If you use it with the --all option, OBLOADER imports only the data files corresponding to the tables and does not import any database object definitions.<br>Notice<br>The data must not contain SQL functions, special characters, or line breaks. Otherwise, the file may not be properly parsed. | | |
| --orc | No | The option for importing ORC files.<br>An ORC file is a data file stored in columnar format, and its naming convention is table name.orc. See the Apache ORC format definition. | V4.0.0 | |
| --par | No | The option for importing Parquet files.<br>A Parquet file is a data file stored in columnar format, and its naming convention is table name.parquet. See the Apache Parquet format definition.<br>Note<br>When you use OBLOADER V4.2.5 or earlier to import Parquet files, the DECIMAL, DATE, TIME, and TIMESTAMP data types are not supported. | V4.0.0 | |
| --mix | No | The option for importing MIX files.<br>A MIX file contains both DDL and DML statements and does not follow a strict naming convention.<br>Notice<br>MIX files do not follow a strict format, their processing is complex, and the performance is poor. We recommend that you do not use them. | | |
| --pos | No | The option for importing POS files.<br>A POS file is a data file stored in fixed-length byte format, and its naming convention is table name.dat. Fixed-length bytes mean that each column is stored using a fixed number of bytes: spaces are appended if the data is shorter than the fixed length, and the excess is truncated if it is longer. Truncating data by byte length may corrupt the data. We recommend that you use the --pos option in combination with the --table option. If you use it with the --all option, OBLOADER imports only the data files corresponding to the tables and does not import any database object definitions. (This format is different from the fixed-length character format.) | | |
| --cut | No | The option for importing CUT files.<br>A CUT file is a data file whose columns are separated by single characters or strings, and its naming convention is table name.dat. How do you distinguish a CUT file from a CSV file? In a CSV file, fields are separated by single characters, commonly a comma, and single or double quotation marks are used as field delimiters. In a CUT file, fields are separated by strings, for example \|@\|, and there are no field delimiters. We recommend that you use the --cut option in combination with the --table option. If you use it with the --all option, OBLOADER imports only the data files corresponding to the tables and does not import any database object definitions.<br>Notice<br>Data in a CUT file must be stored in whole lines. If the field separator is a single character, avoid using the separator, carriage returns, or line breaks in the data. Otherwise, OBLOADER cannot correctly parse the data. | | |
| --avro | No | The option for importing Avro files.<br>An Avro file conforms to the Apache Avro standard and has the extension .avro. The supported data types and limitations are as follows:<br>- All primitive types are supported.<br>- All logical types are supported.<br>- Complex types are not supported. | V4.3.0 | |
| --all | No | The option for importing all supported database objects and table data. | | |
| --table-group | No | The option for importing table group definitions. | V3.1.0 | |
| --table | No | The option for importing table definitions or table data. | | |
| --view | No | The option for importing view definitions. | | |
| --trigger | No | The option for importing trigger definitions. | | |
| --obj-user | No | The option for importing user definitions. | | |
| --role | No | The option for importing role definitions. | | |
| --sequence | No | The option for importing sequence definitions. | | |
| --synonym | No | The option for importing synonym definitions. | | |
| --type | No | The option for importing type definitions. | V4.0.0 | |
| --type-body | No | The option for importing type body definitions. | | |
| --package | No | The option for importing package definitions. | | |
| --package-body | No | The option for importing package body definitions. | | |
| --function | No | The option for importing function definitions. | | |
| --procedure | No | The option for importing stored procedure definitions. | | |
| --replace-object | No | The option for replacing existing object definitions. (Not recommended. We recommend that you replace them manually.) | | |
| --rw | No | The ratio of parsing threads for data files. | | |
| --slow | No | The threshold for triggering a slow import. | | |
| --pause | No | The threshold for triggering a pause of the import. | | |
| --batch | No | The number of records in each batch transaction. | | |
| --thread | No | The number of concurrent threads in the import task. | | |
| --block-size | No | The size of each file split. | | |
| --retry | No | The option for resuming the import from the last savepoint. | | |
| --max-tps | No | The import rate limit. Default unit: rows per second. | | |
| --max-wait-timeout | No | The maximum timeout for waiting for database compactions. | | |
| --nls-date-format | No | The session-level date format. (Supported only in the Oracle compatible mode of OceanBase Database) | | |
| --nls-timestamp-format | No | The session-level timestamp format. (Supported only in the Oracle compatible mode of OceanBase Database) | | |
| --nls-timestamp-tz-format | No | The session-level timestamp with time zone format. (Supported only in the Oracle compatible mode of OceanBase Database) | | |
| --trail-delimiter | No | The option for deleting the trailing column separator at the end of a line. | | |
| --with-trim | No | The option for deleting leading and trailing spaces. | | |
| --skip-header | No | A boolean value indicating whether to skip the first line of CSV/CUT files. The first line of CUT files can be skipped only in V3.3.0 and later. | | |
| --skip-footer | No | The option for skipping the last line of CUT files. | V3.3.0 | |
| --null-string | No | The string to replace NULL. Default value: \N. | | |
| --empty-string | No | The string to replace empty strings (' '). Default value: \E. | | |
| --line-separator | No | The line separator. When you import CSV, CUT, POS, or SQL files, you can specify the line separator used in the data file. The default value depends on the system platform and can only be \r, \n, or \r\n. | | |
| --column-separator | No | The column separator in CSV files, which is different from the column separator in CUT files. | | |
| --escape-character | No | The escape character. This option supports only single characters and can be used only with the --csv or --cut option.<br>- When used with --csv, the default value is null.<br>- When used with --cut, the default value is \.<br>Note<br>To specify a special (invisible) character, use its hexadecimal representation. For example, use '\x09' to represent the invisible tab character. | | |
| --column-delimiter | No | The string delimiter. This option supports only single characters and can be used only with the --csv option. Default value: single quotation mark ('). | | |
| --ignore-unhex | No | The option for ignoring the decoding of hexadecimal strings. | | |
| --exclude-table | No | The option for excluding table definitions and data. | | |
| --exclude-data-types | No | The option for skipping the import of data of specified data types. | | |
| --column-splitter | No | The column separator string in CUT files, which is different from the column separator in CSV files. | | |
| --max-discards | No | The maximum number of duplicate records allowed for a single table during import. Default value: -1. | | |
| --max-errors | No | The maximum number of errors allowed for a single table during import. Default value: 1000. | | |
| --include-column-names | No | The option for importing data based on specified column names. | | |
| --exclude-column-names | No | The option for excluding the data of specified columns during import. | | |
| --replace-data | No | The option for replacing duplicate data. (Applicable only to tables with a primary key or a unique key on non-null columns) | | |
| --truncate-table | No | The option for truncating the target tables before the import. (Not recommended. We recommend that you truncate tables manually.) | | |
| --delete-from-table | No | The option for deleting all data in the target tables before the import. (Not recommended. We recommend that you delete data manually.) | | |
| --yes | No | The option for skipping the confirmation prompt when you use the --truncate-table or --delete-from-table option. | | |
| -V (--version) | No | The option for viewing the OBLOADER version number. | | |
| --no-sys | No | The option indicating that the sys tenant password cannot be provided in an OceanBase Database environment. | V3.3.0 | |
| --logical-database | No | The option for importing into an ODP (Sharding) logical database. | V3.3.0 | |
| --file-regular-expression | No | The option for specifying a regular expression for file names in single-database single-table or single-database multi-table imports.<br>Note<br>V4.3.2 and later support multi-table imports in a single database by using regular expressions. | V3.3.0 | |
| --ignore-escape | No | The option for ignoring character escapes when importing CUT files. | V3.3.0 | |
| --character-set | No | The character set for creating a database connection. Default value: the value of jdbc.url.character.encoding in the session.properties file. The value specified by --character-set overrides jdbc.url.character.encoding. Supported character sets: binary, gbk, gb18030, utf16, and utf8mb4. | V4.2.4 | |
| --strict | No | The option for controlling the impact of dirty data on the exit status of the process during import. | V4.2.4 | |
| --session-config | No | The option for specifying the connection configuration file. | V4.2.6 | |
| --default-date | No | The alternative value for invalid dates. The date value is parsed based on the --date-format option. If parsing fails, the value specified by --default-date is returned. | V4.3.4.1 | |
| --date-format | No | The datetime format of the original data. The default value is yyyy-MM-dd HH:mm:ss. This option is used to verify whether the data in the file is valid datetime data; if not, the value specified by --default-date is used as the default. | V4.3.4.1 | |
| -H (--help) | No | The option for viewing the help of the OBLOADER command-line tool. | | |