OBLOADER allows you to specify the information required for import by using command-line options. For more information about the options, including their scenarios and examples, see the Quick reference table of options section in this topic and Quick start.
File matching rules
During data import, OBLOADER searches for files that match the specified command-line options. By default, OBLOADER searches for files to import by ${table_name}.${seq}.${file_type}, where ${seq} is optional. For example, if you specify --csv --table 'table1', OBLOADER recursively searches for files that match the table1.${seq}.csv rule.
You can also use wildcard characters to specify the files to import. For example, if you specify --csv --table '*', OBLOADER searches for all CSV files and splits the file names by ${table_name}.${seq}.${file_type} to extract table names, thus implementing multi-table import.
If the name of a file to import does not contain a table name, you can use the --file-regular-expression option to specify the matching rule. For example, if the file name is part-00123-bd461e82-c911-48ce-bfa0-ec1****.c000.snappy.orc, you can use the --orc --table 'table2' --file-regular-expression '.*\.orc' command to match all files with the file name extension .orc and import them into the specified table table2.
OBLOADER V4.3.2 and later allow you to specify a matching rule to import data from multiple tables in a database. You can specify the database name and table name by using the named capture groups schema and table. If only the schema group is specified, the --table option must specify a single table, or the database must contain only one table. Otherwise, OBLOADER returns an error and exits. The following matching rules are supported:
- `Database name.table name.csv`: `^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)csv$`
- `Table name.csv`: `^(?<table>[^.]+)(\.)csv$`
- `Database name.table name.seq.csv`: `^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)([^.]+\.)csv$`
OBLOADER locates the files to be imported by matching their file names against the specified table name. Assume that the table name is t1 and the file format is --csv. In this case, the default file name extension for the matched files is .csv. You can also use the --file-suffix option to specify the file name extension. OBLOADER traverses the path specified by -f to locate files associated with the t1 table, such as t1.csv, t1.0.csv, and t1.1.2.csv.
If the files to be imported are named based on other rules or no rules, you can use the --file-regular-expression option to define a regular expression for file matching. For example, --file-regular-expression=".*\.csv" will match any files whose names are suffixed with .csv, and --file-regular-expression=".*" will match all files in the specified path.
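Putting these rules together, here is a minimal sketch of a multi-table import that relies on the database name.table name.csv rule above. It assumes OBLOADER V4.3.2 or later; the host, credentials, database name, and path are placeholders.

```bash
# Placeholders: host, port, user, password, database, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table '*' -f /output \
  --file-regular-expression '^(?<schema>[^.]+)(\.)(?<table>[^.]+)(\.)csv$'
```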
Overview
Option styles
OBLOADER supports command-line options in Unix and GNU styles.
- Unix style: An option is prefixed with a single hyphen and consists of a single character, such as `ps -e`. In this style, you can omit the space between an option and its value, such as `-p******`.
- GNU style: An option is prefixed with double hyphens and consists of a single character or a string, such as `ps --version`. An option and its value must be separated with a space, such as `--table 'test'`.
Option categories
Command-line options in OBLOADER are classified into basic options and advanced options.
Basic options are general options of OBLOADER, including connection options (such as the database connection method), feature options (such as the file format, database object type, and storage path), and other options.
Advanced options are feature options (such as the timestamp format, allowlist- and blocklist-based table/column filtering settings, and error handling method), performance options, and other options.
Required options
When you use OBLOADER to import data, you must specify at least the connection options, format option, database object type option, and storage path option.
Here is a sample command:
$ ./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** -D USERA --csv --table '*' -f /output
In this example, -h, -P, -u, -p, and -D are connection options, the --csv option specifies the file format, the --table option specifies the database object type, and the -f option specifies the storage path.
Basic options
Connection options
OBLOADER can read data from and write data to an OceanBase database only after it is connected to the database. You can connect to an OceanBase database by specifying the following options.
Import types
| Import type | Relevant command-line option |
|---|---|
| OceanBase Database | The standard connection options: `-h`, `-P`, `-u`, `-p`, and `-D`. |
| ApsaraDB for OceanBase | `--public-cloud`: imports database object definitions or table data from an ApsaraDB for OceanBase cluster. If you specify this option on the CLI, you do not need to specify the tenant name and cluster name options (`-u <user>@<tenant>#<cluster>`). OBLOADER enables the `--no-sys` option by default. For more information about the `--no-sys` option, see the corresponding option description. The use of the `--public-cloud` or `--no-sys` option affects the import features, performance, and stability. OceanBase Database V2.2.30 and later support throttling on the server. Therefore, to ensure the stability of data import, before you use the `--public-cloud` or `--no-sys` option, you can run the following commands to set throttling thresholds as required on the server: `alter system set freeze_trigger_percentage=50;` `alter system set minor_merge_concurrence=64;` `alter system set writing_throttling_trigger_percentage=80 tenant='xxx';` |
| ODP (sharding) | `--logical-database`: imports data by using ODP (sharding). |
Direct load
OBLOADER V4.2.6 and later support direct load. This feature is supported in the following OceanBase Database versions: V4.2.1.7 and later of the V4.2.1.x series, the V4.2.4.x series, and V4.3.0.1 and later of the V4.3.0.x series.
`--rpc-port=rpc_port_num`

The remote procedure call (RPC) port for connecting to an OBServer node. This option is used with the `--direct` and `--parallel` options to connect to the RPC port of an OBServer node and import data in direct load mode.

Note

- This option applies only to OceanBase Database V4.2.0 RC2 and later.
- You can query the RPC port of an OBServer node in the following ways:
  - In the case of connection through ODP:
    - In ApsaraDB for OceanBase, the RPC port of ODP is 3307 by default.
    - In OceanBase Database, the RPC port of ODP is 2885 by default. You can change the port by using the `-s` option when you start ODP.
  - In the case of direct connection to an OBServer node, you can query the `DBA_OB_SERVERS` system view in the sys tenant for the RPC port of the OBServer node, which is 2882 by default.
`--direct`

Specifies to use the direct load mode. This option is used with the `--rpc-port` and `--parallel` options. The `--direct` option selects a specific direct load mode depending on whether the target table contains data:

- When the target table is empty, this option specifies to use the full direct load mode and can be used with the `--replace-data` option. The `--replace-data` option specifies to replace duplicate data when data conflicts occur.
- When the target table is not empty, this option specifies to use the incremental direct load mode. If the target table contains no index or LOB column, this option can be used with the `--replace-data` option. Otherwise, it cannot be used with the `--replace-data` option.
Note
- The direct load mode of OBLOADER does not support binary data types.
- When you use OBLOADER for direct load, you can connect to an OBServer node directly or through ODP. The version requirements are as follows:
- In the case of direct connection to an OBServer node, the version of the OBServer node must be V4.2.0 or later.
- In the case of connection through ODP, the version of ODP must be V4.3.0 or later, and that of the OBServer node must be V4.2.1 or later.
`--parallel=parallel_num`

The degree of parallelism (DOP) for loading data in direct load mode. This option is used with the `--rpc-port` and `--direct` options.
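A minimal sketch of a direct load invocation that combines the three options above. The host, ports, credentials, and path are placeholders; 2882 is the default RPC port for direct connection to an OBServer node.

```bash
# Placeholders: host, ports, user, password, database, table, and path.
./obloader -h 10.0.0.1 -P 2881 -u test@mysql -p '******' -D USERA \
  --csv --table 't1' -f /output \
  --direct --rpc-port 2882 --parallel 16
```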
Feature options
File formats
| Format (option) | Relevant option | Scenario |
|---|---|---|
| `--csv`: imports data files in the CSV format. | | This option applies to almost all business scenarios. |
| `--cut`: imports data files in the CUT format. | | This option applies to almost all business scenarios. You can use it to import data in flexible formats exported from outdated platforms. |
| `--pos`: imports data files in the POS format. | This option must be used in combination with control files. | This option is used for interaction with extract, transform, and load (ETL) platforms that support only a fixed-length format. |
| `--sql`: imports data files in the SQL format. | | This option applies to scenarios where only data but no schema is imported. Only one SQL syntax with high performance is supported. |
| `--mix`: imports MIX files. | | This option can be used to import SQL files in any format. It has higher compatibility but poorer performance than the `--sql` option. At present, the whole text is read into memory for parsing, so this option is inapplicable to files with a large data volume. |
| `--par`: imports data files in the Parquet format. | | This option can be used to migrate data from Hive, Spark, or another data source that supports exporting data in the standard Parquet format. |
| `--orc`: imports data files in the ORC format. | | This option can be used to migrate data from Hive, Spark, or another data source that supports exporting data in the standard ORC format. |
| `--avro`: imports data files in the Avro format. | `--file-suffix 'suffix_name'`: the file name extension of data files to be imported. | Generally, this option applies to the big data field, for scenarios such as offline migration of analytical processing (AP) business data. |
Note
For more information about the command-line options, see the Quick reference table of options section in this topic.
Import files compressed and exported by OBDUMPER
OBLOADER V4.3.0 and later allow you to import CSV, CUT, POS, and SQL files that are compressed and exported by OBDUMPER. You can specify related configurations by using the following command-line options:
`--compress`

Specifies whether the file is compressed and exported by OBDUMPER. The value is a Boolean value.

`--compression-algo`

The compression algorithm used by OBDUMPER. The value is a string. Valid values are `zstd`, `zlib`, `gzip`, and `snappy`. The default value is `zstd`.

`--compression-level`

The compression level of the compression algorithm used by OBDUMPER. The value is an integer. The compression levels supported by the different algorithms are as follows:

- `zstd`: 1 to 22. The default value is `3`.
- `zlib`: -1 to 9. The default value is `-1`.
- `gzip` and `snappy`: you cannot specify a compression level for these algorithms.
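For instance, a hedged example of importing CSV files that OBDUMPER compressed with the default zstd algorithm; the connection values, table name, and path are placeholders.

```bash
# Placeholders: host, port, user, password, database, table, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table 't1' -f /output \
  --compress --compression-algo 'zstd' --compression-level 3
```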
Database object types
`--all`

Imports all supported database object definitions and table data. When this option is used in combination with `--ddl`, all database object definitions are imported. When this option is used in combination with `--csv`, `--sql`, `--cut`, or `--pos`, all data files in the specified format are imported. To import all database object definitions and table data, specify the `--all` and `--ddl` options with a data format option.

Note

The `--all` option is mutually exclusive with database object options and cannot be specified together with them. If both the `--all` option and a database object option are specified, the `--all` option takes precedence.

`--table-group 'table_group_name [,table_group_name...]' | --table-group '*'`

Imports table group definitions. This option is similar to the `--table` option, except that this option does not support data import.

`--table 'table_name [,table_name...]' | --table '*'`

Imports table definitions or table data. When this option is used with the `--ddl` option, only table definitions are imported. When this option is used with any data format option, only table data is imported. To specify multiple tables, separate the table names with commas (,). By default, table names are in uppercase in an Oracle tenant of OceanBase Database and in lowercase in a MySQL tenant. For example, in an Oracle tenant, both `--table 'test'` and `--table 'TEST'` indicate the table named `TEST`; in a MySQL tenant, both indicate the table named `test`. To enable case sensitivity, enclose a table name in brackets ([]). For example, `--table '[test]'` indicates the table named `test`, while `--table '[TEST]'` indicates the table named `TEST`. If the table name is specified as an asterisk (*), all table definitions or table data are imported.

Notice

When you use a control file to import data, the table name specified in the `--table` option must be in the same letter case as that in the database. Otherwise, the control file fails to take effect.

`--view 'view_name [, view_name...]' | --view '*'`

Imports view definitions. This option is similar to the `--table` option, except that this option does not support data import.

`--trigger 'trigger_name [, trigger_name...]' | --trigger '*'`

Imports trigger definitions. This option is similar to the `--table` option, except that this option does not support data import.

`--sequence 'sequence_name [, sequence_name...]' | --sequence '*'`

Imports sequence definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--synonym 'synonym_name [, synonym_name...]' | --synonym '*'`

Imports synonym definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--type 'type_name [, type_name...]' | --type '*'`

Imports type definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--type-body 'typebody_name [, typebody_name...]' | --type-body '*'`

Imports type body definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--package 'package_name [, package_name...]' | --package '*'`

Imports package definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--package-body 'packagebody_name [, packagebody_name...]' | --package-body '*'`

Imports package body definitions. This option is similar to the `--table` option, except that this option does not support data import. This option is supported only in the Oracle compatible mode of OceanBase Database.

`--function 'function_name [, function_name...]' | --function '*'`

Imports function definitions. This option is similar to the `--table` option, except that this option does not support data import.

`--procedure 'procedure_name [, procedure_name...]' | --procedure '*'`

Imports stored procedure definitions. This option is similar to the `--table` option, except that this option does not support data import.

`--ddl`

Imports DDL files. OBLOADER supports parallel import of DDL files in OceanBase Database V4.2.1 and later.

`--character-set 'character_set_string'`

The character set for creating a database connection.

`--ignore-unhex`

Ignores hexadecimal strings in decoding. This option applies only to binary data types.

`--file-suffix 'suffix_name'`

The file name extension of data files to be imported.

`--file-encoding 'encode_name'`

The file character set used when OBLOADER reads data files, which is not the database character set.
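As described for the `--all` option above, a hedged example that imports all database object definitions and all CSV data in one run; the connection values and path are placeholders.

```bash
# Placeholders: host, port, user, password, database, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --ddl --csv --all -f /output
```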
--source-type hive
Specifies to import partition information from the Hive storage path. During file import, multilevel subdirectories with table names as their directory names are parsed into column values. ORC and Parquet files are supported.
--auto-column-mapping
Specifies to enable automatic mapping for column names. In this case, data is imported from the source file to the target table based on column names. The number of columns in the source file can be different from that in the target table. CSV, Parquet, and ORC files are supported.
Notice

Do not specify the `--ctl-path` and `--auto-column-mapping` options at the same time.
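A minimal sketch of importing a CSV file whose column order differs from the target table by using automatic column-name mapping; the connection values, table name, and path are placeholders.

```bash
# Placeholders: host, port, user, password, database, table, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table 't1' -f /output --auto-column-mapping
```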
Storage path
`-f 'file_path'` | `--file-path='file_path'`

The absolute path for storing data files. When data is stored in another file system instead of the local file system, you can use the `--tmp-path` option to specify a temporary file directory.

OBLOADER allows you to import database object definitions and table data from local disks, Alibaba Cloud Object Storage Service (OSS), Amazon Simple Storage Service (S3), Apache Hadoop, Tencent Cloud Object Storage (COS), Huawei Cloud Object Storage Service (OBS), Azure Blob Storage, and Google Cloud Storage (GCS). The syntax is `[scheme://host]path[?parameters]`, where `parameters` is in the format `key[=value],...`.

| Parameter | Description |
|---|---|
| scheme | The storage scheme. Supported storage schemes are OSS, S3, Apache Hadoop, COS, and OBS. If the specified scheme is not one of those listed, an error is returned. |
| host | The name of the storage space. When you import data from OSS, S3, COS, or OBS, the `host` parameter specifies the bucket. For more information, see Bucket. When you import data from Apache Hadoop, the `host` parameter specifies the Hadoop node, in the `<ip>:<port>` or `<cluster_name>` format. |
| path | The data storage path in the storage space. The path must start with a slash (`/`). |
| parameters | The parameters required for the request. The value can be a single key or a key-value pair. |

The following table describes the supported parameters.

| Parameter | Value required | Description | Supported storage scheme | Supported version |
|---|---|---|---|---|
| endpoint | Yes | The endpoint of the region where the OSS host resides, or the endpoint for accessing S3, COS, or OBS. | OSS/S3/COS/OBS | OSS: V4.2.0; S3: V4.2.5; COS/OBS: V4.3.0 |
| region | Yes | The physical location of the bucket. | S3/COS/OBS | S3: V4.2.0; COS/OBS: V4.3.0 |
| storage-class | Yes | The storage class of Amazon S3. | S3 | V4.2.0 |
| access-key | Yes | The AccessKey ID used to access the bucket. | OSS/S3/COS/OBS | OSS/S3: V4.2.0; COS/OBS: V4.3.0 |
| secret-key | Yes | The AccessKey secret used to access the bucket. | OSS/S3/COS/OBS | OSS/S3: V4.2.0; COS/OBS: V4.3.0 |
| hdfs-site-file | Yes | An hdfs-site configuration file. The file contains the configuration information of Apache Hadoop, such as the block size and number of replicas. Storage and access rules are set for Apache Hadoop based on this configuration information. | Apache Hadoop | V4.2.1 |
| core-site-file | Yes | A core-site configuration file. The file contains the core configuration information of the Hadoop cluster, such as the URI of the file system and the default file system of Apache Hadoop. | Apache Hadoop | V4.2.1 |
| principal | Yes | The identifier for identity authentication in Kerberos. | Apache Hadoop | V4.2.1 |
| keytab-file | Yes | The absolute path of the keytab file, which authorizes users or services to access system resources. | Apache Hadoop | V4.2.1 |
| krb5-conf-file | Yes | The path of the Kerberos configuration file. | Apache Hadoop | V4.2.1 |

Here are some examples:

- Import data from the local disk: `-f '/home/admin/foo/bar'`
- Import data from S3: `-f 's3://bucket/path?region={region}&access-key={accessKey}&secret-key={secretKey}'`
- Import data from OSS: `-f 'oss://mybucket/foo/bar?endpoint=myendpoint&access-key=myak&secret-key=mysk'`
- Import data from Apache Hadoop: `-f 'hdfs://***.*.*.*:9000/chang/parquet?hdfs-site-file=/data/0/zeyang/hdfs-site.xml&core-site-file=/data/0/zeyang/core-site.xml'`
- Import data from COS: `-f 'cos://mybucket/foo?region=ap-shanghai&access-key=myak&secret-key=mysk'` or `-f 'cos://mybucket/foo?endpoint=yourendpoint&access-key=myak&secret-key=mysk'`
- Import data from OBS: `-f 'obs://mybucket/foo?region=cn-north-1&access-key=myak&secret-key=mysk'` or `-f 'obs://mybucket/foo?endpoint=yourendpoint&access-key=myak&secret-key=mysk'`
- Import data from Azure Blob Storage: Azure Blob Storage provides the `wasb://` and `wasbs://` (encrypted access over SSL) protocols. You can import data from Azure Blob Storage in either of the following ways:
  - Add the SAS token to the end of the URI of the wasb protocol as the value of the `--file-path` option, in the form `wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?SAS`. For example: `wasbs://test-data@loader.blob.core.chinxxxpi.cn/linxxx.csv?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupitfx&se=2024-08-13T14:49:32Z&st=2024-08-13T06:49:32Z&spr=https&sig=m2AXglpESQPYrVvrhxxxx`
  - Add the connection string to the end of the URI of the wasb protocol as the value of the `--file-path` option; `account-key` is required, and other parameters are optional. The form is `wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?account-key=xxxx` or `wasb[s]://{container}@{accountName}.{endpointSuffix}/path/file?default-endpoints-protocol=xxx&account-key=xxxx&account-name=xxx&endpoint-suffix=xxx`. For example: `wasbs://test-data@loader.blob.core.chixxxx.cn/lineitem1.csv?account-key=xxxx`
- Import data from GCS: `-f 's3://bucket/path?region={region}&access-key={accessKey}&secret-key={secretKey}'`
`--ctl-path 'control_path'`

The absolute path on a local disk for storing control files. You can configure built-in preprocessing functions in a control file. Data will be preprocessed by these functions before being imported. For example, the functions can perform case conversion and check the data for empty values. For information about the use of control files, see the Data processing topic. When you specify this option on the CLI, enclose the value in single quotation marks (' '), for example, `--ctl-path '/home/controls/'`.

Notice

Do not specify the `--ctl-path` and `--auto-column-mapping` options at the same time.

`--log-path 'log_path'`

The output directory for the operational logs of OBLOADER. If this option is not specified, OBLOADER operational logs are stored in the directory specified by the `-f` option. Redirection is not required for log output, unless otherwise specified.
Other options
-H/--help
Shows the help information about the tool.
-V/--version
Shows the version of the tool.
Advanced options
Feature options
Timestamp formats
`--nls-date-format 'date-format-string'`

The format of dates in database connections in the Oracle compatible mode of OceanBase Database. The default value is `YYYY-MM-DD HH24:MI:SS`.

`--nls-timestamp-format 'timestamp-format-string'`

The format of timestamps in database connections in the Oracle compatible mode of OceanBase Database. The default value is `YYYY-MM-DD HH24:MI:SS:FF9`.

`--nls-timestamp-tz-format 'timestamp-tz-format-string'`

The format of timestamps that contain a time zone in database connections in the Oracle compatible mode of OceanBase Database. The default value is `YYYY-MM-DD HH24:MI:SS:FF9 TZR`.
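A hedged example that overrides the session-level date format while importing into an Oracle tenant; the connection values, table name, path, and format string are placeholders.

```bash
# Placeholders: host, port, user, password, database, table, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@oracle#cluster_a -p '******' -D USERA \
  --csv --table 'T1' -f /output \
  --nls-date-format 'YYYY-MM-DD HH24:MI:SS'
```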
Allowlist- and blocklist-based filtering
Allowlist- and blocklist-based table filtering
`--exclude-table 'table_name [, table_name...]'`

Excludes the specified tables from the import of table definitions or table data. Fuzzy match on table names is supported, for example, `--exclude-table 'test1,test*,*test,te*st'`. The preceding example specifies to exclude the following tables from the import of table definitions or table data:

- `test1`
- All tables with a table name starting with `test`
- All tables with a table name ending with `test`
- All tables with a table name starting with `te` and ending with `st`
--exclude-data-types 'datatype [, datatype...]'
Excludes the specified data types from the import of table data.
`--file-regular-expression`

The regular expression for matching file names when you import data from one or more tables in a database. This option applies only to import from a single database. For example:

- `--file-regular-expression ".*"` uses the `".*"` regular expression to match all files in the directory. In the expression, `.*` matches zero or more characters.
- `--file-regular-expression ".*\\.csv"` uses the `".*\.csv"` regular expression to match files whose names end with `.csv`. In the expression, `\.` matches the period (`.`).
Allowlist- and blocklist-based column filtering
--include-column-names 'column_name [, column_name...]'
Includes the specified columns in the import of table data.
--exclude-column-names 'column_name [, column_name...]'
Excludes the specified columns during data import.
Notice
- The letter case of the specified column name must be the same as that of the column name in the table structure.
- In the imported data file, the excluded columns must have no corresponding data, and the imported columns must be in the same order as the columns in the table.
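A minimal sketch that imports only two named columns, per the notices above; the column names id and name are illustrative and must match the letter case in the table structure.

```bash
# Placeholders: host, port, user, password, database, table, path, and column names.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table 't1' -f /output \
  --include-column-names 'id,name'
```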
Error handling
`--max-discards int_num`

The maximum amount of duplicate data allowed in a single table to import. If the duplicate data in a table exceeds the specified maximum, data import for that table stops. The import failure on the table is logged. Data import for other tables is not affected. The default value is `-1`, indicating that data import does not stop for duplicate data.

Note

This option takes effect only when the table contains a primary key or a unique key and duplicate data exists in the table.
--retry
Resumes the import task from the breakpoint. We recommend that you specify this option to resume an import task that has more than 80% of data imported, so that you do not have to start a new import. Duplicate data errors may occur during the resumed import. If only a small part of the data has been imported, you can clear tables and start a new import, which is more efficient.
Notice

- The `load.ckpt` file is a savepoint file generated when the tool runs and is located in the directory specified by the `-f` option. You cannot use this option if the `load.ckpt` file does not exist.
- This option does not apply to direct load.
`--max-errors int_num`

The maximum number of errors allowed for an import. If the number of import errors of a table exceeds the specified maximum, data import for this table stops. The import failure on the table is logged. This option can be set to `0`, `-1`, or any positive integer N. If you set this option to `-1`, errors are ignored and the import continues. The default value is `1000`.

`--strict='strict_string'`

Controls the impact of dirty data on the process exit status during import. The default value is `true`, which indicates that the program ends in the failed state (System exit 1) when the imported data contains a `Bad Record` or `Discard Record` error. The value `false` indicates that such errors do not affect the final exit state (System exit 0) of the program.

Note

This option can be used in combination with the `--max-discards` or `--max-errors` option, which specifies to skip duplicate data or errors and proceed when the amount of duplicate data or the number of errors is within the specified range. For more information, see Error handling.

`--replace-data`

Replaces duplicate data in the table. This option applies only to tables that have primary keys or unique keys with the `NOT NULL` constraint. If a file has a large amount of duplicate data, such as more than 30% of the total data volume, we recommend that you clear the table and import it again. The performance of data replacement is lower than that of import after table clearing. This option can be used only with the `--csv`, `--sql`, and `--cut` options, and does not take effect for the `--ddl` option.

Notice

- When the file and the table have duplicate data, the data in the table is replaced with the data in the file.
- For tables without primary keys or unique keys, this option appends data to the tables.
- If you do not need to replace duplicate data, do not specify this option. Otherwise, business may be affected.
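A hedged example of an import that tolerates up to 100 errors per table and does not fail the process when dirty data is skipped; the values are illustrative, not recommendations.

```bash
# Placeholders: host, port, user, password, database, table, and path.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table 't1' -f /output \
  --max-errors 100 --strict 'false'
```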
Performance options
`--rw float_num`

The proportion of data file parsing threads. The default value is `1`. You can use this option with the `--thread` option to calculate the number of file parsing threads by using the following formula: Number of file parsing threads = Value of `--thread` × Value of `--rw`.

`--slow float_num`

The threshold for triggering a slow import. When the memory usage of OceanBase Database reaches 75%, OBLOADER slows down to prevent excessively high memory usage on the database. The default value is `0.75`.

`--pause float_num`

The threshold for triggering an import pause. When the memory usage of OceanBase Database reaches 85%, OBLOADER pauses data import to prevent issues caused by excessively high memory usage on the database. The default value is `0.85`.

`--batch int_num`

The number of records in a batch of transactions. We recommend that you set this option to a value inversely proportional to the table width. Do not set this option to an excessively high value, which may cause database memory overflow. The default value is `200`.

Note

- In OBLOADER V4.2.0 and later, the default value of the `--batch` option can be adapted based on the Java virtual machine (JVM) memory.
- In OBLOADER V4.2.8.1 and V4.2.8.2, the number of parameters in an executed prepared statement may exceed the upper limit, causing a core dump in ODP. In this case, you can manually set this option to an appropriate value.
`--thread int_num`

The number of concurrent threads allowed. This option corresponds to the number of import threads. You can use it with the `--rw` option to calculate the number of file parsing threads by using the following formula: Number of file parsing threads = Value of `--thread` × Value of `--rw`. The default value is the number of CPU cores multiplied by 2. If the number of CPU cores is greater than 16, the maximum value is 32.

Note

OBLOADER supports parallel import of DDL files in OceanBase Database V4.2.1 and later. You do not need to specify this option when you import database object definitions for versions earlier than V4.2.1.
`--block-size`

The block size used to split a data file for import. This option is supported for the LONG data type. For example, `--block-size 64` indicates that the size of a block cannot exceed 64 MB. When you specify this option, you do not need to explicitly specify the unit. The default unit is MB. The default value is `64`. By default, OBLOADER automatically splits a large file into multiple logical subfiles (blocks) of 64 MB each. The logical subfiles do not occupy additional storage space.

`--max-tps int_num`

The maximum import speed, in lines per second. You can specify this option to ensure import stability.

`--max-wait-timeout int_num`

The timeout period for waiting for a database major compaction to complete. When you specify this option, you do not need to explicitly specify the unit. The default unit is hour. When the database is undergoing a major compaction, OBLOADER stops data import and waits up to the specified period. The default value is `3`.
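A hedged sketch that combines the performance options above; the numbers are illustrative starting points, not recommendations. With `--thread 32` and `--rw 0.5`, 16 threads parse files according to the formula above.

```bash
# Placeholders: host, port, user, password, database, table, and path; tuning values are illustrative.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table 't1' -f /output \
  --thread 32 --rw 0.5 --batch 200 --max-tps 10000
```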
Other options
`--truncate-table`

Truncates tables in the target database. This option can be used only with a data format option. When this option is used with the `--all` or `--table '*'` option, it truncates all tables in the target database. If you want to truncate only some of the tables, you can explicitly specify them in the format of `--table 'test1,test2,[....]'`. When this option is used with the `--with-data-files` option, it truncates only the tables that have corresponding data files.

Notice

- If the `--all` or `--table '*'` option is used with the `--truncate-table` option, the tool truncates all tables in the target database, even if no corresponding data files of the tables exist in the directory specified by the `-f` option.
- We strongly recommend that you manually truncate tables, rather than using this option, to avoid impacts on business.
`--with-data-files`

When this option is used with the `--truncate-table` or `--delete-from-table` option, it truncates or clears only the tables that have corresponding data files before data import. If this option is used alone, it does not take effect.

`--delete-from-table`

Clears tables in the target database before the import. This option can be used only with a data format option. When this option is used with the `--all` or `--table '*'` option, it clears all tables in the target database. If you want to clear only some of the tables, you can explicitly specify them in the format of `--table 'test1,test2,[....]'`. When this option is used with the `--with-data-files` option, it clears only the tables that have corresponding data files.

Notice

- If the `--all` or `--table '*'` option is used with the `--delete-from-table` option, the tool clears all tables in the target database, even if no corresponding data files of the tables exist in the directory specified by the `-f` option.
- Do not use this option to clear tables in the target database, especially tables with large data volumes. We strongly recommend that you clear tables manually based on your business requirements to avoid impacts on your business.
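A hedged example that truncates only the tables with matching data files in the input directory before importing; per the notices above, prefer truncating tables manually in production.

```bash
# Placeholders: host, port, user, password, database, and path. Use with care.
./obloader -h 127.0.0.1 -P 2883 -u test@mysql#cluster_a -p '******' -D USERA \
  --csv --table '*' -f /output \
  --truncate-table --with-data-files
```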
`--replace-object`

Replaces existing database object definitions during the import. For tables and synonyms, existing definitions are deleted and then new ones are created. For functions and stored procedures, existing definitions are replaced by using the `CREATE OR REPLACE` statement. This option can be used only with the `--ddl` or `--mix` option, and does not take effect for data format options such as `--csv` and `--sql`.

Notice

- If an object already exists in the target database, the object definition is forcibly replaced with the object definition stored in the import file.
- If you do not need to replace objects in the target database, do not use this option. Otherwise, business may be affected.
`--session-config`

Specifies the connection configuration file. The program package provides a default configuration file named `session.config.json` in the `<root directory of the tool>/conf` directory. It takes effect without modification. We recommend that you specify this option if you want to load multiple connection configurations with one program package.
Quick reference table of options
| Option | Required | Description | Introduced in | Deprecated |
|---|---|---|---|---|
| -h(--host) | Yes | The host IP address for connecting to ODP or a physical OceanBase Database node. | ||
| -P(--port) | Yes | The host port for connecting to ODP or a physical OceanBase Database node. | ||
| --rpc-port | No | The RPC port for connecting to the OBServer node. | V4.2.5 | |
| --compat-mode | No | Imports MySQL table schema definitions in a compatible mode. If you have specified this option when you import MySQL table schema definitions, OBLOADER converts the original CREATE TABLE statement to a synonymous statement that is supported in the MySQL compatible mode of OceanBase Database. | V4.2.7 | |
| --direct | No | Specifies to use the direct load mode. | V4.2.5 | |
| --parallel | No | The DOP for loading data in direct load mode. | V4.2.6 | |
| -u(--user) | Yes | The username, tenant name, and cluster name, in the format of `<user>@<tenant>#<cluster>`. | | |
| -p(--password) | No | The password used to log in to the database. | ||
| -D(--database) | No | The name of the database. | ||
| -f(--file-path) | Yes | The directory that stores the data file or the absolute path of the data file. | ||
| --sys-user | No | The name of the user in the sys tenant. | ||
| --sys-password | No | The password of the user in the sys tenant. | ||
| -t | No | The tenant name used to log in to OceanBase Database. This option is used with the --public-cloud option. **Notice**: You must use this option with the `--public-cloud` option. | | |
| --public-cloud | No | Imports database object definitions or table data from an ApsaraDB for OceanBase cluster. **Notice**: You must use this option with the `-t` option. | | |
| --file-suffix | No | The file name extension. Generally, the file name extension is correlated with the file format. For example, a CSV file is usually named as xxx.csv. If you do not strictly follow the naming conventions, you may name a CSV file with any extension, such as .txt. In this case, OBLOADER cannot identify the file as a CSV file. This option is optional. A default value is available for each data format. The default file name extension is .csv for a CSV file, .sql for an SQL file, and .dat for a CUT or POS file. When you specify this option on the CLI, enclose the value in single quotation marks (''), such as --file-suffix '.txt'. | | |
| --file-encoding | No | The file character set, which is different from the database character set. When you specify this option on the CLI, enclose the value in single quotation marks (' '), such as --file-encoding 'GBK'. Default value: UTF-8. | | |
| --ctl-path | No | The directory of the control files. | ||
| --log-path | No | The directory where log files are stored. | ||
| --ddl | No | Imports DDL files. OBLOADER supports parallel import of DDL files in OceanBase Database V4.2.1 and later. A DDL file stores the database object definitions, and is named in the format of object name-schema.sql. When this option is specified, only database object definitions are imported, and table data is not imported. **Notice**: Avoid comments or statements that enable/disable a feature in the file. If database objects depend on each other, the import may fail and manual intervention is required. | | |
| --csv | No | Imports data files in the CSV format. This option is recommended. A CSV file stores data in the standard CSV format, and is named in the format of table name.csv. For information about CSV format specifications, see the definitions in RFC 4180. Delimiter errors are the most common errors that occur in CSV files. Single or double quotation marks are usually used as the delimiter. If data in the file contains the delimiter, you must specify escape characters. Otherwise, OBLOADER fails to parse the data due to its incorrect format. We recommend that you use the CSV format and that you use this option with the --table option. When this option is used with the --all option, OBLOADER imports only table data but not database object definitions. | | |
| --sql | No | Imports data files in the SQL format, which is different from DDL files. An SQL file stores data in the format of INSERT statements, and is named in the format of table name.sql. Each line of table data corresponds to an executable INSERT statement in an SQL file. OBLOADER V4.3.1 and later allow you to import SQL text files that store data in the format of batch insert statements. An SQL file is different from a DDL file in terms of content format. We recommend that you use this option with the --table option. When this option is used with the --all option, OBLOADER imports only table data but not database object definitions. **Notice**: The data cannot contain SQL functions, special characters, or line breaks. Otherwise, the file may not be correctly parsed. | | |
| --orc | No | Imports data files in the ORC format. An ORC file stores data in the column-oriented format, and is named in the format of table name.orc. For more information about ORC format definitions, see ORC Specification. | V4.0.0 | |
| --par | No | Imports data files in the Parquet format. A Parquet file stores data in the column-oriented format, and is named in the format of table name.parquet. For more information about Parquet format definitions, see File Format on the official Apache Parquet website. **Note**: When you use OBLOADER V4.2.5 or earlier to import Parquet files, the DECIMAL, DATE, TIME, and TIMESTAMP data types are not supported. | V4.0.0 | |
| --mix | No | Imports a mixed file that contains both definitions and data. A MIX file stores a mix of DDL and DML statements, and does not have strict naming conventions. **Notice**: MIX files do not have a strict format and feature a complex processing process and poor performance. Therefore, we recommend that you do not use MIX files. | | |
| --pos | No | Imports data files in the POS format. A POS file stores data with a fixed length in bytes, and is named in the format of table name.dat. Data stored in each column of a POS file occupies a fixed number of bytes. A column value shorter than specified is padded with spaces. A column value longer than specified is truncated, in which case data may be garbled. We recommend that you use this option with the --table option. When this option is used with the --all option, OBLOADER imports only table data but not database object definitions. This format is different from the format that stores data with a fixed length in characters. | | |
| --cut | No | Imports data files in the CUT format. A CUT file uses a character or a character string as the separator string, and is named in the format of table name.dat. To distinguish the CUT format from the CSV format: a file in the CSV format uses a single character, which is usually a comma (,), to separate fields, while a file in the CUT format usually uses a string, such as \|@\|, to separate fields. A file in the CSV format uses single quotation marks or double quotation marks as delimiters between fields, while a file in the CUT format does not have delimiters. We recommend that you use this option with the --table option. When this option is used with the --all option, OBLOADER imports only table data but not database object definitions. **Notice**: In a CUT file, each data record is stored in an entire line. When a single character is used as the field separator, avoid special characters in the data, such as delimiters, carriage returns, and line breaks. Otherwise, OBLOADER cannot correctly parse the data. | | |
| --avro | No | Imports data files in the Avro format. A file in the Avro format contains content that complies with the Apache Avro specification. The file name extension is .avro. Certain data types and limitations apply. | V4.3.0 | |
| --all | No | Imports all supported database object definitions and table data. | ||
| --table-group | No | Imports table group definitions. | V3.1.0 | |
| --table | No | Imports table definitions or table data. | ||
| --view | No | Imports view definitions. | ||
| --trigger | No | Imports trigger definitions. | ||
| --sequence | No | Imports sequence definitions. | ||
| --synonym | No | Imports synonym definitions. | ||
| --type | No | Imports type definitions. | V4.0.0 | |
| --type-body | No | Imports type body definitions. | ||
| --package | No | Imports package definitions. | ||
| --package-body | No | Imports package body definitions. | ||
| --function | No | Imports function definitions. | ||
| --procedure | No | Imports stored procedure definitions. | ||
| --replace-object | No | Replaces existing object definitions. We recommend that you replace object definitions manually, rather than using this option. | ||
| --rw | No | The proportion of data file parsing threads. | ||
| --slow | No | The threshold for triggering a slow import. | ||
| --pause | No | The threshold for triggering an import pause. | ||
| --batch | No | The number of records for a batch of transactions. | ||
| --thread | No | The number of concurrent import threads allowed. | ||
| --block-size | No | The size of a file. | ||
| --retry | No | Re-imports data from the last savepoint. | ||
| --max-tps | No | The maximum import speed. Default unit: lines/second. | ||
| --max-wait-timeout | No | The timeout period of waiting for a database major compaction to complete. | ||
| --nls-date-format | No | The session-level datetime format. This option is supported only in the Oracle compatible mode of OceanBase Database. | ||
| --nls-timestamp-format | No | The session-level timestamp format. This option is supported only in the Oracle compatible mode of OceanBase Database. | ||
| --nls-timestamp-tz-format | No | The session-level timestamp format with a time zone. This option is supported only in the Oracle compatible mode of OceanBase Database. | ||
| --trail-delimiter | No | Truncates the last column separator in a line. | ||
| --with-trim | No | Deletes the space characters on the left and right sides of the data. | ||
| --skip-header | No | Skips the first line of data in CSV/CUT files when the files are imported. Only OBLOADER V3.3.0 and later support skipping the first line of data in a CUT file. | ||
| --skip-footer | No | Skips the last line of data in a CUT file during import. | V3.3.0 | |
| --null-string | No | Replaces the specified character with NULL. Default value: \N. | ||
| --empty-string | No | Replaces the specified character with an empty string (' '). The default value is \E. | ||
| --line-separator | No | The line separator. Custom line separators are supported for the import of CUT files. The default value depends on the system platform. Valid values are \r, \n, and \r\n. | | |
| --column-separator | No | The column separator, which is different from the column separator string in the CUT format. | ||
| --escape-character | No | The escape character. This option can be used only with the --csv or --cut option and supports a single character only. **Note**: To set this option to special (invisible) characters, you can represent the characters in their hexadecimal format. For example, you can use '\x09' to specify the invisible tab character. | | |
| --column-delimiter | No | The string delimiter. This option can be used only with the --csv option and supports a single character only. Default value: single quotation mark ('). | | |
| --ignore-unhex | No | Ignores hexadecimal strings in decoding. | ||
| --exclude-table | No | Excludes the specified tables from the import of table definitions and table data. | ||
| --exclude-data-types | No | Excludes the specified data types from the import of data. | ||
| --column-splitter | No | The column separator string, which is different from the column separator in the CSV format. | ||
| --max-discards | No | The maximum amount of duplicate data allowed in a single table to import. Default value: -1. | | |
| --max-errors | No | The maximum number of errors allowed for importing a table. The default value is 1000. | | |
| --include-column-names | No | Includes the specified columns in the import of table data. | ||
| --exclude-column-names | No | Excludes the specified columns from the import of data. | ||
| --replace-data | No | Replaces duplicate data in the table. This option applies only to tables that have primary keys or unique keys with the NOT NULL constraint. | | |
| --truncate-table | No | Truncates tables in the target database before the import. We recommend that you truncate tables manually, rather than using this option. | ||
| --with-data-files | No | Truncates or clears tables with specified data files. | V3.1.0 | |
| --delete-from-table | No | Clears tables in the target database before the import. We recommend that you clear tables manually, rather than using this option. | ||
| -V(--version) | No | Shows the version of OBLOADER. | ||
| --no-sys | No | Specifies that the password of the sys tenant cannot be provided in OceanBase Database. | V3.3.0 | |
| --logical-database | No | Imports data by using ODP (sharding). | V3.3.0 | |
| --file-regular-expression | No | The regular expression for matching file names when you import data from one or more tables in a database. **Note**: OBLOADER V4.3.2 and later allow you to specify a regular expression to import data from multiple tables in a database. | V3.3.0 | |
| --ignore-escape | No | Ignores escape operations on characters for the import of CUT format files. | V3.3.0 | |
| --character-set | No | The character set for creating a database connection. The default value is the value specified by the session variable jdbc.url.character.encoding in the session.properties file. The value of the --character-set option overrides that of the jdbc.url.character.encoding variable. Supported character sets include binary, gbk, gb18030, utf16, and utf8mb4. | V4.2.4 | |
| --strict | No | Specifies to control the impact of dirty data on the process end status during import. | V4.2.4 | |
| --session-config | No | Specifies the connection configuration file. | V4.2.6 | |
| -H(--help) | No | Shows the help information of the OBLOADER command-line tool. |