Overview|V4.3.1| docs|Distributed Database

Overview

Last Updated：2024-07-11 11:22:29 Updated

Data circulation is one of the most critical parts of an analytics-oriented business scenario. This topic describes how to migrate external data to OceanBase Database. OceanBase Database supports data import from the following data sources:

Databases

MySQL, Oracle, SQL Server, PostgreSQL, and MongoDB databases

Scenarios: seamless integration to retain the structure and integrity of data
Offline data files

CSV, JSON, and XML files

Scenarios: flexible batch data processing and non-real-time analytics
Object storage services

Amazon Simple Storage Service (S3), Google Cloud Storage (GCS), Azure storage service, Alibaba Cloud Object Storage Service (OSS), Tencent Cloud Object Storage (COS), and Huawei Cloud Object Storage Service (OBS).

Scenarios: real-time or near-real-time data access to support dynamic data analytics
Real-time data streams

Flink and Canal

Scenarios: real-time processing and analytics of continuously generated data streams in business scenarios that require quick response

You can choose one of the following data migration tools based on your data source and scenario:

OceanBase solutions:
- OceanBase Migration Service (OMS) and OBLOADER, which are designed for large-scale data migration.
- Standard SQL statements LOAD DATA and INSERT, which are easy to use and suitable for small-scale data import.
- SQL statement CREATE EXTERNAL TABLE, which allows you to directly query external data files to improve the flexibility of analytics.
Third-party integrated tools: Flink OceanBase Connector, DataX OceanBase Writer Plugin, Chunjun, and Canal
Standard SDK APIs: Java Database Connectivity (JDBC) and OBKV, which allow you to flexibly access relational data sources and store key-value pairs.

This topic describes the most suitable scenarios for each solution and how to choose the most appropriate migration method based on the data file format and the characteristics of the target storage system. This topic aims to familiarize you with a comprehensive framework of migration tools and methods to help you migrate data to OceanBase Database for subsequent business analytics and application development.

Import solutions

This section describes the characteristics and scenarios of import solutions.

OMS

Characteristics:

Supports multiple types of data sources.
Supports online migration of existing data.
Supports real-time synchronization of incremental data.
Supports data verification from multiple dimensions.
Provides high-performance, secure, and reliable data migration.
Provides real-time data migration without interrupting or affecting business applications.

Scenarios:

OMS is suitable for large-scale data migration between databases.

References:

For more information about OMS, see OMS documentation.

LOAD DATA

LOAD DATA LOCAL

Characteristics:

Allows you to import local files to OceanBase Database. Allows you to import and insert local files to OceanBase clusters as network streams.
Supports importing only a small amount of data at a time.
Supports importing only one file at a time.
Allows you to import CSV, SQL, and Parquet files.

Scenarios:

The LOAD DATA LOCAL statement is suitable for small-scale data import.

References:

For more information about the LOAD DATA LOCAL statement, see Import data by using the LOAD DATA statement.

LOAD DATA FROM OSS

Characteristics:

Allows you to import files from Alibaba Cloud OSS to OceanBase Database without using OceanBase Database Proxy (ODP).
Allows you to import multiple files from Alibaba Cloud OSS at a time.
Supports importing only CSV files.

Scenarios:

The LOAD DATA FROM OSS statement is suitable for large-scale data import.

References:

For more information about the LOAD DATA FROM OSS statement, see LOAD DATA (Oracle mode) and LOAD DATA (MySQL mode).

OBLOADER

Characteristics:

Allows you to import database object definitions and table data from local disks, Network Attached Storage (NAS), Hadoop Distributed File System (HDFS), Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS, Apache Hadoop, and Amazon S3.
Allows you to import files in the INSERT SQL format that are exported by mysqldump.
Allows you to import data files in the standard CSV, INSERT SQL, Apache ORC, and Apache Parquet formats.
Allows you to set data preprocessing rules and configure field mappings between files and tables.
Supports features such as import throttling, memory exhaustion prevention, resumption after an interruption, and automatic retries.
Allows you to specify a custom log directory to store bad data and conflicting data during import.
Automatically splits large files without consuming additional storage space.
Supports encryption of sensitive parameters specified in commands, such as the database account and password and the cloud storage account and password.

OBLOADER supports the following write modes:

Client mode: allows you to insert data into OceanBase Database through JDBC or direct load.
Server mode: allows you to use the LOAD DATA statement to import data to OceanBase Database.

Scenarios:

OBLOADER is suitable for large-scale data import.

References:

For more information about OBLOADER, see OBLOADER documentation.

INSERT statements

Scenarios:

The INSERT INTO VALUES statement is suitable for writing a small amount of data to an internal table.
The INSERT INTO SELECT FROM <table_name> statement is suitable for writing the query result of another internal or external table to the destination table. In other words, it is suitable for data migration between tables.
The INSERT /*+ [APPEND |direct(need_sort,max_error,'full')] enable_parallel_dml parallel(N) */ INTO table_name select_sentence statement is suitable for full and incremental direct load.

References:

For more information about INSERT statements, see INSERT (Oracle mode) and INSERT (MySQL mode).

CREATE EXTERNAL TABLE

Scenarios:

External tables are a key feature in a database management system. Generally, the data of a table in a database is stored in the storage space of the database, while an external table has data stored in an external storage service.

References:

For more information about external tables, see Overview (Oracle mode) and Overview (MySQL mode).

Flink

Scenarios:

Flink OceanBase Connector is suitable for importing data from Flink in real time.

References:

Use Flink CDC to synchronize data from a MySQL database to OceanBase Database

Canal

Scenarios:

Canal is suitable for importing data to OceanBase Database in real time.

References:

Use Canal to synchronize data from a MySQL database to OceanBase Database

DataX

Scenarios:

DataX is suitable for synchronizing data between various data sources, including relational databases such as MySQL and Oracle, HDFS, and Hive. DataX allows you to migrate data from one table at a time, without limiting the amount of data. It can migrate up to 60 MB of data per second.

References:

Use DataX to migrate table data from a MySQL database to OceanBase Database

CloudCanal

Scenarios:

CloudCanal is suitable for migrating or synchronizing data from MySQL, Oracle, and PostgreSQL databases to OceanBase Database.

References:

Use CloudCanal to migrate data from a MySQL database to OceanBase Database

Choose an import solution

This section describes the import solutions supported by common data sources to help you quickly choose an appropriate import solution based on your business scenario.

Object storage services

The following table describes the solutions for importing data from object storage services of cloud service providers to OceanBase Database.

Data source	Supported import solution
Alibaba Cloud OSS	OBLOADER (Click here for more information.) DataX (Click here for more information.) LOAD DATA INFILE(Click here for more information.) Download the data to a local or accessible server. Then use a MySQL command-line interface (CLI) tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.
Tencent Cloud COS	OBLOADER (Click here for more information.) LOAD DATA INFILE (Click here for more information.) Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.
Huawei Cloud OBS	LOAD DATA INFILE (Click here for more information.) Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data. Flink CDC (Click here for more information.)
Amazon S3	OBLOADER (Click here for more information.) Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.
Azure Blob Storage	Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.
Google Cloud GCS	Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.

File systems

The following table describes the solutions for importing data from local or distributed file systems to OceanBase Database.

Data source	Supported import solution
Local file system (NFS and NAS)	If the data file is in the CSV format, use OBLOADER. (Click here for more information.) If the data file is in the SQL format, use the `LOAD DATA INFILE` statement. (Click here for more information.) Write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data.
Hadoop Distributed File System (HDFS)	Import the data to a TXT file, and then use OBLOADER to import the file. (Click here for more information.) Write custom extract-transform-load (ETL) scripts, and read data from HDFS through an HDFS API. Then use a MySQL client such as JDBC or mysql-connector-python to convert the data and import it to OceanBase Database.

Procedure:

Perform data assessment and make preparations.
- Understand the data formats: Data can be stored in the CSV, JSON, XML, binary, or another format in the file system.
- Define schemas: Specify the schemas of destination tables in OceanBase Database, and determine the data to be migrated and whether the data needs to be converted.
Extract data.
- Direct access: If your OBServer node can directly access the mount point of the file system, you can directly specify the file path in the SQL statement, for example, LOAD DATA INFILE, to import a CSV file.
- Data replication: If your OBServer node cannot directly access the mount point of the file system, you can replicate data from the file to a local disk or temporary storage accessible to your OBServer node.
(Optional) Convert data.
- Preprocessing: Preprocess data based on the schemas of destination tables in OceanBase Database. For example, you can write scripts or use ETL tools to convert data formats, cleanse data (remove invalid or inconsistent records), or change date and time formats.
- Conversion tools: You can use scripting languages such as Python and Pandas or ETL tools to convert data.
Load data.
- LOAD DATA INFILE: You can use the LOAD DATA INFILE statement of OceanBase Database to import a large amount of data at a time. Make sure that you have the privilege to read data from the specified file path.
- MySQL client: You can the mysql-connector-python client to batch insert data through programming.

Considerations:

Access privileges: Make sure that you have the privileges to read data from NFS or NAS and write data to MySQL databases.
Data security: Ensure data security and privacy and observe applicable data protection regulations and laws during the migration.
Backup: Back up the original data and destination database before you start any migration task.

Streaming systems

The following table describes the solutions for importing data from streaming systems to OceanBase Database.

Data source	Supported import solution
Flink	MySQL CDC Connector (Click here for more information.) Flink JDBC SQL Connector (Click here for more information.) Flink OceanBase Connector (Click here for more information.)
Canal	Canal Server (Click here for more information.) Canal Adapter (Click here for more information.)
Spark	JDBC (Click here for more information.)

Databases

The following table describes the solutions for importing data from other databases to OceanBase Database.

Data source	Supported import solution
MySQL database	Online data migration: Use OMS to migrate data to OceanBase Database. DataX. Offline data migration: Dump data from the MySQL database, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database. The solutions are similar to those for migrating data from file systems.
Oracle database	Online data migration: Use OMS to migrate data to OceanBase Database. DataX. Offline data migration: Dump data from the Oracle database, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
PostgreSQL database	Online data migration: Use OMS to migrate data to OceanBase Database. Offline data migration: Dump data from the PostgreSQL database, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
TiDB	Online data migration: Use OMS to migrate data to OceanBase Database. Offline data migration: Dump data from TiDB, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
SQL Server	DataX. Offline data migration: Dump data from SQL Server, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
StarRocks	Offline data migration: Dump data from StarRocks, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
Doris	Offline data migration: Dump data from Doris, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
HBase	DataX. Offline data migration: Dump data from HBase, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
MaxCompute	Offline data migration: Dump data from MaxCompute, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.
Hologres	Offline data migration: Dump data from Hologres, and then use OBLOADER or the `LOAD DATA` statement to import data to OceanBase Database.