This topic describes how to migrate data to OceanBase Cloud. It provides specific solutions for various data sources:
- OceanBase Cloud native tools: Migrate Data and OBLOADER for large-scale data migration.
- Standard SQL statements: `LOAD DATA` and `INSERT` for easy, small-scale data import, and `CREATE EXTERNAL TABLE` for flexible queries on external data.
- Third-party integrated tools: Flink OceanBase Connector, DataX OceanBase Writer Plugin, Chunjun, and Canal.
This topic describes the most suitable scenarios for each solution and explains how to choose a migration method based on the data file format and the features of the target storage system. The goal is to give you a comprehensive framework of migration tools and methods so that you can move data into OceanBase Cloud for subsequent business analytics and application development.
Migrate Data
Features:
Online migration without downtime; transparent to business applications.
High-performance data migration, safe and reliable.
One-stop migration.
Scenarios:
- Migrate Data is suitable for large-scale data migration between databases.
References:
- For more information about Migrate Data, see Migrate Data documentation.
Procedure for data migration and assessment
This section describes how to perform data migration in the OceanBase Cloud console.
(Optional) Perform a compatibility assessment.
Before data migration, you can create a compatibility assessment task to analyze the database objects and SQL usage and generate a compatibility assessment report. The compatibility assessment service provides automated transformation solutions to reduce the difficulty and reconstruction costs of migrating to the cloud. For more information, see Assess Compatibility.
Start data migration.
You can use the data migration service for online database migration with a few clicks. This service provides a quick and effortless business switchover. For more information, see Migrate Data.
(Optional) Perform a performance assessment.
After data migration, you can perform a thorough performance assessment before switchover, or use the business workload for traffic playback, to identify performance risks in advance. The performance assessment service also provides an optimization plan to help reduce business risks. For more information, see Assess Performance.
OBLOADER
Features:
Allows you to import database object definitions and table data from local disks, Network Attached Storage (NAS), Hadoop Distributed File System (HDFS), Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS, Apache Hadoop, and Amazon S3.
Allows you to import files in the INSERT SQL format that are exported by mysqldump.
Allows you to import data files in the standard CSV, INSERT SQL, Apache ORC, and Apache Parquet formats.
Allows you to set data preprocessing rules and configure field mappings between files and tables.
Supports features such as import throttling, memory exhaustion prevention, resumption after an interruption, and automatic retries.
Allows you to specify a custom log directory to store bad data and conflicting data during import.
Automatically splits large files without consuming additional storage space.
Supports encryption of sensitive parameters specified in commands, such as the database account and password and the cloud storage account and password.
OBLOADER supports these write modes:
Client mode: allows you to insert data into OceanBase Cloud through JDBC or direct load.
Server mode: allows you to use the `LOAD DATA` statement to import data to OceanBase Cloud.
Scenarios:
- OBLOADER is suitable for large-scale data import.
References:
- For more information about OBLOADER, see OBLOADER documentation.
LOAD DATA
LOAD DATA LOCAL
Features:
Allows you to import local files to OceanBase Cloud by streaming them to the OceanBase cluster over the network and inserting the data.
Supports importing a small amount of data at a time.
Supports importing only one file at a time.
Allows you to import CSV, SQL, and Parquet files.
Scenarios:
- The `LOAD DATA LOCAL` statement is suitable for small-scale data import.
References:
- For more information about `LOAD DATA LOCAL`, see Import data by using the LOAD DATA statement.
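As a minimal sketch, a client-side load of a headered CSV file might look like the following (the table and file names are illustrative; the full set of options is described in the linked reference):

```sql
-- Import a local CSV file into the hypothetical table `orders`,
-- skipping the header row. The client connection must have the
-- local-infile capability enabled.
LOAD DATA LOCAL INFILE '/tmp/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```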
LOAD DATA FROM OSS
Features:
Allows you to import files from Alibaba Cloud OSS to OceanBase Cloud.
Allows you to import multiple files from Alibaba Cloud OSS at a time.
Supports importing only CSV files.
Scenarios:
- The `LOAD DATA FROM OSS` statement is suitable for large-scale data import.
References:
- For more information about `LOAD DATA FROM OSS`, see LOAD DATA (Oracle compatible mode) and LOAD DATA (MySQL compatible mode).
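The following is an illustrative sketch only; the exact keyword, hint, and credential syntax vary by OceanBase version, so check the linked LOAD DATA topics before use. The bucket name, endpoint, and credentials are placeholders:

```sql
-- Illustrative: import a CSV file from an Alibaba Cloud OSS bucket
-- with a parallel direct load. Replace <oss_endpoint>, <ak>, and <sk>
-- with your actual endpoint and credentials.
LOAD DATA /*+ direct(true, 0) parallel(4) */
REMOTE_OSS INFILE 'oss://my-bucket/data/orders.csv?host=<oss_endpoint>&access_id=<ak>&access_key=<sk>'
INTO TABLE orders
FIELDS TERMINATED BY ',';
```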
INSERT SQL
Scenarios:
- The `INSERT INTO VALUES` statement is suitable for writing a small amount of data to an internal table.
- The `INSERT INTO SELECT FROM <table_name>` statement is suitable for writing the query result of another internal or external table to the target table; in other words, it is suitable for data migration between tables.
- The `INSERT /*+ [APPEND | direct(need_sort, max_error, 'full')] enable_parallel_dml parallel(N) */ INTO table_name select_sentence` statement is suitable for full and incremental direct load.
References:
- For more information about INSERT, see INSERT (Oracle compatible mode) and INSERT (MySQL compatible mode).
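Minimal sketches of the three `INSERT` patterns above follow; the table and column names are illustrative:

```sql
-- 1. Small-scale writes to an internal table.
INSERT INTO orders (id, amount) VALUES (1, 9.90), (2, 19.90);

-- 2. Table-to-table migration: write another table's query result
--    into the target table.
INSERT INTO orders_archive
SELECT * FROM orders WHERE created_at < '2024-01-01';

-- 3. Parallel direct load of a query result, using the hint form
--    shown above (need_sort = true, max_error = 0, 'full' load).
INSERT /*+ direct(true, 0, 'full') enable_parallel_dml parallel(8) */
INTO orders_archive SELECT * FROM orders;
```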
CREATE EXTERNAL TABLE
Scenarios:
- External tables are a key feature of database management systems. Whereas ordinary tables store their data in the database's own storage space, an external table's data resides in an external storage service, so you can query it without first importing it.
References:
- For more information about external tables, see Overview (Oracle compatible mode) and Overview (MySQL compatible mode).
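As a rough sketch under assumed syntax, an external table over CSV files in object storage can be defined once and then queried in place. The clause names and options vary by OceanBase version, and the bucket path is a placeholder, so treat this as illustrative and consult the linked overviews for the exact syntax:

```sql
-- Illustrative: define an external table whose data lives in an
-- OSS bucket, then query it directly without importing the data.
CREATE EXTERNAL TABLE ext_orders (
  id     INT,
  amount DECIMAL(10, 2)
)
LOCATION = 'oss://my-bucket/orders/'
FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', SKIP_HEADER = 1);

SELECT COUNT(*) FROM ext_orders;
```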
Flink
Scenarios:
- Flink OceanBase Connector is suitable for importing data from Flink in real time.
References:
- For more information about Flink OceanBase Connector, see the Flink OceanBase Connector documentation.
Canal
Scenarios:
- Canal is suitable for synchronizing incremental data, such as MySQL binlog changes, to OceanBase Cloud in real time.
References:
- For more information about Canal, see the Canal documentation.
Kafka
Scenarios:
- Kafka is suitable for migrating or synchronizing data from databases such as MySQL, Oracle, and PostgreSQL to OceanBase Cloud.
References:
- For more information about Kafka, see the Kafka documentation.
DataX
Scenarios:
- DataX is suitable for synchronizing data between various data sources, including relational databases such as MySQL and Oracle, HDFS, and Hive. DataX allows you to migrate data from one table at a time, without limiting the amount of data. It can migrate up to 60 MB of data per second.
References:
- For more information about DataX, see the DataX documentation.
CloudCanal
Scenarios:
- CloudCanal is suitable for migrating or synchronizing data from MySQL, Oracle, and PostgreSQL databases to OceanBase Cloud.
References:
- For more information about CloudCanal, see the CloudCanal documentation.
SeaTunnel
Scenarios:
- SeaTunnel is suitable for migrating or synchronizing data from databases such as MySQL, Oracle, and PostgreSQL to OceanBase Cloud.
References:
- For more information about SeaTunnel, see the SeaTunnel documentation.
Choose an import solution
This section describes the import solutions supported by common data sources to help you quickly choose an appropriate import solution based on your business scenario.
Object storage services
The following table describes the solutions for importing data from object storage services of cloud service providers to OceanBase Cloud.
| Data source | Supported import solution |
|---|---|
| Alibaba Cloud OSS | OBLOADER, or the `LOAD DATA FROM OSS` statement. |
| Tencent Cloud COS | OBLOADER. |
| Huawei Cloud OBS | OBLOADER. |
| Amazon S3 | OBLOADER. |
| Azure Blob Storage | Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Cloud. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data. |
| Google Cloud GCS | Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Cloud. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data. |
File systems
The following table describes the solutions for importing data from local or distributed file systems to OceanBase Cloud.
| Data source | Supported import solution |
|---|---|
| Local file system (NFS and NAS) | OBLOADER, or the `LOAD DATA LOCAL` statement. |
| Hadoop Distributed File System (HDFS) | OBLOADER. |
Procedure:
Perform data assessment and make preparations.
Understand the data formats: Data can be stored in the CSV, JSON, XML, binary, or another format in the file system.
Define schemas: Specify the schemas of target tables in OceanBase Cloud, and determine the data to be migrated and whether the data needs to be converted.
Extract data.
Direct access: If your OBServer node can directly access the mount point of the file system, you can directly specify the file path in the SQL statement, for example `LOAD DATA INFILE`, to import a CSV file.
Data replication: If your OBServer node cannot directly access the mount point of the file system, you can replicate the files to a local disk or temporary storage that the OBServer node can access.
(Optional) Convert data.
Preprocessing: Preprocess data based on the schemas of target tables in OceanBase Cloud. For example, you can write scripts or use ETL tools to convert data formats, cleanse data (remove invalid or inconsistent records), or change date and time formats.
Conversion tools: You can use scripting languages such as Python and Pandas or ETL tools to convert data.
Load data.
`LOAD DATA INFILE`: You can use the `LOAD DATA INFILE` statement of OceanBase Cloud to import a large amount of data at a time. Make sure that you have the privilege to read data from the specified file path.
MySQL client: You can use the mysql-connector-python client to batch insert data programmatically.
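As a sketch, the server-side load described above might look like the following; the mount path and table name are illustrative, and the OBServer node must be able to read the path:

```sql
-- Server-side import: the file path is resolved on the OBServer node,
-- so the node needs read access to the mounted file system.
LOAD DATA INFILE '/mnt/nas/exports/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
IGNORE 1 LINES;
```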
Considerations:
Access privileges: Make sure that you have the privileges to read data from NFS or NAS and write data to MySQL databases.
Data security: Ensure data security and privacy and observe applicable data protection regulations and laws during the migration.
Backup: Back up the original data and target database before you start any migration task.
Streaming systems
The following table describes the solutions for importing data from streaming systems to OceanBase Cloud.
| Data source | Supported import solution |
|---|---|
| Flink | Flink OceanBase Connector. |
| Canal | Canal. |
| Spark | JDBC. |
Databases
The following table describes the solutions for importing data from other databases to OceanBase Cloud.
| Data source | Supported import solution |
|---|---|
| MySQL database | Migrate Data, or tools such as DataX, Canal, CloudCanal, Kafka, and SeaTunnel. |
| Oracle database | Migrate Data, or tools such as DataX, CloudCanal, Kafka, and SeaTunnel. |
| PostgreSQL database | Tools such as CloudCanal, Kafka, and SeaTunnel. |
| TiDB | Offline data migration: Dump data from TiDB, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| SQL Server | Offline data migration: Dump data from SQL Server, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| StarRocks | Offline data migration: Dump data from StarRocks, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| Doris | Offline data migration: Dump data from Doris, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| HBase | Offline data migration: Dump data from HBase, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| MaxCompute | Offline data migration: Dump data from MaxCompute, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |
| Hologres | Offline data migration: Dump data from Hologres, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Cloud. |