Data circulation is one of the most critical parts of an analytics-oriented business scenario. This topic describes how to migrate external data to OceanBase Database. OceanBase Database supports data import from the following data sources:
Databases
MySQL, Oracle, SQL Server, PostgreSQL, and MongoDB databases
Scenarios: seamless integration to retain the structure and integrity of data
Offline data files
CSV, JSON, and XML files
Scenarios: flexible batch data processing and non-real-time analytics
Object storage services
Amazon Simple Storage Service (S3), Google Cloud Storage (GCS), Azure storage service, Alibaba Cloud Object Storage Service (OSS), Tencent Cloud Object Storage (COS), and Huawei Cloud Object Storage Service (OBS).
Scenarios: real-time or near-real-time data access to support dynamic data analytics
Real-time data streams
Flink and Canal
Scenarios: real-time processing and analytics of continuously generated data streams in business scenarios that require quick response
You can choose one of the following data migration tools based on your data source and scenario:
- OceanBase solutions:
- OceanBase Migration Service (OMS) and OBLOADER, which are designed for large-scale data migration.
- Standard SQL statements
LOAD DATAandINSERT, which are easy to use and suitable for small-scale data import. - SQL statement
CREATE EXTERNAL TABLE, which allows you to directly query external data files to improve the flexibility of analytics.
- Third-party integrated tools: Flink OceanBase Connector, DataX OceanBase Writer Plugin, Chunjun, and Canal
- Standard SDK APIs: Java Database Connectivity (JDBC) and OBKV, which allow you to flexibly access relational data sources and store key-value pairs.
This topic describes the most suitable scenarios for each solution and how to choose the most appropriate migration method based on the data file format and the characteristics of the target storage system. This topic aims to familiarize you with a comprehensive framework of migration tools and methods to help you migrate data to OceanBase Database for subsequent business analytics and application development.
Import solutions
This section describes the characteristics and scenarios of import solutions.
OMS
Characteristics:
- Supports multiple types of data sources.
- Supports online migration of existing data.
- Supports real-time synchronization of incremental data.
- Supports data verification from multiple dimensions.
- Provides high-performance, secure, and reliable data migration.
- Provides real-time data migration without interrupting or affecting business applications.
Scenarios:
- OMS is suitable for large-scale data migration between databases.
References:
- For more information about OMS, see OMS documentation.
LOAD DATA
LOAD DATA LOCAL
Characteristics:
- Allows you to import local files to OceanBase Database. Allows you to import and insert local files to OceanBase clusters as network streams.
- Supports importing only a small amount of data at a time.
- Supports importing only one file at a time.
- Allows you to import CSV, SQL, and Parquet files.
Scenarios:
- The
LOAD DATA LOCALstatement is suitable for small-scale data import.
References:
- For more information about the
LOAD DATA LOCALstatement, see Import data by using the LOAD DATA statement.
LOAD DATA FROM OSS
Characteristics:
- Allows you to import files from Alibaba Cloud OSS to OceanBase Database without using OceanBase Database Proxy (ODP).
- Allows you to import multiple files from Alibaba Cloud OSS at a time.
- Supports importing only CSV files.
Scenarios:
- The
LOAD DATA FROM OSSstatement is suitable for large-scale data import.
References:
- For more information about the
LOAD DATA FROM OSSstatement, see LOAD DATA (Oracle mode) and LOAD DATA (MySQL mode).
OBLOADER
Characteristics:
- Allows you to import database object definitions and table data from local disks, Network Attached Storage (NAS), Hadoop Distributed File System (HDFS), Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS, Apache Hadoop, and Amazon S3.
- Allows you to import files in the INSERT SQL format that are exported by mysqldump.
- Allows you to import data files in the standard CSV, INSERT SQL, Apache ORC, and Apache Parquet formats.
- Allows you to set data preprocessing rules and configure field mappings between files and tables.
- Supports features such as import throttling, memory exhaustion prevention, resumption after an interruption, and automatic retries.
- Allows you to specify a custom log directory to store bad data and conflicting data during import.
- Automatically splits large files without consuming additional storage space.
- Supports encryption of sensitive parameters specified in commands, such as the database account and password and the cloud storage account and password.
OBLOADER supports the following write modes:
- Client mode: allows you to insert data into OceanBase Database through JDBC or direct load.
- Server mode: allows you to use the
LOAD DATAstatement to import data to OceanBase Database.
Scenarios:
- OBLOADER is suitable for large-scale data import.
References:
- For more information about OBLOADER, see OBLOADER documentation.
INSERT statements
Scenarios:
- The
INSERT INTO VALUESstatement is suitable for writing a small amount of data to an internal table. - The
INSERT INTO SELECT FROM <table_name>statement is suitable for writing the query result of another internal or external table to the destination table. In other words, it is suitable for data migration between tables. - The
INSERT /*+ [APPEND |direct(need_sort,max_error,'full')] enable_parallel_dml parallel(N) */ INTO table_name select_sentencestatement is suitable for full and incremental direct load.
References:
- For more information about INSERT statements, see INSERT (Oracle mode) and INSERT (MySQL mode).
CREATE EXTERNAL TABLE
Scenarios:
- External tables are a key feature in a database management system. Generally, the data of a table in a database is stored in the storage space of the database, while an external table has data stored in an external storage service.
References:
- For more information about external tables, see Overview (Oracle mode) and Overview (MySQL mode).
Flink
Scenarios:
- Flink OceanBase Connector is suitable for importing data from Flink in real time.
References:
Canal
Scenarios:
- Canal is suitable for importing data to OceanBase Database in real time.
References:
DataX
Scenarios:
- DataX is suitable for synchronizing data between various data sources, including relational databases such as MySQL and Oracle, HDFS, and Hive. DataX allows you to migrate data from one table at a time, without limiting the amount of data. It can migrate up to 60 MB of data per second.
References:
CloudCanal
Scenarios:
- CloudCanal is suitable for migrating or synchronizing data from MySQL, Oracle, and PostgreSQL databases to OceanBase Database.
References:
Choose an import solution
This section describes the import solutions supported by common data sources to help you quickly choose an appropriate import solution based on your business scenario.
Object storage services
The following table describes the solutions for importing data from object storage services of cloud service providers to OceanBase Database.
| Data source | Supported import solution |
|---|---|
| Alibaba Cloud OSS |
|
| Tencent Cloud COS |
|
| Huawei Cloud OBS |
|
| Amazon S3 |
|
| Azure Blob Storage | Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data. |
| Google Cloud GCS | Download the data to a local or accessible server. Then use a MySQL CLI tool or SQL management tool to import the data to OceanBase Database. You can also write scripts and execute an SQL statement by using a MySQL connector library to batch insert the data. |
File systems
The following table describes the solutions for importing data from local or distributed file systems to OceanBase Database.
| Data source | Supported import solution |
|---|---|
| Local file system (NFS and NAS) |
|
| Hadoop Distributed File System (HDFS) |
|
Procedure:
Perform data assessment and make preparations.
- Understand the data formats: Data can be stored in the CSV, JSON, XML, binary, or another format in the file system.
- Define schemas: Specify the schemas of destination tables in OceanBase Database, and determine the data to be migrated and whether the data needs to be converted.
Extract data.
- Direct access: If your OBServer node can directly access the mount point of the file system, you can directly specify the file path in the SQL statement, for example,
LOAD DATA INFILE, to import a CSV file. - Data replication: If your OBServer node cannot directly access the mount point of the file system, you can replicate data from the file to a local disk or temporary storage accessible to your OBServer node.
- Direct access: If your OBServer node can directly access the mount point of the file system, you can directly specify the file path in the SQL statement, for example,
(Optional) Convert data.
- Preprocessing: Preprocess data based on the schemas of destination tables in OceanBase Database. For example, you can write scripts or use ETL tools to convert data formats, cleanse data (remove invalid or inconsistent records), or change date and time formats.
- Conversion tools: You can use scripting languages such as Python and Pandas or ETL tools to convert data.
Load data.
LOAD DATA INFILE: You can use theLOAD DATA INFILEstatement of OceanBase Database to import a large amount of data at a time. Make sure that you have the privilege to read data from the specified file path.- MySQL client: You can the mysql-connector-python client to batch insert data through programming.
Considerations:
- Access privileges: Make sure that you have the privileges to read data from NFS or NAS and write data to MySQL databases.
- Data security: Ensure data security and privacy and observe applicable data protection regulations and laws during the migration.
- Backup: Back up the original data and destination database before you start any migration task.
Streaming systems
The following table describes the solutions for importing data from streaming systems to OceanBase Database.
| Data source | Supported import solution |
|---|---|
| Flink |
|
| Canal |
|
| Spark | JDBC (Click here for more information.) |
Databases
The following table describes the solutions for importing data from other databases to OceanBase Database.
| Data source | Supported import solution |
|---|---|
| MySQL database |
|
| Oracle database |
|
| PostgreSQL database |
|
| TiDB |
|
| SQL Server |
|
| StarRocks | Offline data migration: Dump data from StarRocks, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Database. |
| Doris | Offline data migration: Dump data from Doris, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Database. |
| HBase |
|
| MaxCompute | Offline data migration: Dump data from MaxCompute, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Database. |
| Hologres | Offline data migration: Dump data from Hologres, and then use OBLOADER or the LOAD DATA statement to import data to OceanBase Database. |