OceanBase Database

SQL - V4.1.0

    Use DataX to migrate table data from a MySQL database to OceanBase Database

    Last Updated: 2023-10-20 06:33:11

    DataX is an open-source version of Alibaba Cloud DataWorks. It is an offline data synchronization tool widely used by Alibaba Group. It efficiently synchronizes data between heterogeneous data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, ADS, HBase, Table Store (OTS), MaxCompute (formerly known as ODPS), Distributed Relational Database Service (DRDS), and OceanBase Database.

    If you use OceanBase Database Enterprise Edition, you can request the internal version of DataX (RPM package) from Technical Support of OceanBase Database. If you use OceanBase Database Community Edition, you can download the source code from the open source website of DataX and then compile the code. During compilation, remove unused database plug-ins from the pom.xml file. Otherwise, the compiled package will be very large.

    Framework design

    [Figure: DataX framework design]

    DataX is an offline data synchronization framework that is designed based on the framework + plug-in architecture. Data source reads and writes are abstracted as the reader and writer plug-ins and are integrated into the entire framework.

    • The reader plug-in is a data collection module that collects data from a data source and sends the data to the framework.

    • The writer plug-in is a data write module that retrieves data from the framework and writes the data to the destination.

    • The framework builds a data transmission channel to connect the reader and the writer and processes core technical issues such as caching, throttling, concurrency control, and data conversion.

    DataX migrates data through tasks. Each task migrates only one table and has a configuration file in JSON format. The configuration file contains two sections, reader and writer, which correspond to the database read and write plug-ins supported by DataX. For example, when you migrate table data from a MySQL database to an OceanBase database, the mysqlreader plug-in reads data from the MySQL database and the oceanbasev10writer plug-in writes the data to the OceanBase database. The following sections describe the mysqlreader and oceanbasev10writer plug-ins.
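The reader, framework, and writer roles described above can be illustrated with a minimal sketch. It is not DataX code: the function names `run_task` and `sample_rows` are hypothetical, and a bounded in-memory queue stands in for the transmission channel that the framework provides.

```python
import queue
import threading

_DONE = object()  # sentinel that tells the writer the reader has finished


def run_task(read_rows, write_row, capacity=4):
    """Move every row from read_rows() to write_row() through a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # bounded queue acts as simple throttling

    def reader():
        for row in read_rows():
            channel.put(row)  # blocks while the channel is full
        channel.put(_DONE)

    def writer():
        while True:
            row = channel.get()
            if row is _DONE:
                break
            write_row(row)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sample_rows():
    # Hypothetical source rows used only for this illustration.
    for _ in range(3):
        yield (10, "hello, world-DataX")


received = []
run_task(sample_rows, received.append)
print(len(received))
```

The reader and writer never call each other directly. Swapping the source or the destination only means passing a different `read_rows` or `write_row`, which is the point of the framework + plug-in design.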

    mysqlreader plug-in

    The mysqlreader plug-in reads data from a MySQL database. It connects to a remote MySQL database through JDBC, generates a SELECT statement based on the configured information, and sends the statement to the remote MySQL database.

    mysqlreader then assembles the result returned by the SQL statement into an abstract dataset by using the custom data types of DataX and passes the dataset to the downstream writer for processing.
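As a rough illustration of how a query statement could be generated from configured information, the following sketch joins a column list, a table name, and an optional filter into one SELECT statement. The function `build_query` and the column names `o_id` and `o_w_id` are hypothetical. The actual statement generated by mysqlreader may differ.

```python
def build_query(columns, table, where=None):
    """Assemble a SELECT statement from columns, a table, and an optional filter."""
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        sql += " WHERE {}".format(where)
    return sql


# Hypothetical column names, used only for illustration.
print(build_query(["o_id", "o_w_id"], "bmsql_order", "o_id > 100"))
# prints: SELECT o_id, o_w_id FROM bmsql_order WHERE o_id > 100
```

When no filter is given, the statement selects every row of the table.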

    For more information about the features and parameters, see mysqlreader plug-in.

    oceanbasev10writer plug-in

    The oceanbasev10writer plug-in writes data to the destination table in the OceanBase database. oceanbasev10writer connects to a remote OceanBase database from a Java client (MySQL JDBC or OBClient) by using ODP and executes the corresponding SQL INSERT statement to write the data to the remote OceanBase database. The data is committed to the remote OceanBase database in batches.

    oceanbasev10writer uses the DataX framework to obtain the protocol data generated by the reader and then generates an INSERT statement. If a primary key or unique key conflict occurs when data is written, a MySQL tenant of OceanBase Database lets you use the replace mode to update all fields of the row, whereas an Oracle tenant of OceanBase Database supports only the insert mode. For performance, data is written in batches: a write request is initiated only when the number of buffered rows reaches the specified threshold.
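The batch write behavior can be sketched as a small buffer that triggers a write only when a row threshold is reached. This is an illustration rather than the plug-in's implementation: the class `BatchWriter` and its `flush` callback are hypothetical, and a Python list stands in for the database.

```python
class BatchWriter:
    """Buffer rows and hand them to a flush callback one batch at a time."""

    def __init__(self, flush, batch_size=1000):
        self._flush = flush            # callback that would issue the batched write
        self._batch_size = batch_size  # row threshold that triggers a write
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        # Write whatever is buffered, including a final partial batch.
        if self._rows:
            self._flush(self._rows)
            self._rows = []


batches = []
writer = BatchWriter(batches.append, batch_size=3)
for i in range(7):
    writer.add((i,))
writer.flush()  # push the remaining rows at the end of the task
print([len(b) for b in batches])  # prints: [3, 3, 1]
```

A larger threshold means fewer round trips to the database, at the cost of more rows held in memory per write.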

    DataX configuration file

    Example:

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 10,
                "column": [
                  {
                    "type": "long",
                    "value": "10"
                  },
                  {
                    "type": "string",
                    "value": "hello, world-DataX"
                  }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "encoding": "UTF-8",
                "print": true
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 2
          }
        }
      }
    }
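Before you run a job, it can help to confirm that the file has the reader and writer sections described earlier. The following sketch does a structural check only. The function `check_job` is hypothetical and does not reproduce the validation that DataX itself performs.

```python
import json


def check_job(job):
    """Return a list of structural problems found in a DataX-style job dict."""
    problems = []
    content = job.get("job", {}).get("content", [])
    if not content:
        problems.append("job.content is missing or empty")
    for i, item in enumerate(content):
        for side in ("reader", "writer"):
            plugin = item.get(side)
            if not isinstance(plugin, dict) or "name" not in plugin:
                problems.append("content[{}].{} needs a name".format(i, side))
    return problems


# Shortened version of the stream example, with empty parameter sections.
job_text = """
{"job": {"content": [{"reader": {"name": "streamreader", "parameter": {}},
                      "writer": {"name": "streamwriter", "parameter": {}}}],
         "setting": {"speed": {"channel": 2}}}}
"""
print(check_job(json.loads(job_text)))  # prints: []
```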
    

    Notice

    DataX migrates only the data of a table. Therefore, you must create the schema of the table in the destination database in advance.

    Place the JSON configuration file in the job directory of DataX or in a custom path. Run the following command:

    $ bin/datax.py job/stream2stream.json
    

    Output:

    <.....>
    
    2021-08-26 11:06:09.217 [job-0] INFO  JobContainer - PerfTrace not enable!
    2021-08-26 11:06:09.218 [job-0] INFO  StandAloneJobContainerCommunicator - Total 20 records, 380 bytes | Speed 38B/s, 2 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-08-26 11:06:09.223 [job-0] INFO  JobContainer -
    Task start time                 : 2021-08-26 11:05:59
    Task end time                   : 2021-08-26 11:06:09
    Time consumption                : 10s
    Average task traffic            : 38 B/s
    Record writing speed            : 2rec/s
    Total number of read records    : 20
    Total read and write failures   : 0
    

    After DataX executes a task, it generates a brief task report that includes the average traffic, the record writing speed, and the total number of read and write failures, as shown in the preceding output.
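If you want to check the report in a script, the summary lines can be collected into a dictionary. This sketch assumes that the report lines use the "label : value" layout shown in the preceding output. The function `parse_report` is hypothetical, and the labels printed by your DataX build may differ.

```python
def parse_report(text):
    """Collect 'label : value' lines from a task report into a dict."""
    report = {}
    for line in text.splitlines():
        label, sep, value = line.partition(" : ")
        if sep and label.strip() and value.strip():
            report[label.strip()] = value.strip()
    return report


sample = """
Task start time                 : 2021-08-26 11:05:59
Task end time                   : 2021-08-26 11:06:09
Time consumption                : 10s
Total number of read records    : 20
Total read and write failures   : 0
"""
report = parse_report(sample)
print(report["Total read and write failures"])  # prints: 0
```

Values are kept as strings, so convert them before you compare numbers.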

    You can specify the speed and error record limit in the job parameter settings of DataX.

    "setting": {
        "speed": {
            "channel": 10
        },
        "errorLimit": {
            "record": 10,
            "percentage": 0.1
        }
    }
    

    Parameters:

    • errorLimit: the limits on error records. record specifies the maximum number of error records, and percentage specifies the maximum proportion of error records. When a limit is exceeded, the task is terminated.
    • channel: the concurrency. In theory, a higher concurrency value yields higher migration performance. In practice, you must also consider the read pressure on the source database, the network transmission performance, and the write performance of the destination database.
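When the same settings are applied to many job files, they can be filled in programmatically. The following sketch returns a copy of a job dictionary with speed and errorLimit set. The function `with_settings` and its argument names are hypothetical, while the JSON keys are the ones shown above.

```python
import copy


def with_settings(job, channel, max_error_records=None, max_error_ratio=None):
    """Return a copy of job with speed.channel and errorLimit filled in."""
    updated = copy.deepcopy(job)  # leave the caller's dict untouched
    setting = updated.setdefault("job", {}).setdefault("setting", {})
    setting["speed"] = {"channel": channel}
    error_limit = {}
    if max_error_records is not None:
        error_limit["record"] = max_error_records
    if max_error_ratio is not None:
        error_limit["percentage"] = max_error_ratio
    if error_limit:
        setting["errorLimit"] = error_limit
    return updated


base = {"job": {"content": []}}
tuned = with_settings(base, channel=10, max_error_records=10, max_error_ratio=0.1)
print(tuned["job"]["setting"])
```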

    Prepare the environment

    Download the .tar package from http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz.

    Decompress the installation package:

    tar zxvf datax.tar.gz
    cd datax
    

    The directories are as follows:

    $ tree -L 1 --filelimit 30
    .
    ├── bin
    ├── conf
    ├── job
    ├── lib
    ├── log
    ├── log_perf
    ├── plugin
    ├── script
    └── tmp
    

    The following table describes the directories in the installation package.

    Directory name | Description
    bin | The directory where the executable files are located. The datax.py file in this directory is the startup script of DataX tasks.
    conf | The directory where configuration files are located. DataX configuration files that are unrelated to tasks are stored in this directory.
    lib | The directory where the libraries required for running are located. The global .jar files required for running DataX are stored in this directory.
    job | The directory where the task configuration file for verifying the DataX installation is located.
    log | The directory where log files are located. The running logs of DataX tasks are stored in this directory. By default, DataX generates standard logs at run time and writes them to the log directory.
    plugin | The directory where the plug-in files are located. The data source plug-ins supported by DataX are stored in this directory.
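A quick way to confirm that the package was decompressed completely is to check for the directories described in the table. This is a minimal sketch: the function `missing_dirs` is hypothetical, and the demonstration runs against a temporary directory instead of a real DataX installation.

```python
import os
import tempfile

# Directories described in the preceding table.
EXPECTED_DIRS = ("bin", "conf", "job", "lib", "log", "plugin")


def missing_dirs(datax_home):
    """Return the expected directories that do not exist under datax_home."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(datax_home, d))]


# Demonstration with an incomplete layout in a temporary directory.
with tempfile.TemporaryDirectory() as home:
    for name in ("bin", "conf", "job"):
        os.mkdir(os.path.join(home, name))
    demo_missing = missing_dirs(home)

print(demo_missing)  # prints: ['lib', 'log', 'plugin']
```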

    Use DataX to migrate data from a MySQL database to OceanBase Database

    When you migrate data from a MySQL database to an OceanBase database, the approach depends on connectivity. If the DataX server cannot connect to the source and destination databases at the same time, you can export the data as CSV files and then import the CSV files to the destination database. If the DataX server can connect to both databases at the same time, you can use DataX to directly migrate data from the source to the destination.

    Example: Migrate the data of the tpccdb.bmsql_order table in a MySQL database to the tpcc.bmsql_order table in a MySQL tenant of OceanBase Database.

    Content of the myjob.json configuration file:

    {
        "job": {
            "setting": {
                "speed": {
                    "channel": 4
                },
                "errorLimit": {
                    "record": 0,
                    "percentage": 0.1
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": "tpcc",
                            "password": "********",
                            "column": ["*"],
                            "connection": [
                                {
                                    "table": ["bmsql_order"],
                                    "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/tpccdb?useUnicode=true&characterEncoding=utf8"]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "oceanbasev10writer",
                        "parameter": {
                            "obWriteMode": "insert",
                            "column": ["*"],
                            "preSql": ["truncate table bmsql_order"],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:oceanbase://127.0.0.1:2883/tpcc?",
                                    "table": ["bmsql_order"]
                                }
                            ],
                            "username": "tpcc",
                            "password":"********",
                            "writerThreadCount":10,
                            "batchSize": 1000,
                            "memstoreThreshold": "0.9"
                        }
                    }
                }
            ]
        }
    }
    
    The following list describes the parameters.

    • name: The name of the reader or writer plug-in for connecting to the database. The reader plug-in of MySQL is mysqlreader, and the writer plug-in of OceanBase Database is oceanbasev10writer. For more information about the reader and writer plug-ins, see DataX data source guide.
    • jdbcUrl: The JDBC URL of the database to which you want to connect. The value is a JSON array, and multiple URLs can be entered for a database. You need to enter at least one JDBC URL in the JSON array. The value must comply with the official MySQL format. You can also specify configuration properties in the URL. For more information, see Configuration Properties in the MySQL documentation.
      Notice
      • The JDBC URL must be included in the connection section of the code.
      • You must connect to OceanBase Database by using ODP. The default port is 2883.
      • The JDBC URL of the writer does not need to be enclosed in square brackets ([]), but the JDBC URL of the reader must be enclosed in square brackets ([]).
      • Required: Yes.
      • Default value: None.
    • username: The username for logging on to the database.
      • Required: Yes.
      • Default value: None.
    • password: The password of the specified username for logging on to the database.
      • Required: Yes.
      • Default value: None.
    • table: The table to be synchronized. The value is a JSON array, and multiple tables can be specified at the same time. When you specify multiple tables, make sure that they use the same schema structure. mysqlreader does not verify whether the specified tables belong to the same logical table.
      Notice
      The table string must be included in the connection section of the code.
      • Required: Yes.
      • Default value: None.
    • column: The set of names of the columns to be synchronized in the configured table. The values are specified in a JSON array. We recommend that you do not set the column parameter to ["*"], because the set of synchronized columns then changes whenever the table schema changes. Specify explicit column names instead. Column pruning is supported: you can export only the specified columns. Column reordering is supported: you can export columns in an order different from the column order in the table schema. You can specify constants in the MySQL SQL format: ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"].
      Note
      • id is a regular column name.
      • table is the name of a column that is a reserved word.
      • 1 is an integer constant.
      • bazhen.csy is a string constant.
      • null is a NULL value.
      • to_char(a + 1) is an expression.
      • 2.3 is a floating point number.
      • true is a Boolean value.
      • Required: Yes.
      • Default value: None.
    • where: The filter condition. mysqlreader assembles the specified column, table, and WHERE clause into an SQL statement and then extracts data based on this SQL statement. To synchronize data of the current day, you can set the WHERE clause to gmt_create > $bizdate.
      Notice
      You cannot set the WHERE clause to limit 10, because LIMIT is not a valid WHERE condition in SQL. A WHERE clause lets you synchronize incremental business data effectively. If you do not specify the WHERE clause, or do not specify its key or value, DataX performs full synchronization.
      • Required: No.
      • Default value: None.
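The parameter rules above can be sketched as a small generator for the job file. The function `build_job` is hypothetical, and using one username and password for both sides is a simplification. The plug-in names, keys, and URLs come from the myjob.json example. Note that the reader's jdbcUrl is an array while the writer's jdbcUrl is a plain string.

```python
import json


def build_job(src_url, dst_url, table, user, password, channel=4):
    """Build a MySQL-to-OceanBase job dict shaped like the myjob.json example."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": user,
            "password": password,
            "column": ["*"],
            # Reader: jdbcUrl must be a JSON array inside connection.
            "connection": [{"table": [table], "jdbcUrl": [src_url]}],
        },
    }
    writer = {
        "name": "oceanbasev10writer",
        "parameter": {
            "obWriteMode": "insert",
            "username": user,
            "password": password,
            "column": ["*"],
            # Writer: jdbcUrl is a plain string, not an array.
            "connection": [{"table": [table], "jdbcUrl": dst_url}],
        },
    }
    return {"job": {"setting": {"speed": {"channel": channel}},
                    "content": [{"reader": reader, "writer": writer}]}}


job = build_job("jdbc:mysql://127.0.0.1:3306/tpccdb",
                "jdbc:oceanbase://127.0.0.1:2883/tpcc?",
                "bmsql_order", "tpcc", "********")
print(json.dumps(job, indent=4))
```

Writing the printed JSON to a file such as myjob.json gives a starting point that you can extend with the other writer parameters shown in the example.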
    After you complete the job configuration file, run the job:

    python datax.py ../job/myjob.json
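To launch the job from a script instead of an interactive shell, the same command can be wrapped with the Python standard library. This is a sketch under assumptions: the functions `datax_command` and `run_job` are hypothetical, the command is assumed to run from the bin directory (as the relative path ../job/myjob.json suggests), and the exit code of datax.py is assumed to indicate success or failure.

```python
import subprocess


def datax_command(job_file, launcher="datax.py"):
    """Return the argument list for the command shown above."""
    return ["python", launcher, job_file]


def run_job(job_file, bin_dir):
    """Run the job from bin_dir and return the process exit code."""
    # Assumption: a nonzero exit code means that the job failed.
    completed = subprocess.run(datax_command(job_file), cwd=bin_dir)
    return completed.returncode


print(datax_command("../job/myjob.json"))
# prints: ['python', 'datax.py', '../job/myjob.json']
```

Passing the arguments as a list avoids shell quoting issues when the job path contains spaces.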
    

    More information

    For more information about DataX, see DataX.
