Use Canal to synchronize MySQL data to OceanBase Database in real time|V4.0.0| docs|Distributed Database

Canal is an open-source product of Alibaba. It parses incremental logs in MySQL databases and thereby provides incremental data subscription and consumption. The open-source project is available at https://github.com/alibaba/canal.

Architecture

Canal works with the following four components:

Canal Deployer: the Canal component that serves as a server and converts binlogs to CanalEntry data.
Canal Admin: the configuration administrator of Canal. It provides a web page for you to manage the Canal Deployer.
Canal Adapter: the client adapter of Canal. It parses the CanalEntry data and synchronizes the incremental changes to the destination database.
Canal Example: the client example of Canal. You can modify the sample code as needed to implement your consumption logic.

Canal

In the above figure, canal-admin is for deployment and is optional, canal-server is the Canal Deployer software, and adapter is the Canal Adapter software. The source database is a MariaDB`` database and the destination database is a MySQL tenant in OceanBase Database.

Canal Deployer

In OceanBase Database, Canal Deployer provides services by using Canal serversand instances. A Canal server is a deployer service that can have multiple instances, and an instance represents a data synchronization channel.``````

A canal instance is created by Spring during operation. The configuration information of the Canal instance is specified in the canal.properties file in the conf/ directory of the Canal Deployer. Canal provides several configuration files that are stored in the conf/spring directory. You can directly use these configuration files.

After a Canal instance parses binlogs into the CanalEntry data, it stores the CanalEntry data in the memory for later consumption. Canal provides two data consumption modes:

TCP mode: The destination database directly connects to Canal for data consumption.
MQ mode: The data in the Canal memory is written to a message queue (MQ), and the destination database connects to the MQ by using a client for data consumption.

Canal provides two types of position information. One is the parsing position, which is recorded during the conversion of a log into an entry and managed by LogPositionManager. The other is the client consumption position, which is managed by MetaManager. These two types of position information are also specified in the spring xml file of the Canal instance.

Canal Adapter

Canal Adapter consumes the CanalEntry data and writes it to the corresponding destination container. Similar to Canal Deployer, Canal Adapter also provides instances. Canal Adapter is started by the Adapter launcher and generates instances based on the configurations that you specified. The instances read the `CanalEntry``` data and write it to the destination database.

Canal Admin

Canal Admin is a service platform that simplifies the deployment of Canal. Both Canal servers and instances support standalone deployment and high-availability cluster deployment. `Canal Admin````` allows you to easily manage their deployment on a web page.

Deployment example

Prepare a MySQL database

``MariaDB is used here.

yum install mariadb mariadb-server

You must first enable Binlog writing and set the binlog-format parameter to ROW. The following example shows you how to configure the my.cnf file in the /etc directory:

[mysqld]
log-bin=mysql-bin # Enable the Binlog.
binlog-format=ROW # Set the binlog format to ROW.
server_id=1 # The server ID must be defined for the configuration of MySQL replication. It must be different from the slaveId of Canal.

Run the following commands to grant the MySQL slave privileges to the Canal user for connection with the MySQL database. If you already have a Canal user, directly run the grant command.

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;

Then a business database is initialized for TPC-H benchmark tests, which will be described in detail in Chapter 6).

Deploy Canal Admin

Using Canal Admin to deploy Canal Deployer is easier, but it is not mandatory.

Canal Admin is used here.

Download Canal Admin at https://github.com/alibaba/canal/releases.

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.admin-1.1.5.tar.gz

Run the following command to extract files to the specified directory:

mkdir ~/canal-admin && tar zxvf canal.admin-1.1.5.tar.gz -C ~/canal-admin/

Modify the configuration file.

cd ~/canal-admin && vim conf/application.yml

The following example shows how to modify the configuration file:

server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
spring.datasource:
  address: 127.0.0.1:3306
  database: canal_manager
  username: canal
  password: ******
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1
canal:
  adminUser: admin
  adminPasswd: ******

To avoid connection failure caused by an incorrect password, we do not recommend that you change the password in the adminPasswd field during the first test.

Run the following command to initialize the MetaDB:

mysql -h127.1 -uroot -P3306 -p******
source conf/canal_manager.sql

The related tables are automatically created based on the script:

show tables;
MariaDB [tpch]> use canal_manager;
Database changed
MariaDB [canal_manager]> show tables;
+-------------------------+
| Tables_in_canal_manager |
+-------------------------+
| canal_adapter_config    |
| canal_cluster           |
| canal_config            |
| canal_instance_config   |
| canal_node_server       |
| canal_user              |
+-------------------------+
6 rows in set (0.00 sec)

Run the following command to start the web service:

cd ~/canal-admin && bin/startup.sh

In general cases, if the web service is started, port 8089 is listened to.

[root@obce00 adapter]# netstat -ntlp |grep 15973
tcp        0      0 0.0.0.0:8089            0.0.0.0:*               LISTEN      15973/java

If any problem occurs during the startup, check the logs.

vim logs/admin.log +

Log on to the web UI of Canal Admin.

Access webpage of Canal Admin: http://127.0.0.1:8089/

Username for logon: admin

Password for logon: ******

Deploy Canal Deployer

Download Canal at https://github.com/alibaba/canal/releases.

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz

Run the following command to extract files to the specified directory:
```
mkdir ~/canal && tar zxvf canal.deployer-1.1.5.tar.gz  -C ~/canal
```
Modify configurations.

If you do not use Canal Admin to deploy the Canal server and instances, use the default configuration files conf/canal.properties and conf/example/instance.properties. These configuration files are provided to create a default instance named example. You need to modify the database connection address, username, and password of the example instance. The following sample code shows you how to modify the configuration file:
```
vi conf/example/instance.properties

# mysql serverId
canal.instance.mysql.slaveId = 1234
# Specify the position information of your database.
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
canal.instance.master.timestamp =
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
# Specify the username and password of your database.
canal.instance.dbUsername = canal  
canal.instance.dbPassword = ******
canal.instance.defaultDatabaseName =
canal.instance.connectionCharset = UTF-8
#table regex
canal.instance.filter.regex = .*\\..*
```
The canal.instance.connectionCharset parameter specifies the java encoding type corresponding to the database encoding method. Valid values: UTF-8, GBK, and ISO-8859-1. If the system has only one CPU core, set the canal.instance.parser.parallel parameter to false.

If you use Canal Admin to deploy Canal servers and instances, you must replace the content in the conf/canal.properties configuration file with the content in the conf/canal_local.properties configuration file and then modify the manager address in the conf/canal.properties file. You can retain the default values for other parameters.
```
[root@obce00 canal]# cat conf/canal.properties
# register ip
canal.register.ip =
# canal admin config
canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = ****************************************
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
canal.admin.register.name =
```
Note The value of the passwd parameter is cyphertext of admin in the MySQL database. The password must be the same as the one specified in the Canal Admin configuration file. If you have changed the password in the Canal Admin configuration file, you must change this password accordingly. To obtain the cyphertext, run the following select password command in MySQL:
```
MariaDB [canal_manager]> select password('admin');
+-------------------------------------------+
| password('admin')                         |
+-------------------------------------------+
| ***************************************** |
+-------------------------------------------+
1 row in set (0.00 sec)
```
Start Canal Server

Regardless of the deployment method you choose, you can run the following command to start Canal Server after you modify the configuration files:
```
sh bin/startup.sh
```
(Optional) Use Canal Admin to deploy the Canal server and Canal instance

To use Canal Admin to manage Canal servers, log on to the Canal Admin console at http://xxx.xxx.xxx.xxx:8089/#/canalServer/nodeServers. In the left-side navigation pane, choose Canal Server > Server Management. Then, click Create Server.

Note

The IP address in the preceding link is the IP address of the server where Canal Admin is deployed. Replace it with the actual IP address of the server where Canal Admin is deployed.

In the New Server Information dialog box, click OK.

In the left-side navigation pane, choose Canal Server > Instance Management, and then click Create Instance on the Instance Management page.

On the New Instance Configuration page, click Load Template. The instance configuration file appears, which is similar to the one described in the Deploy Canal Deployer section. You need to modify configurations related to Canal and the source database in the configuration file.

Click Save and name the instance as mariadb.

The status of the new instance is Stopped. You can click Start in the Actions column to start the instance. After the instance is started, the status changes to Started.
View server logs

You can view the server logs either in Canal Admin or in the CLI by running the following command:
```
vi logs/canal/canal.log
```
View instance logs

You can view the instance logs either in Canal Admin or in the CLI by running the following commands:
```
tail -f logs/canal/canal.log
tail -f logs/example/example.log
tail -f logs/mariadb/mariadb.log
```
Run the following command to stop the service:
```
sh bin/stop.sh
```

Deploy the RDB adapter

Canal Adapter supports multiple containers. For OceanBase Community Edition, the Canal Adapter uses the RDB module. The destination container can be a MySQL database or an OceanBase Community Edition database. In this example, the destination container is an OceanBase database.

You need to perform the following steps to manually deploy Canal Adapter:

Download the Canal Adapter at https://github.com/alibaba/canal/releases.

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz

Run the following command to extract files to the specified directory:

mkdir ~/adapter && tar zxvf canal.adapter-1.1.5.tar.gz -C ~/adapter/

Modify the configuration file application.yml of the launcher under the conf directory.

Set mode of the adapter source to tcp. Specify the canal.tcp properties, including the IP address and port of the canal server and the username and password for database connection. Then, specify the connection information of the adapter destination. instance is the source instance name, which is defined when Canal is deployed.

Note If you did not use Canal Admin for deployment, the instance name would be example. If you used Canal Admin for deployment, the instance name would be the one that you specified, for example, mariadb in the preceding example.

key is a custom parameter. The value will be used later. jdbc properties specify the information used to connect to the destination OceanBase database. You can use the MySQL driver.
```
mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
   # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username: tpch
    canal.tcp.password: *********
  canalAdapters:
  - instance: mariadb # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
    - name: logger
      - name: rdb
        key: obmysql
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://127.0.0.1:2883/tpch?useUnicode=true
          jdbc.username: tpch@obmysql#obdemo
          jdbc.password: *********   
```

Modify the RDB mapping file.

You need to modify the mytest_user.yml file in the conf/rdb/ directory.

The mapping modes include table mapping and full-database mapping. The following example shows how to configure full-database mapping.

[root@obce00 adapter]# cat conf/rdb/mytest_user.yml
#dataSourceKey: defaultDS
#destination: example
#groupId: g1
#outerAdapterKey: mysql1
#concurrent: true
#dbMapping:
#  database: mytest
#  table: user
#  targetTable: mytest
#  targetPk:
#    id: id
##  mapAll: true
#  targetColumns:
#    id:
#    name:
#    role_id:
#    c_time:
#    test1:
#  etlCondition: "where c_time>={}"
#  commitBatch: 3000 # The size of batch commit.
# Mirror schema synchronize config
dataSourceKey: defaultDS
destination: mariadb
groupId: g1
outerAdapterKey: obmysql
concurrent: true
dbMapping:
  mirrorDb: true
  database: tpch
  commitBatch: 1000

The destination parameter specifies the name of the Canal instance. The outerAdapterKey parameter is the user-defined key. The mirrorDb parameter specifies to mirror the database-level DDL and DML operations of the source database to the destination database.

The type of imported data depends on the metatype of the destination table. Data types are automatically converted.

Start RDB.

If you use the driver of OceanBase Database for the destination database, place the driver package in the lib folder.

Run the following command to start the canal-adapter launcher:
```
bin/startup.sh
```
Modify the data of the mysql mytest.user table to verify the configuration. The modification will be automatically synchronized to the MYTEST.TB_USER table of the MySQL database. The logs of DML operations will be displayed.
Run the following command to stop RDB:
```
bin/stop.sh
```

Run the following command to view RDB logs:

tail -f logs/adapter/adapter.log
2021-12-09 09:56:04.148 [pool-6-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"s_suppkey":99995,"s_name":null,"s_address":null,"s_nationkey":null,"s_phone":null,"s_acctbal":null,"s_comment":null},"database":"tpch","destination":"mariadb","old":null,"table":"supplier2","type":"INSERT"}
2021-12-09 09:56:04.149 [pool-6-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"s_suppkey":99998,"s_name":null,"s_address":null,"s_nationkey":null,"s_phone":null,"s_acctbal":null,"s_comment":null},"database":"tpch","destination":"mariadb","old":null,"table":"supplier2","type":"INSERT"}
2021-12-09 10:13:35.915 [Thread-3] INFO  c.a.o.canal.client.adapter.rdb.monitor.RdbConfigMonitor - Change a rdb mapping config: mytest_user.yml of canal adapter

Perform synchronization tests

You can perform DML and DDL operations in the source MySQL database named tpch, and check whether the modifications can be synchronized to the destination database. Note the following synchronization restrictions:

The source table must have a primary key. Otherwise, when a record is deleted from the source table, synchronizing the operation to the destination will delete the entire table in the destination.
The CREATE TABLE and CREATE COLUMN DDL operations can be synchronized. However, due to the restrictions on the MySQL tenant in OceanBase Database, a primary key added after a table is created or the change of the column type, such as change between the Numeric, String, and Datetime types, cannot be synchronized.