What is CDC?
CDC stands for change data capture, which is a process that tracks data changes.
Why is CDC needed?
CDC helps you identify data that has been changed since the last extraction. You can do many things with data provided by CDC. For example, you can use the data to make historical databases or perform near real-time caching. You can also provide the CDC data to message queues (MQs), so that you can use MQs for analysis and auditing.
CDC implementation logic in OceanBase Database
Oblogproxy is a proxy service that obtains incremental logs from OceanBase Database. Designed based on liboblog, oblogproxy provides links for applications to access and manage real-time incremental data logs of OceanBase Database. Oblogproxy enables you to subscribe to incremental logs in isolated networks and supports multiple access methods.
Canal is an open-source component that parses incremental logs in MySQL databases and thereby provides incremental data subscription and consumption. It can be used for data synchronization.
Data link:
ob_cluster -> oblogreader -> oblogmsg -> canal_server -> canal_client -> mysql
Component overview:
Liboblog is a fundamental component of OceanBase CDC. Oblogproxy is derived from liboblog and liboblog depends on oblogmsg.
Liboblog is a C++ dynamic library. It delivers incremental logs that are pulled from an OceanBase cluster in the transaction commitment sequence. The incremental logs are delivered to consumers by creating related objects based on the LogMessage protocol.
ObLogReader is liboblog encapsulated in C++ and provides multiple serialized outputs for the created LogMessage objects based on liboblog.
As an incremental log proxy, oblogproxy does not forward incremental data, but only creates the corresponding oblogreader sub-process for a new connection request. Then, the oblogreader sub-process directly sends the incremental log entries serialized by LogMessage to the client by using this connection. Oblogproxy is used to manage the lifecycle of all ObLogReader sub-processes on the local machine.
Logclient deserializes the byte stream by using the LogMessage protocol and then generates LogMessage objects for consumers to call, for example, for interconnection with Canal.
Install oblogproxy
Download the oblogproxy RPM package.
Download link: https://github.com/oceanbase/oblogproxy/releases/tag/v1.0.0
Install oblogproxy.
yum install oblogproxy-1.0.0-1.el7.x86_64.rpmConfirm the oblogproxy dependency and ensure that liboblog is available.
[root@xxx.xxx.xxx.xxx ~]$cd /usr/local/oblogproxy/ [root@xxx.xxx.xxx.xxx oblogproxy]$ldd ./bin/logproxy linux-vdso.so.1 => (0x00007fffe68fe000) liboblog.so.1 => /lib/liboblog.so.1 (0x00002ae0113f1000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae046bf8000) libdl.so.2 => /lib64/libdl.so.2 (0x00002ae046e14000) librt.so.1 => /lib64/librt.so.1 (0x00002ae047018000) libm.so.6 => /lib64/libm.so.6 (0x00002ae047220000) libc.so.6 => /lib64/libc.so.6 (0x00002ae047522000) /lib64/ld-linux-x86-64.so.2 (0x00002ae0111cd000) libaio.so.1 => /lib64/libaio.so.1 (0x00002ae0478f0000)
Start oblogproxy
Modify oblogproxy parameters
Generate an encrypted user and the corresponding password.
[root@xxx.xxx.xxx.xxx bin]$pwd /usr/local/oblogproxy/bin [root@xxx.xxx.xxx.xxx bin]$./logproxy -x root@sys EA87898018FD1EDDC2AA11CE1556E917 [root@xxx.xxx.xxx.xxx bin]$./logproxy -x Root@2021 ****************************************Modify parameters in the conf.json file.
[root@xxx.xxx.xxx.xxx conf]$pwd /usr/local/oblogproxy/conf [root@xxx.xxx.xxx.xxx conf]$ls -l Total usage: 4 -rw-r--r-- 1 root root 1081 Oct 25 18:00 conf.json # Modify content in the conf.json file "ob_sys_username": "EA87898018FD1EDDC2AA11CE1556E917", "ob_sys_password": "****************************************",Start oblogproxy.
[root@xxx.xxx.xxx.xxx oblogproxy]$./run.sh start work path : /usr/local/oblogproxy is_running : (8252)/usr/local/oblogproxy logproxy is running ! logproxy started!Start and stop oblogproxy and view its status.
Start oblogproxy.
[root@xxx.xxx.xxx.xxx oblogproxy]$pwd /usr/local/oblogproxy [root@xxx.xxx.xxx.xxx oblogproxy]$./run.sh startStop oblogproxy.
[root@xxx.xxx.xxx.xxx oblogproxy]$pwd /usr/local/oblogproxy [root@xxx.xxx.xxx.xxx oblogproxy]$./run.sh stopCheck the oblogproxy status.
[root@xxx.xxx.xxx.xxx oblogproxy]$pwd /usr/local/oblogproxy [root@xxx.xxx.xxx.xxx oblogproxy]$./run.sh status
Note:
The process is confirmed after oblogproxy is started.
[root@xxx.xxx.xxx.xxx ~]$ps -ef | grep logproxy | grep -v grep
root 26379 26373 0 10:45 pts/1 00:00:00 ./bin/logproxy -f ./conf/conf.json
[root@xxx.xxx.xxx.xxx ~]$
When a client is successfully connected, a sub-process is forked.
[root@xxx.xxx.xxx.xxx ~]$ps -ef | grep oblogreader | grep -v grep
root 26386 26379 2 10:45 pts/1 00:00:17 oblogreader -f ./conf/conf.json
Canal server
Download canal-for-ob. For more information about canal-for-ob, see the GitHub repository of canal-for-ob.
Modify the canal.properties file under /opt/canal_ob/canal.deployer-for-ob-rc1/conf.
canal.zkServers = xxx.xxx.xxx.1:12181,xxx.xxx.xxx.2:12181,xxx.xxx.xxx.3:12181
canal.serverMode = tcp
canal.destinations = obtest2
canal.instance.global.spring.xml = classpath:spring/ob-default-instance.xml
Canal instance
[root@xxx.xxx.xxx.xxx conf]$ll
Total usage: 28
-rwxr-xr-x 1 root root 319 Oct 21 16:15 canal_local.properties
-rwxr-xr-x 1 root root 6577 Oct 26 16:21 canal.properties
-rwxr-xr-x 1 root root 3592 Oct 21 16:15 logback.xml
drwxrwxrwx 2 root root 4096 Oct 21 16:15 metrics
drwxr-xr-x 2 root root 4096 Dec 7 13:29 obtest2 -- Replace example with obtest2
drwxrwxrwx 3 root root 4096 Nov 2 10:51 spring
[root@xxx.xxx.xxx.xxx obtest2]$pwd
/opt/canal_ob/canal.deployer-for-ob-rc1/conf/obtest2
[root@xxx.xxx.xxx.xxx obtest2]$ll
total 16
-rwxrwxrwx 1 root root 1147 Jan 13 17:15 ca.crt
-rwxrwxrwx 1 root root 1241 Jan 13 17:15 client.crt
-rwxrwxrwx 1 root root 1708 Jan 13 17:15 client.key
-rwxr-xr-x 1 root root 1743 Feb 13 17:15 instance.properties
-rwxr-xr-x 1 root root 1743 Feb 13 17:15 ob-instance.properties
# Delete instance.properties, and rename ob-instance.properties as instance.properties.
[root@xxx.xxx.xxx.xxx obtest2]$cat instance.properties
# Parameters of the OceanBase cluster
canal.instance.oceanbase.rsList=xxx.xxx.xxx.1:2882:2881;xxx.xxx.xxx.2:2882:2881
canal.instance.oceanbase.username=root@test_tenant_1#obcluster
canal.instance.oceanbase.password=********
canal.instance.oceanbase.startTimestamp=0
# Parameters of oblogproxy
canal.instance.oceanbase.logproxy.address=127.0.0.1:2983
canal.instance.oceanbase.logproxy.sslEnabled=false
canal.instance.oceanbase.logproxy.serverCert=../conf/${canal.instance.destination:}/ca.crt
canal.instance.oceanbase.logproxy.clientCert=../conf/${canal.instance.destination:}/client.crt
canal.instance.oceanbase.logproxy.clientKey=../conf/${canal.instance.destination:}/client.key
# Specify whether to remove the tenant ID prefixed to the database name. The default database name is in the [tenant].[db] format in the log file exported by oblogproxy.
canal.instance.parser.excludeTenantInDbName=true
canal.instance.oceanbase.tenant=test_tenant_1
# Filter the logs. Specify the logs in the [tenant].[database].[table] format. Regular expressions are supported.
canal.instance.filter.regex=test_tenant_1.db1.*
Start the Canal server.
[root@xxx.xxx.xxx.xxx ~]$cd /opt/canal_ob
[root@xxx.xxx.xxx.xxx canal_ob]$cd canal.deployer-for-ob-rc1
[root@xxx.xxx.xxx.xxx canal.deployer-for-ob-rc1]$cd bin
[root@xxx.xxx.xxx.xxx bin]$ls -lrt
Total usage: 20
-rwxr-xr-x 1 root root 1244 Oct 21 16:15 startup.bat
-rwxr-xr-x 1 root root 1356 Oct 21 16:15 stop.sh
-rwxr-xr-x 1 root root 3167 Oct 21 16:15 startup.sh
-rwxr-xr-x 1 root root 226 Oct 21 16:15 restart.sh
-rw-r--r-- 1 root root 6 Dec 7 10:42 canal.pid
[root@xxx.xxx.xxx.xxx bin]$./startup.sh
Canal client
Download canal-for-ob. For more information, see the GitHub repository of canal-for-ob.
[root@xxx.xxx.xxx.xxx conf]$pwd
/opt/canal_ob/canal_adapter_for_ob/conf
[root@xxx.xxx.xxx.xxx conf]$ls -lrt
Total usage: 44
-rwxr-xr-x 1 root root 2172 Oct 21 16:15 logback.xml
drwxrwxrwx 2 root root 4096 Oct 21 16:15 tablestore
drwxr-xr-x 2 root root 4096 Oct 21 16:15 hbase
drwxr-xr-x 2 root root 4096 Oct 21 16:15 kudu
drwxrwxrwx 2 root root 4096 Oct 21 16:15 META-INF
drwxr-xr-x 2 root root 4096 Oct 21 16:15 es6
-rwxr-xr-x 1 root root 170 Oct 21 16:15 bootstrap.yml
-rwxr-xr-x 1 root root 552 Oct 21 16:15 mytest_user.yml
drwxr-xr-x 2 root root 4096 Oct 21 16:15 es7
-rwxr-xr-x 1 root root 3240 Oct 26 16:26 application.yml
drwxrwxrwx 2 root root 4096 Dec 7 10:27 rdb
[root@xxx.xxx.xxx.xxx conf]$cat application.yml
canalAdapters:
- instance: obtest2# canal instance Name or mq topic name
groups:
- groupId: g1
outerAdapters:
- name: logger
- name: rdb
key: mysql1
properties:
jdbc.driverClassName: com.mysql.jdbc.Driver
jdbc.url: jdbc:mysql://xxx.xxx.xxx.xxx:3367/mysql?useUnicode=true&useSSL=false
jdbc.username: root
jdbc.password: ********
Modify database/table mappings (database mappings as an example here) for the adapter.
[root@xxx.xxx.xxx.xxx conf]$pwd
/opt/canal_ob/canal_adapter_for_ob/conf
[root@xxx.xxx.xxx.xxx conf]$ls -l rdb
Total usage: 4
-rwxr-xr-x 1 root root 186 Dec 6 21:11 mytest_user.yml
## Mirror schema synchronize config
dataSourceKey: defaultDS
destination: obtest2
groupId: g1
outerAdapterKey: mysql1
concurrent: true
dbMapping:
mirrorDb: true
database: db1
Start the Canal client.
[root@xxx.xxx.xxx.xxx bin]$pwd
/opt/canal_ob/canal_adapter_for_ob/bin
[root@xxx.xxx.xxx.xxx bin]$ls -lrt
Total usage: 16
-rwxr-xr-x 1 root root 2289 Oct 21 16:15 startup.sh
-rwxr-xr-x 1 root root 793 Oct 21 16:15 startup.bat
-rwxr-xr-x 1 root root 205 Oct 21 16:15 restart.sh
-rwxr-xr-x 1 root root 1370 Oct 21 16:15 stop.sh
[root@xxx.xxx.xxx.xxx bin]$./startup.sh
After you write data to the source OceanBase database, you can view the synchronized data on the destination MySQL database.