OceanBase Migration Service (OMS) uses the Store component for Change Data Capture (CDC). Common CDC modes include trigger-based CDC, timestamp-based CDC, and log-based CDC. The Store component of OMS supports log-based CDC, which is less intrusive to the database and delivers better real-time performance than the other modes.
The Store component pulls database logs, parses the logs, and stores the parsed logs in a unified format. Downstream applications consume incremental data of different databases from the Store component in a consistent manner.
The Store component consists of a reader and a local storage. The reader is a subprocess that pulls and parses database logs. During running, the Store component loads a reader based on the type of the database to pull data from. This topic describes the basic principles of readers for different databases.
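Before the per-database details, the following Python sketch models this design for illustration only: a unified change record plus a registry that loads one reader per source type. All names and fields here are assumptions and do not reflect the actual OMS storage format.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical unified record; field names are illustrative and do not
# reflect the actual OMS Store storage format.
@dataclass
class ChangeRecord:
    source_type: str                      # e.g. "ORACLE", "MYSQL", "OCEANBASE"
    schema: str
    table: str
    op: str                               # "INSERT", "UPDATE", or "DELETE"
    before: Optional[Dict[str, object]]   # pre-image (UPDATE/DELETE)
    after: Optional[Dict[str, object]]    # post-image (INSERT/UPDATE)
    commit_ts: int                        # commit time of the owning transaction

# The Store process picks one reader implementation per database type.
READER_REGISTRY: Dict[str, type] = {}

def register_reader(source_type: str):
    def wrap(cls):
        READER_REGISTRY[source_type] = cls
        return cls
    return wrap

def load_reader(source_type: str):
    """Instantiate the reader registered for the given database type."""
    return READER_REGISTRY[source_type]()
```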
Oracle Reader
Main operations of Oracle Reader are log pulling, pre-parsing and filtering, transaction management, log parsing, and database access.
Log pulling
Oracle Reader pulls incremental change logs from the V$LOGMNR_CONTENTS view of LogMiner. It can concurrently pull changes from multiple archive files to improve the pulling efficiency.
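As a minimal sketch of this step, the snippet below starts a LogMiner session over one archive file and reads change vectors from V$LOGMNR_CONTENTS with the python-oracledb driver. The credentials, file path, and session options are placeholders; the options OMS actually uses may differ.

```python
import oracledb  # python-oracledb driver; connection details are placeholders

conn = oracledb.connect(user="miner", password="secret", dsn="db-host/ORCLPDB1")
cur = conn.cursor()

# Register one archive log file and start a LogMiner session over it.
# DICT_FROM_ONLINE_CATALOG resolves object names from the online dictionary.
cur.execute("""
    BEGIN
      DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => :f,
                              OPTIONS     => DBMS_LOGMNR.NEW);
      DBMS_LOGMNR.START_LOGMNR(
          OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;""", f="/u01/arch/arch_0001.arc")

# Each row of V$LOGMNR_CONTENTS is one change vector; SQL_REDO carries the
# reconstructed DML statement.
cur.execute("""
    SELECT scn, xid, operation, seg_owner, table_name, sql_redo
    FROM   v$logmnr_contents""")
for scn, xid, op, owner, table, sql_redo in cur:
    pass  # hand each raw change to pre-parsing and filtering

cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
```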
Pre-parsing and filtering
Oracle Reader parses out key information, such as the table to which a change belongs, and then filters out records of tables that do not need to be synchronized based on the allowlist and blocklist.
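A minimal sketch of such a filter follows; the glob-style matching and the sample patterns are assumptions, and the real OMS matching rules may differ.

```python
import fnmatch

# Illustrative patterns only; the real allowlist/blocklist syntax may differ.
ALLOWLIST = ["SALES.ORDERS", "SALES.ORDER_*"]
BLOCKLIST = ["SALES.ORDER_AUDIT"]

def should_sync(owner: str, table: str) -> bool:
    """Blocklist wins over allowlist; everything else must match the allowlist."""
    name = f"{owner}.{table}"
    if any(fnmatch.fnmatch(name, p) for p in BLOCKLIST):
        return False
    return any(fnmatch.fnmatch(name, p) for p in ALLOWLIST)
```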
Transaction management
OMS does not use the DBMS_LOGMNR.COMMITTED_DATA_ONLY option of LogMiner because this option may cause insufficient memory when large transactions are involved and OMS can process large transactions itself. Therefore, the pulled change logs may include both rolled-back transactions and transactions rolled back to specified savepoints. During transaction management, Oracle Reader aggregates data of the same transaction, returns committed transactions, and discards rolled-back transactions. It can also process the transactions rolled back to specified savepoints, and temporarily store large transactions to the disk to reduce memory overheads.
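The following sketch illustrates the idea under stated assumptions: changes are buffered per transaction ID, only committed transactions are released, rollbacks are dropped, and oversized buffers spill to a temporary file. The threshold, spill format, and savepoint handling (omitted here) are all hypothetical.

```python
import pickle
import tempfile

SPILL_THRESHOLD = 10_000  # rows buffered in memory per transaction (illustrative)

class OpenTransaction:
    """Buffers change rows for one XID until a COMMIT or ROLLBACK arrives."""
    def __init__(self, xid):
        self.xid = xid
        self.rows = []
        self.spill = None  # on-disk batches once the transaction grows too large

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= SPILL_THRESHOLD:
            if self.spill is None:
                self.spill = tempfile.TemporaryFile()
            pickle.dump(self.rows, self.spill)  # move the batch to disk
            self.rows = []

def emit(txn):
    """Hand a committed transaction on to log parsing (stub)."""

active = {}

def on_change(xid, op, row=None):
    if op in ("INSERT", "UPDATE", "DELETE"):
        active.setdefault(xid, OpenTransaction(xid)).add((op, row))
    elif op == "COMMIT":
        txn = active.pop(xid, None)
        if txn:
            emit(txn)            # only committed work reaches downstream
    elif op == "ROLLBACK":
        active.pop(xid, None)    # discard the whole transaction
```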
Log parsing
Oracle Reader parses column values based on column types.
Database access
LogMiner does not necessarily return values of the LOB or LONG data types. Therefore, Oracle Reader must access the database to obtain column values for some changes.
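As one possible illustration, the sketch below fetches a LOB column by the ROW_ID that LogMiner reports in V$LOGMNR_CONTENTS. The helper name and identifiers are hypothetical, and this is not necessarily how Oracle Reader performs the lookup.

```python
def fetch_lob(conn, owner, table, column, row_id):
    """Fetch one LOB column for a changed row, addressed by its ROWID.

    Note: this reads the row's *current* value, which may differ from the
    value at change time; a flashback query (AS OF SCN) could pin the SCN.
    """
    cur = conn.cursor()
    cur.execute(
        f'SELECT "{column}" FROM "{owner}"."{table}" WHERE ROWID = :rid',
        rid=row_id)
    row = cur.fetchone()
    return row[0].read() if row else None  # python-oracledb returns a LOB object
```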
MySQL Reader
Main operations of MySQL Reader are log pulling, pre-parsing and filtering of binlog events, and parsing of binlog event values.
Log pulling
MySQL Reader acts as a standby database of the MySQL data source to receive incremental binlogs. It establishes a binlog dump connection to the source, receives the binlog stream, and splits the stream into complete, independent binlog events based on the event length.
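A minimal sketch of the splitting step follows, assuming a raw binlog v4 byte stream with the replication network framing already stripped: the 4-byte event_size field at offset 9 of each 19-byte event header tells where the event ends.

```python
import struct

HEADER_LEN = 19  # binlog v4 event header: timestamp(4) type(1) server_id(4)
                 #                         event_size(4) log_pos(4) flags(2)

def split_events(buf: bytes):
    """Split a raw binlog byte stream into complete events by event length."""
    events, pos = [], 0
    while pos + HEADER_LEN <= len(buf):
        # event_size counts the header itself and sits at offset 9.
        (event_size,) = struct.unpack_from("<I", buf, pos + 9)
        if pos + event_size > len(buf):
            break  # partial event; wait for more bytes from the dump stream
        events.append(buf[pos:pos + event_size])
        pos += event_size
    return events, buf[pos:]  # complete events plus the leftover tail
```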
Pre-parsing and filtering of binlog events
MySQL Reader pre-parses binlog events and filters out binlog events of tables that do not need to be synchronized based on the allowlist and blocklist.
Parsing of binlog event values
MySQL Reader parses column values based on column types.
OceanBase Database Reader
OceanBase Database Reader requests clogs, also known as redo logs, of each partition from OceanBase Database by using remote procedure calls (RPCs). It converts clogs to intermediate data based on the table and column schemas that are obtained by querying OBServer nodes. After OceanBase Database Reader aggregates data changes of each partition involved in a transaction, it provides the changed data of the transaction to an external process row by row. Main operations of OceanBase Database Reader are log pulling, log parsing, and transaction sequencing.
Log pulling
OceanBase Database Reader pulls logs in distributed mode, assembles participant transactions, and generates asynchronous tasks to flush transaction logs to the disk.
Log parsing
OceanBase Database Reader parses data in parallel mode and maintains row data.
Transaction sequencing
OceanBase Database Reader assembles and sequences distributed transactions and discards rolled-back transactions, including those rolled back to specified savepoints. Large transactions are temporarily stored to disk to reduce memory overheads.
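For illustration only, the following sketch shows one way participant logs could be assembled into a distributed transaction and then released in commit-version order. All names, fields, and the completion rule are assumptions, not OceanBase internals.

```python
import heapq

class DistributedTxn:
    """One transaction assembled from the logs of its participant partitions."""
    def __init__(self, txn_id, participants):
        self.txn_id = txn_id
        self.pending = set(participants)  # partitions that have not reported yet
        self.rows = []
        self.commit_version = None

    def add_partition_log(self, partition, rows, commit_version):
        self.pending.discard(partition)
        self.rows.extend(rows)
        self.commit_version = commit_version

open_txns = {}
ready = []  # committed transactions, sequenced by commit version (min-heap)

def on_partition_commit(txn_id, participants, partition, rows, commit_version):
    txn = open_txns.setdefault(txn_id, DistributedTxn(txn_id, participants))
    txn.add_partition_log(partition, rows, commit_version)
    if not txn.pending:  # every participant has been assembled
        del open_txns[txn_id]
        heapq.heappush(ready, (txn.commit_version, txn.txn_id, txn))
```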
DB2 Reader
Main operations of DB2 Reader are log pulling, log splitting and assembly, log header parsing, log filtering, transaction management, log value parsing, and database access.
Log pulling
DB2 Reader reads logs by using the db2ReadLog API provided by the DB2 database, and queries the log manager for the current log status.
Log splitting and assembly
Depending on the scenario, DB2 Reader assembles multiple log entries into one log record or splits one log entry into multiple log records.
Log header parsing
DB2 Reader parses log header information, such as the timestamp, transaction ID, and log type.
Log filtering
DB2 Reader filters out records of tables that do not need to be synchronized based on the allowlist and blocklist.
Transaction management
DB2 Reader aggregates data of the same transaction. After a transaction is completed, it outputs all changes of the transaction to the next phase and filters out rolled-back transactions, including those rolled back to specified savepoints. Large transactions are temporarily stored to the disk to reduce memory overheads.
Log value parsing
DB2 Reader parses detailed information of each log record in a transaction based on the rule corresponding to the log type.
Database access
In some scenarios, the db2ReadLog API does not return complete values of the LOB or LONG data type. Therefore, DB2 Reader must access the source database to obtain the complete values.
PostgreSQL Reader
Main operations of PostgreSQL Reader are log pulling as well as log parsing and filtering.
Log pulling
PostgreSQL Reader subscribes to incremental changes of the tables to be synchronized, or of all tables, by using the CREATE PUBLICATION statement provided by the PostgreSQL database. It uses the built-in pgoutput plugin of the PostgreSQL database as the output plugin to reduce intrusion into the PostgreSQL database.
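The sketch below shows one way to consume such a subscription from Python with psycopg2's replication support. The publication name, slot name, and connection parameters are placeholders, and OMS does not necessarily use this client stack.

```python
import psycopg2
from psycopg2.extras import LogicalReplicationConnection

# Placeholder connection; the publication is created separately, e.g.:
#   CREATE PUBLICATION oms_pub FOR TABLE public.orders;
conn = psycopg2.connect("dbname=app host=pg-host user=repl",
                        connection_factory=LogicalReplicationConnection)
cur = conn.cursor()

# A logical replication slot served by the built-in pgoutput plugin.
cur.create_replication_slot("oms_slot", output_plugin="pgoutput")
cur.start_replication(
    slot_name="oms_slot",
    decode=False,  # pgoutput emits a binary protocol, not text
    options={"proto_version": "1", "publication_names": "oms_pub"},
)

def consume(msg):
    # msg.payload holds one pgoutput message (BEGIN/RELATION/INSERT/...).
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(consume)
```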
Log parsing and filtering
PostgreSQL Reader filters log records based on the allowlist and blocklist, and parses column values in the tables to be synchronized based on the column types. OMS supports tables with primary keys whose REPLICA IDENTITY values are set to DEFAULT. In this scenario, pgoutput does not return the value of the before image in the UPDATE operation. Therefore, PostgreSQL Reader must add the value of the before image during parsing and specify whether the value is trusted.
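To make the last point concrete, here is a minimal sketch, assuming a decoded pgoutput UPDATE where the old tuple is absent (REPLICA IDENTITY DEFAULT, key unchanged) or carries only the key columns: missing before-image values are filled from the new tuple and flagged as untrusted. The function and its contract are hypothetical.

```python
def build_before_image(key_cols, old_tuple, new_tuple):
    """old_tuple is None when pgoutput sent no old tuple (key unchanged)."""
    before, trusted = {}, {}
    for col, new_val in new_tuple.items():
        if old_tuple is not None and col in old_tuple:
            before[col], trusted[col] = old_tuple[col], True
        elif old_tuple is None and col in key_cols:
            # The key did not change, so the new key value equals the old one.
            before[col], trusted[col] = new_val, True
        else:
            before[col], trusted[col] = new_val, False  # filled in, untrusted
    return before, trusted

# Example: the UPDATE changed only "qty", so no old tuple was sent.
before, trusted = build_before_image(["id"], None, {"id": 7, "qty": 3})
assert trusted == {"id": True, "qty": False}
```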