Create a project to synchronize data from an OceanBase database to a Kafka instance|V3.4.0|OceanBase Migration Service| docs|Distributed Database

Create a project to synchronize data from an OceanBase database to a Kafka instance

Last Updated: 2023-07-05 01:56:06

Kafka is a widely used, high-performance distributed stream computing platform. OceanBase Migration Service (OMS) supports real-time data synchronization between a self-managed Kafka instance and Oracle and MySQL tenants of OceanBase Database, which extends message processing capabilities. As a result, OMS is widely used in business scenarios such as real-time data warehouse building, data query, and report distribution.

OMS enables you to synchronize data to message queue products, which broadens how your business can use its data in big data scenarios such as monitoring data aggregation, streaming data processing, and online/offline analysis. For more information about the data formats for the two types of tenants, see Data formats.

Limits

Only physical tables can be synchronized.
The Kafka version must be 0.9, 1.0, or 2.x.

Notice

If the Kafka version is 0.9, schema synchronization is not supported.
During data synchronization, if you rename a source table to be synchronized and the new name is beyond the synchronization scope, the data of the source table will not be synchronized to the destination Kafka instance.
During data synchronization, OMS allows you to drop a table and then create a new one. For example, you can execute `drop table a` and then `create table a_tmp` without affecting the project. However, OMS does not allow you to create a table by renaming an existing one. After you execute `rename table a to a_tmp`, the project cannot proceed.
When data transfer is resumed on a link, some data (within the last minute) may be duplicated in the Kafka instance, so downstream applications must deduplicate it.
The name of a table to be synchronized, as well as the names of columns in the table, must not contain Chinese characters.
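
The deduplication mentioned in the notice above could be handled in a downstream consumer along the following lines. This is a minimal sketch, not OMS code: the message fields `table`, `pk`, and `position` are hypothetical, and the fields that actually identify a change depend on the serialization method you choose (see Data formats).

```python
from typing import Dict, Iterable, Iterator, Tuple


def deduplicate(messages: Iterable[Dict], key_fields: Tuple[str, ...]) -> Iterator[Dict]:
    """Yield each message once, skipping any whose identity key was already seen."""
    seen = set()
    for msg in messages:
        # Build a hashable identity from the chosen fields (hypothetical field names).
        identity = tuple(msg[field] for field in key_fields)
        if identity in seen:
            continue  # duplicate delivered again after the link was resumed
        seen.add(identity)
        yield msg


# Hypothetical messages: the third one repeats the second after a resume.
replayed = [
    {"table": "orders", "pk": 1, "position": 100, "op": "INSERT"},
    {"table": "orders", "pk": 1, "position": 101, "op": "UPDATE"},
    {"table": "orders", "pk": 1, "position": 101, "op": "UPDATE"},
]
unique = list(deduplicate(replayed, ("table", "pk", "position")))
print(len(unique))  # 2
```

The `seen` set grows without bound in this sketch. Because duplicates are limited to roughly the last minute of data, a real consumer would likely keep only a recent window of keys.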

Procedure

Create a data synchronization project.
1. Log on to the OMS console.
2. In the left-side navigation pane, click Data Synchronization.
3. On the Data Synchronization page, click Create Synchronization Project in the upper-right corner.

On the Select Source and Destination page, configure the parameters.

Parameter	Description
Synchronization Project Name	We recommend that you set it to a combination of Chinese characters, digits, and letters. It must not contain any spaces and cannot exceed 64 characters in length.
Label	Click the field and select the target tag from the drop-down list. You can click Manage Tags to create, modify, and delete tags. For more information, see Manage data synchronization projects by using tags.
Source	If you have created a physical OceanBase data source, select it from the drop-down list. If you have not created a data source, click Create Data Source in the drop-down list and create a data source in the dialog box that appears on the right. For more information, see Create a physical OceanBase data source.
Destination	If you have created a Kafka data source, select it from the drop-down list. If you have not created a data source, click Create Data Source in the drop-down list and create a data source in the dialog box that appears on the right. For more information, see Create a Kafka data source.

Click Next. On the Select Synchronization Type page, specify Synchronization Type and Configuration for the current data synchronization project.

Options for Synchronization Type and Configuration are Schema Synchronization, Full Synchronization, and Incremental Synchronization. Full Synchronization supports the synchronization of tables without primary keys. Incremental Synchronization supports only DML Synchronization. The DML operations for synchronization are Insert, Delete, and Update.
(Optional) Click Next.

If the source database is an OceanBase database, you must configure the obconfig_url parameter, username, and password for incremental synchronization.

If you have selected Incremental Synchronization without configuring the required parameters for the source database, the More About Data Sources dialog box appears to prompt you to configure the parameters. For more information, see Create a physical OceanBase data source.

After you configure the parameters, click Test Connectivity. After the test succeeds, click Save.

Click Next. On the Select Synchronization Objects page, select a synchronization scope.

When you synchronize data from an OceanBase database to a Kafka instance, you can synchronize data from multiple tables to multiple topics.

In the left-side pane, select the objects to be synchronized.
Click >.

In the Map Object to Topic dialog box, select a mapping method.

If you do not select Schema Synchronization when you set the synchronization type and configuration, you can select only Existing Topics here. If you have selected Schema Synchronization when you set the synchronization type and configuration, you can select only one mapping method to create or select a topic.

For example, if you have selected Schema Synchronization, when you use both the Create Topic and Select Topic mapping methods or rename the topic, a precheck error will be returned due to option conflicts.

Parameter	Description
Create Topic	Enter the name of the new topic in the text box. The topic name contains 3 to 64 characters, and can contain only letters, digits, hyphens (-), underscores (_), and periods (.).
Select Topic	OMS allows you to query Kafka topics. You can click Select Topic, and then find and select the topics to be synchronized from the Select Topic drop-down list. You can also enter the name of an existing topic and select it after it appears.
Batch Generate Topics	The format for generating topics in batches is: `Topic_${Database Name}_${Table Name}`.

If you select Create Topic or Batch Generate Topics, you can query the newly created topics in the Kafka instance after schema synchronization is completed. By default, each topic created this way has three partitions, and each partition has one replica. These defaults cannot be modified.
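
The naming rules in the table above can be checked before you enter topic names in the console. The following is a minimal sketch under two assumptions: "letters" means ASCII letters, and the batch format substitutes the database and table names verbatim. The function names are illustrative and are not part of OMS.

```python
import re

# 3 to 64 characters: letters, digits, hyphens, underscores, and periods.
# Assumption: "letters" means ASCII letters only.
TOPIC_NAME = re.compile(r"[A-Za-z0-9._-]{3,64}")


def is_valid_topic_name(name: str) -> bool:
    """Return True if the name satisfies the Create Topic naming rule."""
    return TOPIC_NAME.fullmatch(name) is not None


def batch_topic_name(database: str, table: str) -> str:
    """Build a name in the Topic_${Database Name}_${Table Name} format."""
    return f"Topic_{database}_{table}"


name = batch_topic_name("sales", "orders")
print(name)                              # Topic_sales_orders
print(is_valid_topic_name(name))         # True
print(is_valid_topic_name("ab"))         # False: shorter than 3 characters
print(is_valid_topic_name("bad name"))   # False: contains a space
```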

Click OK.

When you synchronize data from an OceanBase database to a Kafka instance, OMS allows you to import objects from text and perform the following operations on the objects in the destination database: change topics, set row filtering conditions, and remove a single object or all objects. Objects in the destination database are listed in the structure of Topic > Database > Table.

Operation	Steps
Import Objects	1. In the list on the right, click Import Objects in the upper-right corner. 2. In the dialog box that appears, click OK. Notice: This operation will overwrite previous selections. Proceed with caution. 3. In the Import Synchronization Objects dialog box, import the objects to be synchronized. You can import CSV files to rename databases/tables and set row filtering conditions. For more information, see Download and import the settings of synchronization objects. 4. Click Validate. After the validation succeeds, click OK.
Change Topic	1. In the list on the right, move the pointer over the object that you want to change. 2. Click Change Topic. 3. In the Map Object to Topic dialog box, change the topics to be synchronized. 4. Click OK. Notice: The selected topics and tables will be merged into the selected topic. Proceed with caution.
Settings	You can use a `WHERE` clause in OMS to filter rows, and you can select sharding columns and the columns to synchronize. 1. In the list on the right, move the pointer over the object that you want to change. 2. Click Settings. 3. In the Settings dialog box, perform the following operations as needed. Row filters: In the Row Filters section, specify a standard SQL `WHERE` clause to filter data by row. The setting takes effect for both full synchronization and incremental synchronization. Only the data that meets the `WHERE` condition is synchronized to the destination data source. If the statement contains a reserved SQL keyword, escape the keyword with backticks (`). Sharding columns: Select the sharding columns that you want to use from the Sharding Columns drop-down list. You can select multiple fields as sharding columns. This parameter is optional. Unless otherwise specified, select the primary keys as sharding columns. If the primary keys are not load-balanced, select fields that are unique identifiers and whose load is balanced. Ensure that the selected sharding columns are correct, because an incorrect sharding column will cause data synchronization to fail. Sharding columns serve two purposes. Load balancing: if the destination table supports concurrent writes, the threads used for sending messages are determined based on the sharding columns. Ordering: OMS ensures that messages are received in order if the values of the sharding columns are the same. Here, ordering refers to the sequence in which DML statements are executed for a column. Columns: In the Select Columns section, select the columns to be synchronized. If you select all or no columns, OMS will synchronize all columns. 4. Click OK.
Remove/Remove All	OMS allows you to remove one or more objects from the destination database during data mapping. To remove a single synchronization object: in the list on the right of the selection section, hover the pointer over the target object, and click Remove. To remove all synchronization objects: in the list on the right of the selection section, click Remove All in the upper-right corner. In the dialog box that appears, click OK.
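
The row filter escaping described under Settings can be illustrated with a small helper that builds the filter text. This is a sketch under assumptions: the columns `order` and `status` are hypothetical, the helper name is illustrative, and doubling an embedded backtick follows the MySQL quoting convention. Whether the Row Filters field expects the leading `WHERE` keyword is not stated here, so only the condition is built.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick (MySQL-style)."""
    return "`" + name.replace("`", "``") + "`"


# Hypothetical columns: `order` collides with the reserved keyword ORDER,
# so it is escaped, while `status` is left as-is.
row_filter = f"{quote_identifier('order')} > 100 AND status = 'PAID'"
print(row_filter)  # `order` > 100 AND status = 'PAID'
```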

Click Next. On the Synchronization Options page, specify the following parameters.

Parameter	Description
Incremental Synchronization Start Timestamp	If you have selected Full Synchronization when you set the synchronization type and configuration, the value here is the project start time by default and cannot be modified. If you have not selected Full Synchronization, specify a point in time after which the data is to be synchronized. The default value is the current system time. You can select a point in time or enter a timestamp. Notice: You can select the current time or a point in time earlier than the current time. This parameter is closely related to the retention period of archived logs. Generally, you can start data synchronization from the current timestamp.
Serialization Method	The message format for synchronizing data to a Kafka instance. Valid values: Default, Canal, Dataworks (version 2.0 supported), SharePlex, and DefaultExtendColumnType. For more information, see Data formats.
Enable Intra-Transaction Sequence	Specifies whether to maintain order within a transaction. If this feature is enabled, OMS attaches sequence numbers to the statements of a transaction before it is sent to a downstream node. Notice: This parameter is valid only for the SharePlex format and is intended for you to obtain the sequence numbers of the DML statements that form a transaction. For example, if a transaction contains 10 DML statements numbered from 1 to 10, OMS will deliver these statements to the destination in the same order. If this option is enabled, the system performance may be affected. Choose whether to enable it based on the business characteristics.
Partitioning Rules	The rule for synchronizing data from an OceanBase database to a Kafka topic. Valid values: Hash and One. Hash indicates that OMS uses a hash algorithm to select the partition of a Kafka topic based on the value of the primary key or sharding column. One indicates that JSON messages are delivered to a single partition of a topic to ensure ordering. Selecting One may affect system performance. Choose a rule based on the business characteristics.
Business System Identification (Optional)	Identifies the source business system of data. The business system identifier consists of 1 to 20 characters.
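
The Hash partitioning rule can be pictured as follows. This is only a sketch of the general technique: the hash function that OMS actually uses is not specified here, so CRC32 stands in as an assumption, and the function name is illustrative. The point it demonstrates is that rows with the same primary key or sharding column values always map to the same partition, which is what preserves per-key ordering.

```python
import zlib

# Topics created by OMS have three partitions by default.
DEFAULT_PARTITIONS = 3


def pick_partition(key_values, partitions: int = DEFAULT_PARTITIONS) -> int:
    """Map primary key or sharding column values to a partition index.

    Assumption: CRC32 is a stand-in; the real OMS hash algorithm may differ.
    """
    # Join values with a separator so ("ab", "c") and ("a", "bc") hash differently.
    key = "\x1f".join(str(value) for value in key_values).encode("utf-8")
    return zlib.crc32(key) % partitions


# The same key always lands in the same partition.
first = pick_partition(("order-42",))
second = pick_partition(("order-42",))
print(first == second)  # True
```

With the One rule, every message would instead go to a single partition, which keeps a total order at the cost of parallelism.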

Click Precheck.

During the precheck, OMS detects the connection with the destination Kafka instance. If an error is returned:
- You can troubleshoot the error and run the precheck again.
- You can also click Skip in the Actions column of the precheck item that returns the error. Then, a dialog box appears, indicating the impact that may be caused if you choose to skip this check item. If you want to continue, click OK in the dialog box.
Click Start Project. If you do not need to start the project now, click Save to go to the details page of the data synchronization project. You can start the project later as needed.

OMS allows you to modify synchronization objects when a data synchronization project is running. For more information, see View and modify synchronization objects. After a data synchronization project is started, the synchronization objects will be executed based on the selected synchronization type. For more information, see the "View synchronization details" section in the View details of a data synchronization project topic.

If data access fails due to a network failure or the slow startup of processes, go to the project list or the project details page and click Restore.
