This topic describes the strategies for ensuring the message order for OceanBase Migration Service (OMS) to synchronize incremental data from a database to Kafka.
OMS 3.3.0 or later allows you to set the partition.mode parameter in the config.yaml file to one_partition or hash, which specifies the partitioning strategy for pushing messages to Kafka. In OMS V3.3.0 or later, you can set Partitioning Rules to One or Hash when you create a project to synchronize data to Kafka.
Oneindicates that all data is delivered to queue 0 on the destination Kafka instance. In this case, data delivered to queue 0 on the destination is aggregated by transaction. The synchronization of data of a transaction starts only after the synchronization of data of the previous transaction is completed. The transaction order and change order within a transaction are guaranteed.The advantage of this synchronization strategy is that the message order is fully guaranteed. The disadvantage is that only a single partition on the destination is used. The maximum throughput of the data synchronization project is subject to the throughput of a single Kafka queue.
Hashindicates that the data in the source is modular hashed based on the value of Sharding Columns, or the primary key or non-empty unique key of the table when the sharding column is unspecified, and the data is distributed and concurrently written to multiple queues on the destination Kafka instance.With this partitioning rule, data changes with the same sharding column value in the table are synchronized to the same queue on the destination Kafka instance. This ensures that the data changes are in exact order as they are committed in the source database. Data changes with different sharding column values may be delivered to different queues on the destination, because their hash values may differ. In this case, the data consumption sequence in the downstream may differ from the sequence in which they are committed in the source database. In other words, although a data change to table A occurs before a data change to table B, the change to table B may be written to the destination end before the change to table A.
The advantage of this strategy is that multiple queues on the destination are used, which improves the maximum throughput of the link while ensuring the change order within each data entry.