This topic describes the strategies for ensuring the message order for OceanBase Migration Service (OMS) to synchronize incremental data from a database to Kafka.
Message synchronization strategies
OMS 3.3.0 or later allows you to set the partition.mode parameter in the config.yaml file to one_partition or hash, which specifies the partitioning strategy for pushing messages to Kafka. In OMS V3.3.0 or later, you can set Partitioning Rules to One or Hash when you create a task to synchronize data to Kafka.
Oneindicates that all data is delivered to queue 0 on the target Kafka instance. In this case, data delivered to queue 0 on the target is aggregated by transaction. The synchronization of data of a transaction starts only after the synchronization of data of the previous transaction is completed. The transaction order and change order within a transaction are guaranteed.The advantage of this synchronization strategy is that the message order is fully guaranteed. The disadvantage is that only a single partition on the target is used. The maximum throughput of the data synchronization task is subject to the throughput of a single Kafka queue.
Hashindicates that the data in the source is modular hashed based on the value of Sharding Columns, or the primary key or non-empty unique key of the table when the sharding column is unspecified, and the data is distributed and concurrently written to multiple queues on the target Kafka instance.With this partitioning rule, data changes with the same sharding column value in the table are synchronized to the same queue on the target Kafka instance. This ensures that the data changes are in exact order as they are committed in the source database. Data changes with different sharding column values may be delivered to different queues on the target, because their hash values may differ. In this case, the data consumption sequence in the downstream may differ from the sequence in which they are committed in the source database. In other words, although a data change to table A occurs before a data change to table B, the change to table B may be written to the target before the change to table A.
The advantage of this strategy is that multiple queues on the target are used, which improves the maximum throughput of the link while ensuring the change order within each data entry.
Limitations on the hash mode
When you synchronize data from OceanBase Database Community Edition to a Kafka instance, if you use the hash partitioning method, OMS Community Edition uses the primary keys (PKs) or unique keys (UKs) in the source table as the partitioning keys. If a table has a PK, the PK is used. Otherwise, the UK with the fewest fields is used. Regardless of whether a PK or UK is used, the same UK may be distributed to multiple partitions of the Kafka topic in the following scenarios:
The table ID is a primary key.
The business UK is not an ID field.
If you delete a row and then insert it again based on the same UK value in the source table, OMS Community Edition uses the ID field as the basis for hash partitioning. This may result in the same UK value being distributed to different partitions. If the downstream system does not handle this situation properly, it may process the data inserted later before the deleted data, which may cause issues.