This topic describes how to check the synchronization status of commit logs (clogs) in a cluster.
Scenarios
This inspection item checks whether an OceanBase cluster contains partitions whose clogs are not synchronized.
Prerequisites
The OceanBase Database version in use is earlier than V4.0.
Technical mechanism
Clogs are a core part of the log service of OceanBase Database. The log service supports many core features, such as the atomicity, durability, and isolation of transactions, and the high availability (HA) of databases. During transaction commit, OceanBase Database uses the Paxos protocol to ensure that the clogs are persisted to disk on the majority of replicas. The clogs on the minority of replicas are eventually persisted to disk as well. OceanBase Database provides an automatic log recycling mechanism to ensure that the log disk is always writable. Clogs play the following roles in the system architecture:
- Guarantee the atomicity and durability of a transaction by persisting the content in the MemStore and the transaction status information when the transaction is committed.
- Guarantee the isolation of transactions by generating the transaction version (`trans_version`) and using messages to synchronize the version to all followers.
- Use the Paxos protocol to synchronously transfer logs to the majority of replicas. This implements data disaster recovery and HA for a distributed database, and thereby supports various types of replicas, such as read-only replicas and log replicas.
- Maintain the authoritative member group and leader information of replicas, which is used by various modules of OceanBase Database. Provide an underlying mechanism for complex strategies of RootService, such as load balancing and rotating major compaction.
- Provide external data services to offer incremental data to external tools such as Data Replication Center (DRC) and incremental backup.
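The majority rule behind clog commit can be illustrated with a minimal sketch. This is not OceanBase code: the function names `majority` and `is_committed` are hypothetical, and the sketch only shows the arithmetic of a Paxos-style quorum, assuming every replica in the group has an equal vote.

```python
def majority(replica_count: int) -> int:
    """Return the smallest number of replicas that forms a majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def is_committed(acked_replicas: int, replica_count: int) -> bool:
    """Treat a log entry as committed once a majority has persisted it.

    acked_replicas counts the replicas (leader included) that have
    persisted the entry to disk. Both parameters are illustrative.
    """
    return acked_replicas >= majority(replica_count)
```

With three replicas, for example, two persisted copies are enough for the commit to succeed, and the third replica can catch up later.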
In OceanBase Database, the leader replica and follower replicas of a partition form a Paxos group. When a transaction is committed, clogs are synchronized from the leader to the followers. Followers are checked periodically. If the latest confirmed clog of a follower has not advanced for a period of time, the follower proactively fetches clogs from the leader in batches to catch up. Because clog files are recycled, a follower that lags too far behind may no longer be able to fetch the clogs it needs. In that case, the follower rebuilds its baseline by pulling SSTable data from the leader.
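The catch-up behavior described above can be sketched as a small decision function. This is a simplified model rather than the actual OceanBase implementation: the names `CatchUpAction` and `choose_catch_up_action`, the log ID parameters, and the stall threshold are all assumptions made for illustration.

```python
from enum import Enum


class CatchUpAction(Enum):
    NONE = "none"              # follower is keeping up
    FETCH_LOGS = "fetch_logs"  # fetch missing clogs from the leader in batches
    REBUILD = "rebuild"        # pull SSTable data from the leader


def choose_catch_up_action(
    follower_confirmed_log_id: int,
    leader_oldest_log_id: int,
    seconds_since_last_confirm: float,
    stall_threshold_seconds: float,
) -> CatchUpAction:
    """Pick a catch-up action for a follower (hypothetical model).

    leader_oldest_log_id is the oldest clog the leader still retains;
    anything older is assumed to have been recycled.
    """
    # Confirmed clogs advanced recently, so no action is needed.
    if seconds_since_last_confirm < stall_threshold_seconds:
        return CatchUpAction.NONE
    # The next clog the follower needs has already been recycled.
    if follower_confirmed_log_id + 1 < leader_oldest_log_id:
        return CatchUpAction.REBUILD
    return CatchUpAction.FETCH_LOGS
```

The sketch separates the two recovery paths: fetching is possible only while the leader still retains the next clog that the follower needs.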
Procedure
After you initiate a basic inspection on a cluster object, if the clogs are out of synchronization, the inspection report displays the related information in detail, such as the corresponding tenant, table name, IP address of the OBServer node, and partition index. Clogs can be out of synchronization due to the following causes:
- Network latency between nodes. You can check the network latency by running the `ping` command.
- The clog sliding window of a follower is full. In this case, the follower stops receiving clogs from the leader until a clog slides out of the window, and prints the `check_can_receive_log, now can not receive log` message in the observer.log file. When the sliding window of a follower stays full for a long period of time and no clog slides out of the window, two cases may occur. If clog commit on the majority of replicas is affected, transaction commit is affected. If clog commit on only the minority of replicas is affected, the clogs on the follower keep falling behind those on the leader.
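To check whether a follower has reported a full sliding window, you can scan observer.log for the message quoted above. The following is a minimal sketch under stated assumptions: the helper names `count_window_full_messages` and `scan_log_file` are hypothetical, the log path is supplied by the caller, and the exact layout of the log line is not assumed, so the code only looks for the two fragments of the message on the same line.

```python
from typing import Iterable

# Fragments of the message quoted in this topic. The rest of the log line
# (timestamp, thread, other fields) is deliberately not assumed.
MARKERS = ("check_can_receive_log", "now can not receive log")


def count_window_full_messages(lines: Iterable[str]) -> int:
    """Count the lines that contain every marker fragment."""
    return sum(1 for line in lines if all(m in line for m in MARKERS))


def scan_log_file(path: str) -> int:
    """Scan one log file and return the number of matching lines."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_window_full_messages(handle)
```

A count that keeps growing between two scans suggests that the window has stayed full, which corresponds to the long-running case described above.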