OceanBase Database uses an optimized version of Paxos, namely, Multi-Paxos, to implement multi-replica data synchronization and cluster high availability (HA).
Working principle of Paxos
For more information about the working principle of Paxos, see the content in Paxos Made Simple.
Relationship between the Paxos and election protocols of OceanBase Database
The Paxos and election protocols of OceanBase Database form the log synchronization protocol (log service). The two protocols are correlated to some extent but are loosely coupled in implementation.
The election protocol selects a leader among multiple replicas and ensures the validity of the leader by using the Lease mechanism. The log service advances the Multi-Paxos state machine based on the leader, restores unconfirmed logs, and starts to provide services after the restore.
Avoid split-brain by using Paxos
What is split-brain?
In the context of primary/standby synchronous disaster recovery solution of traditional database services, if the system performs as expected, the primary node (also called the master) provides data read and write services, and other nodes serve as standby nodes (also called slaves) to synchronize data as logs from the primary node for disaster recovery.
You may encounter one of the following situations when a system exception occurs, depending on whether network partitioning is involved:
The primary node maintains a valid connection to the application server in the same IDC and continues to serve as the primary node to provide data read and write services.
A network failure is detected on the primary node. In this case, the engineer responsible for making disaster recovery decisions or the automated management tool attempts to escalate a standby node as the new primary node to provide data read and write services.
As a result, two or more primary nodes exist, and multiple writes exist at the same time, seriously reducing data correctness. This phenomenon is called split-brain.
How to avoid split-brain?
OceanBase Database implements the HA election and log synchronization protocols based on Paxos to ensure data security and service continuity.
In Paxos, a decision is made only after being agreed on by the majority of nodes.
The HA election protocol of OceanBase Database makes sure that a node can become a primary node to provide read and write services only after being recognized by the majority of nodes. In a node set, every two majorities are intersected. This avoids the existence of two primary nodes at the same time.
After a node is elected as the primary node, the service continuity is ensured based on the Lease mechanism.
When a minority of standby nodes are down, the services of the primary node are not affected.
In the event of a primary node failure or network partitioning, the majority of standby nodes wait for the lease to expire. When the lease expires, the original primary node no longer provides read and write services. OceanBase Database automatically elects a new primary node among the remaining nodes to provide services.
Based on the log synchronization protocol of OceanBase Database, the persistence of the data to be written must be complete on the majority of nodes. In the typical deployment mode of three IDCs in the same city, a transaction is committed only after the persistence log of this transaction is synchronized to at least two IDCs.
When a minority of standby nodes are down, the services of the primary node remain unaffected without data loss.
In the event of a primary node failure or network partitioning, the remaining nodes still store complete data. The HA election protocol elects a new primary node. The new primary node executes the restore process to restore complete data from the remaining nodes, and starts to provide services. The entire process is automated.
OceanBase Database avoids split-brain and ensures data security and service continuity by using the HA election and log synchronization protocols implemented based on Paxos.