The database system plays a crucial role in data storage and query within the application architecture, ensuring enterprise data security and service continuity. High availability is the primary consideration in database system architecture design, including high service availability and high data reliability. This topic describes OceanBase Database's high service availability solutions, while high data reliability solutions will be discussed in the "Backup and restore" topic.
The high service availability solutions include the following:
- Intra-cluster multi-replica disaster recovery
- Arbitration-based disaster recovery
- Inter-cluster physical standby database disaster recovery
Multi-replica disaster recovery
OceanBase Database implements multi-replica disaster recovery according to the Paxos protocol, providing high availability capabilities with a recovery point objective (RPO) of 0 and a recovery time objective (RTO) of less than 8 seconds in the event of a minority failure.
The multi-replica disaster recovery solution is designed for a single cluster. It persists transaction logs and synchronizes log data among multiple replicas. The Paxos protocol ensures that the log data is successfully persisted in a majority of replicas, while disaster recovery is achieved through membership changes.
Multi-replica disaster recovery is applicable to scenarios where a minority of OBServer nodes in a cluster fail. It provides fast disaster recovery and zero data loss. However, if the "Five IDCs across Three Regions" solution is not feasible, but you still require cross-regional disaster recovery and high availability (such as in cases of majority node failures or software bugs), the physical standby database solution can be used.
Arbitration-based disaster recovery
Arbitration-based disaster recovery is an innovative high availability solution provided by OceanBase Database, based on the Paxos multi-replica disaster recovery solution.
At the business level, arbitration-based disaster recovery ensures strong data synchronization on a majority of replicas (4 full-featured replicas + 1 arbitration service) or all replicas (2 full-featured replicas + 1 arbitration service). If half of the full-featured replicas fail, automatic fault degradation occurs to ensure data integrity while maintaining service continuity.
Arbitration-based disaster recovery solves the problem of trade-offs between service continuity and data integrity in traditional database solutions that prioritize either maximum protection or maximum availability. It avoids the risk of split-brain. Additionally, based on the user's locality configuration, OceanBase Database can provide read and write services on multiple OBServer nodes where full-featured replicas are deployed.
Similar to the majority-based disaster recovery solution, the arbitration-based disaster recovery solution is an intra-cluster solution and cannot address data protection and availability issues if a majority or all of the replicas fail.
Physical standby database disaster recovery
The physical standby database disaster recovery solution is an integral part of OceanBase Database's high availability solutions.
This solution is designed for multiple clusters, enabling the transmission of transaction logs among multiple clusters and providing log-based physical hot backup services. In the OceanBase Database V4.2.0 version, the physical standby database solution adopts an independent primary/standby architecture, where the primary/standby relationship exists at the tenant level (in other words, primary and standby tenants can be created). The primary and standby tenants communicate directly through the network or through a transmission channel established using third-party log services, transmitting only the logs. Unlike the centralized architecture in earlier versions, the independent primary/standby architecture allows for more flexible cluster management.
Since logs are asynchronously transmitted from the primary tenant to standby tenants, only the maximum performance mode is supported, and the maximum protection and maximum availability modes are not supported. For scenarios that require strong data consistency during disaster recovery, the multi-replica disaster recovery solution or the arbitration-based disaster recovery solution is recommended.