To meet the disaster recovery requirements of different customers and business scenarios, OceanBase Database provides multiple high availability (HA) deployment solutions:
Multi-replica deployment based on Paxos
The multi-replica deployment solution is implemented based on the Paxos protocol. Generally, multiple replicas, for example, three or five replicas, constitute a cluster and provide disaster recovery capabilities.
If a minority of replicas, for example, one out of three or two out of five, are unavailable, services can be automatically recovered without data loss. The multi-replica deployment solution provides a recovery time objective (RTO) within 8s and a recovery point objective (RPO) of 0.
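The "minority of replicas" rule above is plain majority arithmetic. The following minimal sketch illustrates it; the function names are hypothetical and are not part of any OceanBase interface.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def max_tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can fail while a majority survives."""
    return replicas - majority(replicas)


def is_available(replicas: int, failed: int) -> bool:
    """True if the surviving replicas still form a majority."""
    return replicas - failed >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} replicas: majority={majority(n)}, "
              f"tolerated failures={max_tolerated_failures(n)}")
```

With three replicas the sketch tolerates one failure, and with five replicas it tolerates two, which matches the "one out of three or two out of five" examples above.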
Physical standby database solution based on asynchronous log replication
The physical standby database solution is similar to the primary/standby replication solution of traditional databases. Redo logs are replicated among tenants in two or more clusters to form a tenant-level primary/standby relationship. This provides two types of disaster recovery capabilities: planned lossless switchover and lossy switchover upon failures.
This solution is mainly designed for disaster recovery in dual-IDC or dual-city scenarios. The primary tenant provides read and write capabilities, while the standby tenants provide read-only and disaster recovery capabilities. A planned lossless switchover swaps the roles of the primary and standby tenants within seconds and without data loss (RPO = 0).
If the cluster of the primary tenant fails, you can perform a lossy switchover to make a standby tenant the new primary tenant. This switchover also completes within seconds, but data loss may occur (RPO > 0).
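The difference between the two switchover types can be pictured with a simplified model of asynchronous log replication. This is only a sketch under stated assumptions: log positions are plain integers, a planned switchover is assumed to swap roles only after the standby has caught up, and `entries_lost` is a hypothetical name, not an OceanBase function.

```python
def entries_lost(primary_committed: int, standby_received: int, planned: bool) -> int:
    """Number of committed log entries missing on the new primary.

    primary_committed: last log position committed on the old primary.
    standby_received:  last log position the standby had received.
    planned:           True for a planned lossless switchover,
                       False for a lossy switchover after a failure.
    """
    if planned:
        # Assumption: roles are swapped only once the standby has caught up.
        return 0
    # After a failure, whatever was not yet replicated cannot be recovered.
    return max(0, primary_committed - standby_received)


if __name__ == "__main__":
    print(entries_lost(1000, 990, planned=True))
    print(entries_lost(1000, 990, planned=False))
```

In this model the RPO is 0 for the planned case and grows with the replication lag for the failure case.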
HA solution based on an arbitration service
The HA solution based on an arbitration service is available in OceanBase Database V4.1.0 and later. This solution provides sound disaster recovery capability with a smaller number of replicas by using an independent arbitration service.
For example, consider a deployment with two full-featured replicas and one arbitration service. When one full-featured replica fails, the cluster uses the arbitration service to perform automatic degradation for disaster recovery, with an RPO of 0 and an RTO within seconds. After the services on the failed OBServer node are restored, the cluster automatically upgrades back to its pre-failure state. In this process, the arbitration service only participates in synchronizing and persisting a small amount of metadata, so it consumes few CPU, memory, and network resources.
You can use the preceding HA solutions in combination. The following table lists the deployment modes that we recommend for OceanBase Database. You can flexibly choose a desired deployment mode based on the IDC configuration and your performance and availability requirements.
| Deployment solution | Disaster recovery capabilities | RTO | RPO |
|---|---|---|---|
| Three replicas in one IDC | Lossless disaster recovery at the server or rack level (when a minority of replicas fail) | Within 8s | 0 |
| Physical standby databases for two IDCs in the same city | Lossy disaster recovery at the IDC level (when the primary IDC fails) | Within seconds | > 0 |
| Three replicas in three IDCs in the same city | Lossless disaster recovery at the IDC level (when a minority of replicas fail) | Within 8s | 0 |
| Physical standby databases for two IDCs in two regions | Lossy disaster recovery (when a region fails) | Within seconds | > 0 |
| Physical standby databases for three IDCs in two regions | Lossless disaster recovery (when an IDC fails) / Lossy disaster recovery (when a region fails) | Within seconds | When an IDC fails, the RPO is 0. When a region fails, the RPO is greater than 0. |
| Five replicas in three IDCs across three regions | Lossless disaster recovery (when a region fails) | Within 8s | 0 |
| Five replicas in five IDCs across three regions | Lossless disaster recovery (when a region fails) | Within 8s | 0 |
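As a reading aid, the table can be encoded as a lookup keyed by the number of IDCs and regions you have. This is a minimal sketch that only restates the table; the `RECOMMENDED` mapping and the `recommend` function are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# (number of IDCs, number of regions) -> (deployment solution, RTO, RPO)
RECOMMENDED: Dict[Tuple[int, int], Tuple[str, str, str]] = {
    (1, 1): ("Three replicas in one IDC", "Within 8s", "0"),
    (2, 1): ("Physical standby databases for two IDCs in the same city",
             "Within seconds", "> 0"),
    (3, 1): ("Three replicas in three IDCs in the same city", "Within 8s", "0"),
    (2, 2): ("Physical standby databases for two IDCs in two regions",
             "Within seconds", "> 0"),
    (3, 2): ("Physical standby databases for three IDCs in two regions",
             "Within seconds", "0 if an IDC fails, > 0 if a region fails"),
    (3, 3): ("Five replicas in three IDCs across three regions", "Within 8s", "0"),
    (5, 3): ("Five replicas in five IDCs across three regions", "Within 8s", "0"),
}


def recommend(idcs: int, regions: int) -> Optional[Tuple[str, str, str]]:
    """Return the table row for this IDC/region layout, or None if not listed."""
    return RECOMMENDED.get((idcs, regions))


if __name__ == "__main__":
    print(recommend(3, 1))
    print(recommend(4, 2))  # not covered by the table
```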
Three replicas in one IDC
If you use only one IDC, you can deploy three or more replicas to achieve lossless disaster recovery at the OBServer node level. If an OBServer node or a minority of OBServer nodes are down, services remain unaffected without data loss. If the IDC contains multiple racks, you can deploy a zone for each rack to achieve lossless disaster recovery at the rack level.
Physical standby databases for two IDCs in the same city
To achieve disaster recovery at the IDC level with two IDCs in the same city, you can use a physical standby database and deploy a cluster in each IDC. If one IDC is unavailable, the other IDC can take over the services. If the standby IDC is unavailable, business data is not affected and services continue. If the primary IDC is unavailable, the standby database takes over business services. In the latter case, data loss may occur because data on the primary database may not have been fully synchronized to the standby database.
Three replicas in three IDCs in the same city
If three IDCs are available in the same city, you can deploy a zone in each IDC to achieve lossless disaster recovery at the IDC level. If one of the IDCs is unavailable, the other two IDCs can continue to provide services without data loss. This deployment architecture does not rely on physical standby databases, but it cannot provide disaster recovery at the region level.
Physical standby databases for two IDCs in two regions
If you want to achieve disaster recovery at the region level and have only one IDC in each region, you can use the physical standby database architecture. You can select one city as the primary region, deploy the primary database in the primary region, and deploy the standby database in the other region (standby region). When the standby region is unavailable, business services in the primary region are not affected. When the primary region is unavailable, the standby database becomes the new primary database to continue providing services. In this case, business data loss may occur.
Furthermore, you can use two IDCs in two regions to implement a dual-active architecture by deploying two sets of physical standby databases. In this case, the two regions serve as primary and standby for each other. This allows you to use resources more efficiently and achieve higher disaster recovery performance.
Physical standby databases for three IDCs in two regions
If you have three IDCs in two regions, you can combine physical standby databases with the arbitration service to provide disaster recovery at the region level.
The region with two IDCs serves as the primary region. Each IDC in the primary region is deployed with one or two full-featured replicas. The primary region provides the database read and write services. The arbitration service and physical standby database are deployed in the IDC in the standby region to provide disaster recovery services.
If one IDC in the primary region fails, the cluster uses the arbitration service to perform automatic degradation and restores services within seconds without data loss. If both IDCs in the primary region fail at the same time, the physical standby database takes over the services of the primary database. In this case, data loss may occur (RPO > 0).
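The two failure cases for this deployment can be summarized as a small decision function. This is an illustrative sketch only: the `Outcome` type and the `recovery_outcome` function are hypothetical, and the model only counts how many of the two primary-region IDCs have failed.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    action: str
    lossless: bool  # True means RPO = 0


def recovery_outcome(failed_primary_idcs: int) -> Outcome:
    """Map the number of failed IDCs in the primary region (0, 1, or 2)
    to the recovery path described for three IDCs in two regions."""
    if failed_primary_idcs == 0:
        return Outcome("no recovery needed", True)
    if failed_primary_idcs == 1:
        # The surviving IDC and the arbitration service keep the cluster running.
        return Outcome("automatic degradation by using the arbitration service", True)
    if failed_primary_idcs == 2:
        # The whole primary region is lost; the standby region takes over.
        return Outcome("physical standby database takes over", False)
    raise ValueError("the primary region has exactly two IDCs")


if __name__ == "__main__":
    for failed in (0, 1, 2):
        print(failed, recovery_outcome(failed))
```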
Five replicas in three IDCs across three regions
Because the Paxos protocol requires a majority of replicas to remain available, lossless disaster recovery at the region level requires at least three regions. This solution uses three cities with one IDC each. The IDCs in the first two cities each host two replicas, and the IDC in the third city hosts one replica. Compared with the "three IDCs across two regions" solution, this solution requires each transaction to be synchronized to at least two cities. Business applications must be tolerant of the latency introduced by cross-region replication.
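The following sketch checks both claims with majority arithmetic: a 2-2-1 placement survives the loss of any single region, and a majority can never be reached inside one city. The placement dictionaries and function names are hypothetical and serve only as an illustration.

```python
from typing import Dict


def survives_any_single_region_failure(placement: Dict[str, int]) -> bool:
    """True if a majority of replicas remains after losing any one region."""
    total = sum(placement.values())
    needed = total // 2 + 1
    return all(total - count >= needed for count in placement.values())


def min_regions_for_majority(placement: Dict[str, int]) -> int:
    """Fewest regions whose replicas together form a majority."""
    needed = sum(placement.values()) // 2 + 1
    covered = 0
    for used, count in enumerate(sorted(placement.values(), reverse=True), start=1):
        covered += count
        if covered >= needed:
            return used
    raise ValueError("placement has no replicas")


if __name__ == "__main__":
    three_regions = {"city1": 2, "city2": 2, "city3": 1}
    two_regions = {"city1": 2, "city2": 1}
    print(survives_any_single_region_failure(three_regions))
    print(survives_any_single_region_failure(two_regions))
    print(min_regions_for_majority(three_regions))
```

In the 2-2-1 placement, a majority of three replicas always spans at least two cities, which is why each transaction incurs cross-region latency.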
Five replicas in five IDCs across three regions
This deployment mode is similar to the "five replicas in three IDCs across three regions" solution. The difference is that with five replicas in five IDCs across three regions, each replica is deployed in a different IDC to further enhance the disaster recovery capabilities at the IDC level.
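The IDC-level benefit can be illustrated by enumerating every combination of failed IDCs and checking whether a majority of the five replicas survives. This is a sketch under simple assumptions: each list entry is the replica count of one IDC, and `survives_any_idc_failures` is a hypothetical helper.

```python
from itertools import combinations
from typing import List


def survives_any_idc_failures(replicas_per_idc: List[int], failed_idcs: int) -> bool:
    """True if a majority of replicas survives every possible choice of
    `failed_idcs` IDCs failing at the same time."""
    total = sum(replicas_per_idc)
    needed = total // 2 + 1
    for lost in combinations(range(len(replicas_per_idc)), failed_idcs):
        remaining = total - sum(replicas_per_idc[i] for i in lost)
        if remaining < needed:
            return False
    return True


if __name__ == "__main__":
    three_idcs = [2, 2, 1]       # five replicas in three IDCs
    five_idcs = [1, 1, 1, 1, 1]  # five replicas in five IDCs
    print(survives_any_idc_failures(three_idcs, 2))
    print(survives_any_idc_failures(five_idcs, 2))
```

Under this model, the three-IDC layout tolerates the loss of any one IDC, while the five-IDC layout tolerates the loss of any two.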