Overview

As digital transformation advances, industrial data is growing exponentially in both volume and access concurrency. Enterprises in industries such as finance, insurance, and communications expect more powerful disaster recovery capabilities to protect their core businesses against IDC-level or city-wide failures. They need solutions that provide high availability and disaster recovery at the server, IDC, and city levels, and that complete a failover quickly in the event of a disaster to minimize the blast radius of failures without data loss. The solutions must also support the management and operation of various application systems, including core business systems. Through years of real-world experience in supporting the core systems of Alipay and MYbank, OceanBase has developed a database architecture that features active geo-redundancy with five IDCs deployed across three regions. The architecture provides benchmark cases demonstrating the industry-leading capabilities of OceanBase Database in supporting core systems that have extremely high data consistency and availability requirements.

Challenges

High Risk of Data Loss

Conventional IT systems ensure availability by using the primary/standby deployment mode, in which the system is deployed in a primary IDC and a standby IDC. This deployment mode is well established and widely used, but it cannot ensure zero data loss during a failover after a system failure.

Low Time Efficiency in Disaster Recovery

In a conventional active-active architecture, data is replicated asynchronously between the primary and standby IDCs. Because this replication mechanism is slow, a failover can take a long time and becomes impractical when the primary IDC is down. The high cost of deciding whether to execute a failover further prolongs business interruptions.

Large Blast Radius of Failures

In a conventional architecture, business systems are closely coupled. As a result, a single point of failure (SPOF) in the database, such as the failure of an IDC, a region, a network connection, or a cluster node, often causes inconvenience to all system users and seriously affects business continuity.

Architecture

OceanBase Database implements geo-disaster recovery based on a multi-replica, multi-region architecture and the highly efficient Paxos consensus protocol. In this architecture, OceanBase clusters can be deployed in five IDCs across three regions, with data replicas stored in local and remote IDCs. Based on this architecture, OceanBase Database supports the Logical Data Center (LDC) deployment mode. In addition, applications are transformed into microservices based on middleware, such as the Scalable Open Financial Architecture (SOFA) platform from Ant Group. As a result, the blast radius of a failed business LDC can be reduced to 1%, and services can be automatically restored within 1 minute in the event of a city-wide fault, with zero data loss. With the new arbitration service, which provides voting capabilities for Paxos with low resource requirements, OceanBase Database V4.x supports IDC-level disaster recovery in an architecture of three IDCs across two regions and greatly reduces the overall resource cost (such as cross-city network bandwidth and hardware) of the third IDC. When any IDC fails, the database response time remains unchanged, meeting the needs of enterprises.
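
The majority requirement behind this design can be illustrated with a minimal sketch. It assumes one replica per IDC and a 2 + 2 + 1 split of the five IDCs across the three regions. The IDC names, region names, and helper functions are hypothetical and are not part of OceanBase Database. The sketch only checks whether a strict majority of replicas remains after a failure, which is the condition a Paxos group needs to keep committing.

```python
from itertools import combinations

# Hypothetical layout: five IDCs across three regions, one replica per IDC.
# The 2 + 2 + 1 split is an assumption made for illustration.
REPLICAS = {
    "idc1": "region_a",
    "idc2": "region_a",
    "idc3": "region_b",
    "idc4": "region_b",
    "idc5": "region_c",
}


def has_majority(alive_count, total):
    """Return True if the surviving replicas form a strict majority."""
    return alive_count > total // 2


def survives_region_loss(replicas, region):
    """Check whether a majority remains after every IDC in `region` fails."""
    alive = [idc for idc, r in replicas.items() if r != region]
    return has_majority(len(alive), len(replicas))


def survives_idc_loss(replicas, failed):
    """Check whether a majority remains after the IDCs in `failed` go down."""
    alive = [idc for idc in replicas if idc not in failed]
    return has_majority(len(alive), len(replicas))


# Losing any single region leaves at least three of five replicas.
region_results = {
    region: survives_region_loss(REPLICAS, region)
    for region in sorted(set(REPLICAS.values()))
}

# Losing any two IDCs also leaves three of five replicas.
two_idc_results = [
    survives_idc_loss(REPLICAS, set(pair))
    for pair in combinations(REPLICAS, 2)
]

if __name__ == "__main__":
    print(region_results)
    print(all(two_idc_results))
```

Under these assumptions, no single region holds more than two of the five replicas, so a city-wide failure never removes the majority.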

Benefits

Data Protection with Five IDCs Across Three Regions

OceanBase Database adopts an unprecedented city-level lossless disaster recovery solution, which features five IDCs deployed across three regions. This solution meets the Level 6 disaster recovery requirements of the financial industry in China. Services can be automatically restored from IDC-level or city-level failures within 1 minute, with zero data loss.

LDC-based Architecture Design

Each LDC contains a complete set of data to independently provide data services to applications. Users can use an upper-layer scheduling system to flexibly schedule and switch application data access traffic between LDCs of different cities and IDCs. In this architecture, the failure of a single LDC does not affect the global services. During peak hours, users can redirect any proportion of the business traffic of heavily loaded LDCs to lightly loaded IDCs to implement cross-cloud scaling of database capacity.
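
The traffic scheduling described above can be sketched as a table of per-LDC traffic shares that an upper-layer scheduler adjusts. This is a minimal illustration only: the LDC names, the share table, and the functions `shift_traffic` and `drain_ldc` are hypothetical and do not represent an OceanBase or SOFA API.

```python
def shift_traffic(shares, source, target, fraction):
    """Move `fraction` (0.0 to 1.0) of the source LDC's share to the target LDC."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    if source not in shares or target not in shares:
        raise KeyError("unknown LDC name")
    moved = shares[source] * fraction
    updated = dict(shares)  # leave the caller's table untouched
    updated[source] -= moved
    updated[target] += moved
    return updated


def drain_ldc(shares, failed):
    """Spread a failed LDC's share over the remaining LDCs, proportionally."""
    remaining = {name: s for name, s in shares.items() if name != failed}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no healthy LDC left to take the traffic")
    return {name: s / total for name, s in remaining.items()}


# Hypothetical starting point: three LDCs with uneven load.
shares = {"ldc_a": 0.5, "ldc_b": 0.3, "ldc_c": 0.2}

# Peak hours: move 40% of ldc_a's traffic to the lightly loaded ldc_c.
rebalanced = shift_traffic(shares, "ldc_a", "ldc_c", 0.4)

# Failure: take ldc_b out of rotation and rescale the rest.
after_failure = drain_ldc(shares, "ldc_b")

if __name__ == "__main__":
    print(rebalanced)
    print(after_failure)
```

Both functions return a new table rather than mutating the input, so a scheduler could compare the proposed shares with the current ones before applying them.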

Massive Throughput

OceanBase Database is designed to process enormous amounts of data, up to petabyte scale. Its native distributed architecture allows concurrent data reads and writes and significantly improves batch processing performance. In each batch, OceanBase Database supports the query of more than 1,000,000 data records per second and the concurrent processing of up to tens of billions of accounts that involve petabytes of data.