Building on the concept of table partitioning in traditional databases, OceanBase Database can divide the data of a table into different partitions. In addition, as a distributed database, OceanBase Database copies the data of each partition to multiple OBServer nodes to ensure high availability (HA) of data reads and writes. These copies of the same partition on different OBServer nodes are called replicas. The Paxos consensus protocol is used to ensure strong consistency among the replicas of the same partition. The replicas of each partition form an independent Paxos group. One replica is the leader and the other replicas are followers. The leader supports strong-consistency reads and writes, and the followers support weak-consistency reads.
Location cache
OceanBase Database organizes user data by partition. Each partition has multiple replicas for disaster recovery. To execute an SQL request, the database needs the location information of the partitions involved. This information is used to route the SQL request to the corresponding OBServer node and access data in the corresponding replica. Each OBServer node has a location cache service that refreshes and caches the required partition location information.
OceanBase Database persists the partition locations to built-in tables, which are called meta tables. The location information of different types of tables is organized and persisted in different meta tables in a hierarchical manner. The following list describes the information in different meta tables:
__all_virtual_core_root_table: records the location of the __all_root_table table.
__all_root_table: records the locations of all built-in tables in the cluster.
__all_virtual_meta_table: records the locations of partitions of all user tables under all tenants in the cluster.
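The hierarchical lookup described above can be sketched as a small simulation. This is an illustration only, not OceanBase's implementation: the dictionaries stand in for the three meta tables, the addresses and the partition key t1.orders.p0 are made up, and the LocationCache class and leader_of helper are hypothetical names. It also assumes that __all_virtual_meta_table is itself one of the built-in tables located through __all_root_table.

```python
from typing import Dict, List, Tuple

Replica = Tuple[str, str]  # (server address, role); a simplified, assumed shape

# Hypothetical, simplified contents of the three meta tables.
CORE_ROOT_TABLE: Dict[str, List[Replica]] = {
    "__all_root_table": [("10.0.0.1:2882", "LEADER"), ("10.0.0.2:2882", "FOLLOWER")],
}
ROOT_TABLE: Dict[str, List[Replica]] = {
    "__all_virtual_meta_table": [("10.0.0.2:2882", "LEADER"), ("10.0.0.3:2882", "FOLLOWER")],
}
META_TABLE: Dict[str, List[Replica]] = {
    "t1.orders.p0": [("10.0.0.3:2882", "LEADER"), ("10.0.0.1:2882", "FOLLOWER")],
}


def leader_of(replicas: List[Replica]) -> str:
    """Return the address of the replica marked as leader."""
    for address, role in replicas:
        if role == "LEADER":
            return address
    raise LookupError("no leader recorded")


class LocationCache:
    """Caches partition locations and refreshes them on a cache miss."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[Replica]] = {}
        self.refreshes = 0
        self.last_hops: List[str] = []

    def lookup(self, partition_key: str) -> List[Replica]:
        if partition_key not in self._cache:
            self._cache[partition_key] = self._refresh(partition_key)
        return self._cache[partition_key]

    def _refresh(self, partition_key: str) -> List[Replica]:
        self.refreshes += 1
        # Level 1: the core root table says where __all_root_table lives.
        root_server = leader_of(CORE_ROOT_TABLE["__all_root_table"])
        # Level 2: __all_root_table says where the user-table meta table lives.
        meta_server = leader_of(ROOT_TABLE["__all_virtual_meta_table"])
        # Level 3: the meta table holds the location of the user partition.
        self.last_hops = [root_server, meta_server]
        return META_TABLE[partition_key]
```

A second lookup of the same partition is served from the cache, so refreshes stays at 1 until the entry is invalidated (invalidation is not modeled here).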
Replica type
Replica types differ in the types of data they store. This lets you choose a replica layout that matches your business preferences in terms of data security, performance scalability, availability, and cost. Currently, OceanBase Database supports the following types of replicas:
Full-featured replica
Log replica
Encrypted voting replica
Read-only replica
For more information about replica management, see Overview of replicas.
Distributed consensus protocol
OceanBase Database synchronizes transaction logs among replicas of the same partition based on the Paxos protocol. It commits a transaction log only after the log is synchronized to a majority of the replicas. By default, the leader provides strong-consistency reads and writes. Followers support weak-consistency reads, which may return data of an earlier version.
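The majority rule can be illustrated with a minimal sketch. The function names and replica labels are hypothetical, and the sketch assumes that the leader's own persisted copy counts as one acknowledgment. It shows only the quorum arithmetic, not the Paxos protocol itself.

```python
from typing import Iterable, Set


def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def is_committed(acks: Set[str], replicas: Iterable[str]) -> bool:
    """A log entry is committed once a majority of the group has persisted it."""
    members = set(replicas)
    # Ignore acknowledgments from servers that are not in this Paxos group.
    return len(acks & members) >= majority(len(members))
```

With three replicas A, B, and C, is_committed({"A", "B"}, ["A", "B", "C"]) returns True, while an acknowledgment from A alone is not enough. This is why a three-replica group tolerates the loss of one replica and a five-replica group tolerates the loss of two.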
Data balancing
OceanBase Database uses RootService to manage load balancing among the resource units of a tenant. Different types of replicas require different amounts of resources. During partition management, RootService considers the CPU utilization, disk usage, memory usage, and input/output operations per second (IOPS) of each resource unit. To make full use of the resources available on each OBServer node, RootService aims to even out the usage of each of these resources across all OBServer nodes.
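As a rough illustration of weighing several resources at once, the sketch below scores each resource unit by its most heavily used resource and picks a migration source and target. The scoring rule, function names, and sample figures are all assumptions made for this example. The source does not describe the formula that RootService actually uses.

```python
from typing import Dict, Tuple

Usage = Dict[str, float]  # utilization per resource, each in the range 0.0 to 1.0

RESOURCES = ("cpu", "disk", "memory", "iops")


def load_score(usage: Usage) -> float:
    """Assumed scoring rule: a unit is as loaded as its busiest resource."""
    return max(usage[name] for name in RESOURCES)


def pick_migration(units: Dict[str, Usage]) -> Tuple[str, str]:
    """Return (source, target): the most loaded and the least loaded unit."""
    by_load = sorted(units, key=lambda name: load_score(units[name]))
    return by_load[-1], by_load[0]
```

For example, given a unit u1 at 90% CPU and a unit u2 whose busiest resource is at 30%, pick_migration suggests moving load from u1 to u2.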
Replica balancing
RootService adjusts the distribution of tenants and replicas on each resource unit by migrating replicas.
Leader balancing
RootService balances the number of leaders on OBServer nodes based on the replica balancing mechanism and other factors, such as the primary zone of the current tenant. Two placement strategies are possible. The leaders of different partitions can be aggregated on the same OBServer node to reduce distributed transactions and shorten the response time to service requests. Alternatively, the leaders of different partitions can be distributed across multiple OBServer nodes to maximize resource usage and improve system throughput.
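The two leader placement strategies can be contrasted with a toy sketch. The function names, partition names, and server names are hypothetical, and the round-robin rule is a stand-in chosen for simplicity. The real placement also depends on factors such as the primary zone, which this sketch does not model.

```python
from collections import Counter
from typing import Dict, List


def aggregate_leaders(partitions: List[str], servers: List[str]) -> Dict[str, str]:
    """Place every leader on one server so that transactions touching
    several partitions can stay local to a single node."""
    target = servers[0]
    return {partition: target for partition in partitions}


def distribute_leaders(partitions: List[str], servers: List[str]) -> Dict[str, str]:
    """Spread leaders round-robin so that every server carries read/write load."""
    return {
        partition: servers[index % len(servers)]
        for index, partition in enumerate(partitions)
    }


def leaders_per_server(placement: Dict[str, str]) -> Counter:
    """Count how many leaders each server holds."""
    return Counter(placement.values())
```

With six partitions and three servers, aggregation puts all six leaders on the first server, while distribution gives each server two leaders.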