Basic concepts
Cluster
An OceanBase cluster spans one or more regions. A region consists of one or more zones, and one or more OBServer nodes are deployed in each zone. Each OBServer node can host multiple units, each unit can hold multiple log stream replicas, and each log stream can contain multiple tablets.
Region
A region corresponds to a city or an area within a city. A cluster that spans multiple regions has regional disaster recovery capabilities. A cluster deployed within a single region cannot tolerate a city-wide failure and therefore has no regional disaster recovery capability.
Zone
In OceanBase Database, a zone is a logical concept that typically corresponds to an IDC or a physical region. Generally, multiple storage nodes are deployed in a zone; physically, these nodes may be distributed across different IDCs, racks, or servers. A zone can contain multiple IDCs, but an IDC belongs to only one zone.
In OceanBase Database, zones are used to implement data redundancy across IDCs. OceanBase Database distributes data replicas to different zones based on specific rules. When one zone fails, the system automatically switches over to the replicas in other zones to ensure data availability.
In addition to data redundancy, OceanBase Database also uses zones as containers for data shards. Data sharding is a technology that divides data into multiple shards and stores the shards on different nodes. This improves the system throughput and performance. In OceanBase Database, different zones serve as the primary zones of different data shards to implement distributed data storage and processing.
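As an illustration, the following statements sketch how zones can be managed and inspected from the sys tenant. This is a sketch assuming the administrative SQL of recent V4.x releases; the zone, region, and IDC names (z4, SHANGHAI, idc4) are hypothetical.
obclient> ALTER SYSTEM ADD ZONE 'z4' REGION 'SHANGHAI' IDC 'idc4';
obclient> ALTER SYSTEM START ZONE 'z4';
obclient> SELECT ZONE, REGION, IDC, STATUS FROM oceanbase.DBA_OB_ZONES;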
OBServer
An OBServer node is a physical server that runs the observer process. One physical server can host one or more OBServer nodes. In OceanBase Database, an OBServer node is uniquely identified by its IP address and service port.
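To list the OBServer nodes in a cluster and the zone each belongs to, you can query the sys tenant dictionary view DBA_OB_SERVERS (available in V4.x); a minimal example:
obclient> SELECT SVR_IP, SVR_PORT, ZONE, STATUS, WITH_ROOTSERVER FROM oceanbase.DBA_OB_SERVERS;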
Unit
A unit is a container for a tenant's resources, such as CPU and memory, on an OBServer node. A tenant has at most one unit on each OBServer node.
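Units are allocated from resource pools. The following sketch, assuming V4.x syntax and hypothetical names (unit_config_1 and resource_pool_1), creates a unit specification and a resource pool that places one unit of that size in each of zones z1, z2, and z3:
obclient> CREATE RESOURCE UNIT unit_config_1 MAX_CPU 4, MEMORY_SIZE '8G';
obclient> CREATE RESOURCE POOL resource_pool_1 UNIT = 'unit_config_1', UNIT_NUM = 1, ZONE_LIST = ('z1','z2','z3');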
Partition
A partition is a user-created logical object that is used to organize and manage table data. You can perform various partition management operations, such as partition creation, deletion, truncation, splitting, merging, and exchange.
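For example, the following MySQL-mode statements (the orders table is hypothetical) create a table partitioned by range and then perform two of the management operations mentioned above:
obclient> CREATE TABLE orders (order_id BIGINT, order_date DATE) PARTITION BY RANGE COLUMNS(order_date) (PARTITION p202401 VALUES LESS THAN ('2024-02-01'), PARTITION p202402 VALUES LESS THAN ('2024-03-01'));
obclient> ALTER TABLE orders ADD PARTITION (PARTITION p202403 VALUES LESS THAN ('2024-04-01'));
obclient> ALTER TABLE orders TRUNCATE PARTITION p202401;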
Tablet
In OceanBase Database V4.0.0 and later, the concept of tablet is introduced to represent the actual data storage objects. A tablet stores data and can be transferred between servers; it is the minimum unit of data balancing. Tablets and partitions are in one-to-one correspondence: one tablet is created for a non-partitioned table, and one tablet is created for each partition of a partitioned table. Each partition of an index, whether a local index or a global index, also corresponds to one tablet. The tablets of a local index are bound to the tablets of the corresponding primary table partitions to ensure that they are stored on the same server.
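You can observe the one-to-one correspondence between partitions and tablets through the V4.x dictionary view DBA_OB_TABLE_LOCATIONS, for example for the hypothetical orders table from the partitioning example above:
obclient> SELECT TABLE_NAME, PARTITION_NAME, TABLET_ID, LS_ID FROM oceanbase.DBA_OB_TABLE_LOCATIONS WHERE TABLE_NAME = 'orders';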
Log stream
A log stream is an entity that is automatically created and managed by OceanBase Database. It is a collection of data that contains multiple tablets and an ordered redo log stream. It uses the Paxos protocol to synchronize logs between replicas to ensure data consistency between replicas and implement high availability of data. Log stream replicas can be migrated and replicated between servers for server management and system disaster recovery.
From the perspective of data storage, a log stream can be abstracted as a tablet container that supports the addition and management of tablet data and the transfer of tablets between log streams for data balancing and horizontal scale-out.
From the perspective of transactions, a log stream is a unit of transaction commit. If a transaction modifies data within a single log stream, it can be committed by using the one-phase atomic commit logic. If a transaction modifies data across multiple log streams, it is committed by using the two-phase atomic commit protocol of OceanBase Database, in which the log streams act as the participants of the distributed transaction.
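The log streams of a tenant can be listed from within that tenant through the V4.x dictionary view DBA_OB_LS, for example:
obclient> SELECT LS_ID, STATUS, PRIMARY_ZONE FROM oceanbase.DBA_OB_LS;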
Deployment modes
To keep the majority of replicas of a partition available when a single server fails, OceanBase Database never schedules two replicas of the same partition on the same server. Because the replicas of a partition are distributed across different zones or regions, this approach preserves data reliability and database service availability even in the case of a data center failure or a city-level disaster, striking a balance between reliability and availability. With OceanBase Database's disaster recovery features, a deployment of five IDCs across three regions losslessly tolerates city-level disasters, and a deployment of three IDCs within the same city losslessly tolerates IDC-level failures.
The lossless disaster recovery feature of OceanBase Database also facilitates cluster maintenance. When an IDC needs to be replaced or repaired, it can be directly taken offline for replacement and repair, and a new IDC can be added. OceanBase Database will automatically replicate and balance log stream replicas, ensuring that the database service is not affected.
RootService
An OceanBase cluster has a RootService, which runs on one of the OBServer nodes. If the OBServer node hosting the RootService fails, a new RootService is elected from the remaining OBServer nodes. The RootService is responsible for resource management, load balancing, disaster recovery, and schema management. Specifically:
Resource management
Involves managing metadata such as regions, zones, OBServer nodes, resource pools, and units. This includes actions like bringing up or down OBServer nodes and changing tenant resource specifications.
Load balancing
Determines the distribution of units across servers.
Disaster recovery
Ensures that the distribution and types of log stream replicas match the specified locality by automatically replicating and migrating replicas.
Schema management
Handles DDL requests and generates new schemas.
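As a sketch of the resource management duties above, the following statements bring an OBServer node into a zone and later remove it. They assume V4.x administrative SQL executed in the sys tenant; the server address is hypothetical.
obclient> ALTER SYSTEM ADD SERVER '10.10.10.1:2882' ZONE 'z1';
obclient> ALTER SYSTEM DELETE SERVER '10.10.10.1:2882' ZONE 'z1';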
Locality
The replica distribution and types of log streams in a tenant across different zones are referred to as the tenant's locality. You can specify the locality when creating a tenant to determine the initial replica types and distributions. You can also modify the tenant's locality later on to make changes.
The following statement creates a tenant named mysql_tenant and sets all replica types of log streams in the mysql_tenant tenant to be full-featured replicas in zones z1, z2, and z3.
obclient> CREATE TENANT mysql_tenant RESOURCE_POOL_LIST = ('resource_pool_1'), primary_zone = "z1;z2;z3", locality = "F@z1, F@z2, F@z3" SET ob_tcp_invited_nodes = '%';
The following statement changes the locality of the mysql_tenant tenant so that log streams in this tenant have full-featured replicas in zones z1 and z2 and a log-only replica in zone z3. OceanBase Database then creates, deletes, or converts replicas in the corresponding zones based on the difference between the new and old localities of the tenant.
obclient> ALTER TENANT mysql_tenant SET locality = "F@z1, F@z2, L@z3";
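You can verify a tenant's current locality and primary zone from the sys tenant through the V4.x dictionary view DBA_OB_TENANTS, for example:
obclient> SELECT TENANT_NAME, LOCALITY, PRIMARY_ZONE FROM oceanbase.DBA_OB_TENANTS WHERE TENANT_NAME = 'mysql_tenant';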
Each log stream can have at most one replica on a given OBServer node.
For a single log stream, there can be at most one Paxos replica and several non-Paxos replicas in a zone. You can specify non-Paxos replicas in the locality in the following ways:
locality = "F{1}@z1, R{1}@z1": This indicates that zonez1has one full-featured replica and one read-only replica.locality = "F{1}@z1, R{ALL_SERVER}@z1": This indicates that zonez1has one full-featured replica and read-only replicas on other servers in the same zone (which may not exist).
The RootService creates, deletes, migrates, or converts replica types based on the specified locality to ensure that the replica distribution and types match the configured locality.
Primary Zone
Users can configure a tenant-level parameter to distribute the leaders of log streams across the specified zones under the tenant. In this case, the zone where the leader resides is referred to as the Primary Zone.
A Primary Zone is a collection of zones, with different priorities separated by semicolons (;) and the same priorities separated by commas (,). RootService attempts to schedule log stream leaders to higher-priority zones based on the configured Primary Zone, while distributing leaders across different machines within the same priority zone. If no Primary Zone is specified, RootService considers all zones under the tenant to be of the same priority and distributes the log stream leaders across all zones.
Users can configure or modify the Primary Zone for a tenant through a tenant-level parameter.
Here are some examples:
Specifying the Primary Zone at tenant creation time, with priorities z1=z2>z3:
obclient> CREATE TENANT mysql_tenant RESOURCE_POOL_LIST = ('resource_pool_1'), primary_zone = "z1,z2;z3", locality = "F@z1, F@z2, F@z3" SET ob_tcp_invited_nodes = '%';
Changing the Primary Zone, with priorities z1>z2>z3:
obclient> ALTER TENANT mysql_tenant SET primary_zone = "z1;z2;z3";
Changing the Primary Zone, with priorities z1=z2=z3:
obclient> ALTER TENANT mysql_tenant SET primary_zone = RANDOM;
Note
The Primary Zone is just one factor considered in leader election. Whether a replica in a zone can become a leader also depends on factors such as the replica type and log synchronization progress.
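To check where the leaders actually landed, you can query the replica roles of the tenant's log streams, for example through the dictionary view DBA_OB_LS_LOCATIONS (assuming it is available in your V4.x version):
obclient> SELECT LS_ID, ZONE, SVR_IP, ROLE FROM oceanbase.DBA_OB_LS_LOCATIONS WHERE ROLE = 'LEADER';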