A data partition is a logical object created based on the table creation statement. It is a mechanism for dividing and managing table data. A tenant is composed of multiple units, and log streams are distributed to the units based on specific rules. This determines the distribution of data partitions associated with log streams across the units. This topic explains the distribution rules of data and traffic.
OceanBase Database supports normal tables and partitioned tables. Partitioned tables are further divided into subpartitioned tables and non-subpartitioned tables. A partitioned table consists of one or more partitions, and a normal table can be regarded as a partitioned table with only one partition. OceanBase Database uses range partitioning, list partitioning, hash partitioning, and key partitioning as its basic partitioning strategies.
Unit Group
In OceanBase Database V4.0 and later, the database imposes a restriction on tenant management, requiring that all zones of a tenant have the same number of units. The system numbers the units in each zone. Units of the same number (UNIT_GROUP_ID) in different zones belong to the same unit group. Unit groups have the following characteristics:
Each unit group is assigned a unique ID. You can query the IDs of a tenant's unit groups in the UNIT_GROUP_ID field of the oceanbase.DBA_OB_UNITS view in the sys tenant.
A log stream belongs to exactly one unit group and is distributed only on the units of that group. Therefore, the units in a unit group hold the same data partitions, organized by log stream. In this sense, a unit group frames a set of data. In addition, each zone must provide the same service capability.
Starting from OceanBase Database V4.0, you cannot customize the number of units for an individual zone. You can only adjust the number of units at the unit group level. For example, if you want to scale up resources for a tenant, you can increase the number of units, and all zones are then scaled up uniformly. Similarly, if you want to scale down the resources of a tenant, you can only delete units by unit group. This ensures homogeneous data distribution across zones.
In the sys tenant, you can query the oceanbase.DBA_OB_UNITS view for all units in the cluster and the unit groups to which they belong. For example:
obclient> select UNIT_ID,TENANT_ID,UNIT_GROUP_ID,ZONE,SVR_IP,SVR_PORT from oceanbase.DBA_OB_UNITS where TENANT_ID = 1004;
+---------+-----------+---------------+--------------+-------------+----------+
| UNIT_ID | TENANT_ID | UNIT_GROUP_ID | ZONE         | SVR_IP      | SVR_PORT |
+---------+-----------+---------------+--------------+-------------+----------+
|    1004 |      1004 |          1003 | sa128_obv4_1 | xx.xx.xx.47 |     2882 |
|    1005 |      1004 |          1003 | sa128_obv4_2 | xx.xx.xx.81 |     2882 |
|    1006 |      1004 |          1003 | sa128_obv4_3 | xx.xx.xx.19 |     2882 |
+---------+-----------+---------------+--------------+-------------+----------+
3 rows in set
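The relationship between units, zones, and unit groups in this output can be sketched in a few lines of Python. This is only an illustrative model, not part of OceanBase Database: the rows are copied from the query result above, and the helper names `group_units` and `units_per_zone` are hypothetical.

```python
from collections import defaultdict

# Rows copied from the query output above: (UNIT_ID, UNIT_GROUP_ID, ZONE).
units = [
    (1004, 1003, "sa128_obv4_1"),
    (1005, 1003, "sa128_obv4_2"),
    (1006, 1003, "sa128_obv4_3"),
]


def group_units(rows):
    """Map each UNIT_GROUP_ID to the {zone: unit_id} pairs that belong to it."""
    groups = defaultdict(dict)
    for unit_id, unit_group_id, zone in rows:
        groups[unit_group_id][zone] = unit_id
    return dict(groups)


def units_per_zone(rows):
    """Count the units in each zone (hypothetical helper for this sketch)."""
    counts = defaultdict(int)
    for _unit_id, _unit_group_id, zone in rows:
        counts[zone] += 1
    return dict(counts)


unit_groups = group_units(units)
zone_counts = units_per_zone(units)

# Every zone of a tenant is expected to hold the same number of units.
zones_are_uniform = len(set(zone_counts.values())) == 1

print(unit_groups)
print(zone_counts, zones_are_uniform)
```

In this sample, the tenant has a single unit group (1003) with one unit in each of the three zones, so every zone holds the same number of units.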
Log stream group
A log stream group is introduced to adapt to scenarios where the primary zone is distributed across multiple zones. If the primary zone is a single zone, you only need to create one log stream in the unit group. If the primary zone consists of multiple zones, you need to create multiple log streams in the unit group to implement horizontal scaling of the service capability. These log streams have the same distribution attribute and form a log stream group. The number of log streams in a log stream group is equal to the number of zones in the primary zone.
Therefore, a log stream belongs to one and only one log stream group, and this membership cannot be changed. A log stream group corresponds to one and only one unit group. All log streams in a log stream group are distributed on the units of the corresponding unit group, with the leaders of the log streams spread across the zones of the primary zone.
The number of log streams in a log stream group changes dynamically as the configuration of the primary zone changes. The life span of a log stream group is the same as that of the unit group to which it belongs.
In the sys tenant, you can query the oceanbase.CDB_OB_LS view for all log streams and log stream groups of tenants in the cluster. For example:
obclient> select TENANT_ID,LS_ID,STATUS,PRIMARY_ZONE,UNIT_GROUP_ID,LS_GROUP_ID from oceanbase.CDB_OB_LS where TENANT_ID=1004;
+-----------+-------+--------+----------------------------------------+---------------+-------------+
| TENANT_ID | LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
|      1004 |     1 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |             0 |           0 |
|      1004 |  1001 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |          1003 |        1001 |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
2 rows in set
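The rule that a log stream group holds one log stream per zone in the primary zone can be checked against the PRIMARY_ZONE value shown above. The sketch below is a simplified model and rests on an assumption about the string format: semicolons separate priority levels (highest first), commas separate zones of equal priority, and only the highest-priority level counts as "the primary zone" for this purpose. The function names are hypothetical.

```python
def parse_primary_zone(primary_zone):
    """Split a PRIMARY_ZONE string into priority levels, highest first.

    Assumption for this sketch: ';' separates priority levels and
    ',' separates zones that share the same priority.
    """
    return [
        [zone.strip() for zone in level.split(",") if zone.strip()]
        for level in primary_zone.split(";")
        if level.strip()
    ]


def log_streams_per_group(primary_zone):
    """Return the number of log streams expected in one log stream group."""
    levels = parse_primary_zone(primary_zone)
    # One log stream (and one leader) per zone at the highest priority.
    return len(levels[0])


sample = "sa128_obv4_1;sa128_obv4_2,sa128_obv4_3"
print(parse_primary_zone(sample))
print(log_streams_per_group(sample))
print(log_streams_per_group("Z1,Z2,Z3"))
```

Under this reading, the sample tenant has one highest-priority zone, which agrees with the single log stream (LS_ID 1001) in log stream group 1001. The row with LS_GROUP_ID 0 is left out of this count.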
Summary
In summary, the following points are worth noting about the fine-grained concepts introduced in this topic:
A unit is an abstraction of physical resources. Each unit occupies a specific amount of physical resources, such as CPU, memory, and storage space, on a node. It is the basic unit for resource scheduling. You can adjust the distribution of units across nodes within the same zone to balance the load and achieve disaster recovery at the node level.
A tenant is composed of multiple units. By specifying the number of units and the primary zone of a tenant, you define the set of units that carry business traffic. Each unit is placed on a separate node to facilitate horizontal scaling of the tenant's capacity.
A log stream is a sequence of data that consists of several data partitions and an ordered redo log stream. Data consistency among replicas is ensured through Paxos-based log synchronization, which provides high availability of data. In addition, a log stream is the unit for committing transactions. A transaction that modifies data within a single log stream can be committed through one-phase atomic commit. A transaction that modifies data across multiple log streams must be committed through two-phase atomic commit, which is an OceanBase Database optimization of the traditional two-phase commit protocol. As a participant in distributed transactions, a log stream has a position attribute and a role attribute. All data partitions in a log stream inherit its attributes.
The system numbers the units in each zone. Units with the same number form a unit group. A unit group frames a set of log streams that are distributed only on the units in the unit group.
The number of log stream groups is equal to the number of unit groups, and each log stream group contains the same number of log streams as the number of zones in the primary zone. As a result, each zone in the primary zone list can host the leader of one log stream in the log stream group.
For example, if the tenant configuration is unit_num = 2 and primary_zone = 'Z1,Z2,Z3', the tenant consists of two unit groups (two units in each zone), two log stream groups, and six log streams. The following figure provides an illustration.
OceanBase Database flexibly distributes data and traffic across multiple nodes at multiple levels. You can migrate units from one node to another within the same zone to balance the load and achieve disaster recovery at the node level.
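The arithmetic behind the unit_num = 2, primary_zone = 'Z1,Z2,Z3' example can be worked through with a short Python sketch. It is a simplified model under stated assumptions, not OceanBase Database behavior captured from a live cluster: one log stream group is created per unit group, one log stream per zone at the highest primary zone priority, and the labels such as `LSG1` and `LS1-Z1` are hypothetical names used only for illustration.

```python
def plan_log_streams(unit_num, primary_zone):
    """Return {log_stream_group: {leader_zone: log_stream_label}}.

    Assumptions for this sketch: one log stream group per unit group,
    and one log stream per zone listed at the highest priority of
    primary_zone (';' separates priorities, ',' separates equal zones).
    """
    leader_zones = [
        zone.strip()
        for zone in primary_zone.split(";")[0].split(",")
        if zone.strip()
    ]
    plan = {}
    for group in range(1, unit_num + 1):
        # Each zone in the primary zone hosts the leader of one log stream.
        plan[f"LSG{group}"] = {
            zone: f"LS{group}-{zone}" for zone in leader_zones
        }
    return plan


plan = plan_log_streams(unit_num=2, primary_zone="Z1,Z2,Z3")
group_count = len(plan)
stream_count = sum(len(streams) for streams in plan.values())

print(group_count, stream_count)  # prints "2 6"
for group, streams in plan.items():
    print(group, streams)
```

The result matches the example: two log stream groups with three log streams each, so that every zone in the primary zone hosts one leader from each group.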