A data partition is a logical object created based on the table creation statement. It is a mechanism for dividing and managing table data. A tenant is composed of multiple units, and log streams are distributed to the units based on specific rules. This determines the distribution of data partitions associated with log streams among the units. This section describes the distribution rules of data and traffic.
OceanBase Database supports regular tables and partitioned tables. Partitioned tables are further divided into subpartitioned tables and non-subpartitioned tables. A partitioned table consists of one or more partitions, and a regular table is a special case of a partitioned table. The basic partitioning strategies in OceanBase Database are range partitioning, list partitioning, hash partitioning, and key partitioning.
Unit Group
Starting from OceanBase Database V4.0, the database imposes a restriction on tenant management, requiring that all zones of a tenant have the same number of units. The system numbers the units in each zone. Units in different zones that have the same number (UNIT_GROUP_ID) belong to the same unit group. Unit groups have the following characteristics:
Each unit group is assigned a unique ID. You can query the IDs of a tenant's unit groups from the UNIT_GROUP_ID field of the oceanbase.DBA_OB_UNITS view in the sys tenant.
A log stream belongs to only one unit group and is distributed to the units in that unit group. Therefore, all units in a unit group host the same data partitions, as determined by the log streams they share. A unit group thus frames a set of data, which requires that all zones provide equivalent service capabilities.
Starting from OceanBase Database V4.0, you cannot configure the number of units in a tenant on a per-zone basis. Instead, you can adjust the number of units in a tenant only on a per-unit group basis. For example, if you want to scale up resources for a tenant, you must increase the number of units in all zones; if you want to scale down resources for a tenant, you must delete units in all zones. This approach ensures that data is distributed homogeneously across zones.
In the sys tenant, you can query the oceanbase.DBA_OB_UNITS view for all units in the cluster and the unit groups to which they belong. For example:
obclient> select UNIT_ID,TENANT_ID,UNIT_GROUP_ID,ZONE,SVR_IP,SVR_PORT from oceanbase.DBA_OB_UNITS where TENANT_ID = 1004;
+---------+-----------+---------------+--------------+-------------+----------+
| UNIT_ID | TENANT_ID | UNIT_GROUP_ID | ZONE         | SVR_IP      | SVR_PORT |
+---------+-----------+---------------+--------------+-------------+----------+
|    1004 |      1004 |          1003 | sa128_obv4_1 | xx.xx.xx.47 |     2882 |
|    1005 |      1004 |          1003 | sa128_obv4_2 | xx.xx.xx.81 |     2882 |
|    1006 |      1004 |          1003 | sa128_obv4_3 | xx.xx.xx.19 |     2882 |
+---------+-----------+---------------+--------------+-------------+----------+
3 rows in set
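The grouping rule can be illustrated with a small Python sketch. It is not OceanBase code: the rows are copied from the sample DBA_OB_UNITS output, and the helper names `group_units` and `is_homogeneous` are hypothetical. The sketch groups units by UNIT_GROUP_ID and checks that each zone contributes at most one unit to a group and that all groups span the same zones.

```python
from collections import defaultdict

# Rows copied from the sample DBA_OB_UNITS output: (UNIT_ID, UNIT_GROUP_ID, ZONE).
SAMPLE_UNITS = [
    (1004, 1003, "sa128_obv4_1"),
    (1005, 1003, "sa128_obv4_2"),
    (1006, 1003, "sa128_obv4_3"),
]


def group_units(rows):
    """Map each UNIT_GROUP_ID to a {zone: unit_id} dict (hypothetical helper)."""
    groups = defaultdict(dict)
    for unit_id, unit_group_id, zone in rows:
        # A unit group holds at most one unit per zone.
        if zone in groups[unit_group_id]:
            raise ValueError(f"zone {zone} has two units in group {unit_group_id}")
        groups[unit_group_id][zone] = unit_id
    return dict(groups)


def is_homogeneous(groups):
    """Return True if every unit group spans the same set of zones."""
    zone_sets = {frozenset(members) for members in groups.values()}
    return len(zone_sets) <= 1


unit_groups = group_units(SAMPLE_UNITS)
print(unit_groups)
print(is_homogeneous(unit_groups))
```

For the sample tenant, the three units fall into the single unit group 1003, one unit per zone.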
Log Stream Group
A log stream group is introduced to adapt to scenarios where the primary zone is distributed across multiple zones. If the primary zone is a single zone, you only need to create one log stream in the unit group. If the primary zone consists of multiple zones, you need to create multiple log streams in the unit group to scale out the service capability. These log streams have the same distribution attribute and form a log stream group. The number of log streams in a log stream group is equal to the number of zones in the primary zone.
A log stream belongs to one and only one log stream group, and each log stream group corresponds one-to-one with a unit group. All log streams in a log stream group are distributed to the units in the corresponding unit group, with the leaders of the log streams spread across the zones of the primary zone.
The number of log streams in a log stream group changes dynamically as the configuration of the primary zone changes. The lifecycle of a log stream group is the same as that of the unit group to which it belongs.
In the sys tenant, you can query the oceanbase.CDB_OB_LS view for all log streams and log stream groups of the tenants in the cluster. For example:
obclient> select TENANT_ID,LS_ID,STATUS,PRIMARY_ZONE,UNIT_GROUP_ID,LS_GROUP_ID from oceanbase.CDB_OB_LS where TENANT_ID=1004;
+-----------+-------+--------+----------------------------------------+---------------+-------------+
| TENANT_ID | LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
|      1004 |     1 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |             0 |           0 |
|      1004 |  1001 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |          1003 |        1001 |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
2 rows in set
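The relationship between the primary zone setting and the number of log streams in a group can be sketched in Python. This is an illustration under stated assumptions, not OceanBase code: it assumes that `;` separates priority tiers in a PRIMARY_ZONE string (highest priority first), that `,` separates zones of equal priority, and that only the zones in the highest-priority tier each receive a log stream leader. The function names are hypothetical. Under these assumptions, the sample tenant has one zone in its top tier, which is consistent with the single log stream (LS_ID 1001) in log stream group 1001.

```python
def top_priority_zones(primary_zone):
    """Return the zones in the highest-priority tier of a PRIMARY_ZONE string.

    Assumption: ';' separates priority tiers (highest first) and ','
    separates zones that share the same priority.
    """
    first_tier = primary_zone.split(";")[0]
    return [zone.strip() for zone in first_tier.split(",") if zone.strip()]


def log_streams_per_group(primary_zone):
    """Expected number of log streams in each log stream group (assumed rule)."""
    return len(top_priority_zones(primary_zone))


sample = "sa128_obv4_1;sa128_obv4_2,sa128_obv4_3"
print(top_priority_zones(sample))
print(log_streams_per_group(sample))
print(log_streams_per_group("Z1,Z2,Z3"))
```

Changing the primary zone so that all three zones share the top priority would, under the same assumptions, raise the expected count per group from one to three.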
Summary
In summary, the following points are worth noting about the fine-grained concepts introduced in this topic:
A unit is an abstraction of physical resources. Each unit occupies specific physical resources, such as CPU, memory, and storage space, on a node. It serves as the basic unit for resource scheduling. You can adjust the distribution of units across nodes within the same zone to balance the load and achieve disaster recovery at the node level.
A tenant is composed of multiple units. By setting the unit number and primary zone of a tenant, you define a set of units that will handle business traffic. Each unit is placed on a separate node to facilitate horizontal scaling of the tenant's capacity.
A log stream is a data container that consists of several data partitions and an ordered redo log stream. It uses the Paxos protocol to synchronize logs across replicas and ensure data consistency, thereby guaranteeing the high availability of data. A log stream is also the commit unit of a transaction. If a transaction modifies data within a single log stream, it can be committed through a one-phase atomic commit. If a transaction modifies data across multiple log streams, it is committed through an optimized two-phase atomic commit, as specified in the OceanBase Database two-phase commit protocol. In this way, the log stream acts as a participant in the distributed transaction. Each log stream has a position attribute and a role attribute, both of which are inherited by all data partitions in the log stream.
The system numbers the units for each zone. Units with the same number form a unit group. A unit group frames the log streams that are distributed only on the units in the unit group.
The number of log stream groups is equal to the number of unit groups, and the number of log streams in each log stream group is equal to the number of zones in the primary zone. As a result, each zone in the primary zone list can host the leader of a log stream in the log stream group.
For example, if the tenant configuration is unit_num = 2 and primary_zone = 'Z1,Z2,Z3', the system defines two unit groups, two log stream groups, and six log streams for the tenant, as shown in the following figure.
OceanBase Database flexibly distributes data and traffic across multiple nodes at multiple levels. You can migrate units between nodes within a zone to balance the load and achieve disaster recovery at the node level.
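The counts in the unit_num = 2 example can be reproduced with a short Python sketch. It only enumerates the layout described in this topic: one log stream group per unit group, and one log stream per primary zone within each group. The function name `plan_log_streams` and the 1-based group numbers are hypothetical placeholders, not the IDs that OceanBase Database assigns.

```python
from itertools import product


def plan_log_streams(unit_num, primary_zones):
    """Enumerate the log streams of a tenant (illustrative only).

    Each unit group has exactly one log stream group, and each log stream
    group holds one log stream per primary zone, with its leader in that zone.
    """
    groups = range(1, unit_num + 1)  # placeholder numbering, not real IDs
    return [
        {"unit_group": g, "ls_group": g, "leader_zone": zone}
        for g, zone in product(groups, primary_zones)
    ]


layout = plan_log_streams(2, ["Z1", "Z2", "Z3"])
for log_stream in layout:
    print(log_stream)

unit_group_count = len({ls["unit_group"] for ls in layout})
print(unit_group_count, "unit groups,", len(layout), "log streams")
```

Each primary zone ends up hosting one leader per unit group, which is how the tenant spreads traffic across both zones and units.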