A data partition is a logical object created by a table creation statement. It is the mechanism for dividing and managing table data. Each tenant contains several units, and log streams are distributed across those units based on specific rules. This in turn determines how the data partitions that belong to each log stream are distributed across the units. This topic describes the rules that govern the distribution of data and traffic.
OceanBase Database supports regular tables, partitioned tables, and subpartitioned tables. A partitioned table contains one or more partitions. A regular table contains only one partition, and can be considered as a special partitioned table. The basic partitioning strategies supported by OceanBase Database are RANGE, LIST, HASH, and KEY.
Unit group
OceanBase Database V4.0 and later versions require that all zones of the same tenant have the same number of resource units. The system assigns an ID to each unit in each zone. Units with the same ID across different zones form a unit group. Unit groups have the following characteristics:
Each unit group has a unique ID. You can query the unit group ID from the UNIT_GROUP_ID column in the oceanbase.DBA_OB_UNITS view in the sys tenant.
One log stream belongs to only one unit group and is distributed only on units in that unit group. Therefore, the same data partitions are distributed on all units in a unit group based on log streams, thus defining a group of data. In this case, all zones must have equivalent service capabilities.
OceanBase Database V4.0 and later allow you to scale resources in or out only by unit group and do not support configuring the number of units for a tenant by zone. For example, if you want to scale out resources for a tenant, you can only increase the number of units in all zones in a unified manner. Correspondingly, if you want to scale in resources for a tenant, you can delete units only by unit group. The unit group mechanism ensures homogeneous data distribution across zones.
You can query all units and corresponding unit groups from the oceanbase.DBA_OB_UNITS view. Here is an example:
obclient> select UNIT_ID,TENANT_ID,UNIT_GROUP_ID,ZONE,SVR_IP,SVR_PORT from DBA_OB_UNITS where TENANT_ID = 1004;
+---------+-----------+---------------+--------------+-------------+----------+
| UNIT_ID | TENANT_ID | UNIT_GROUP_ID | ZONE         | SVR_IP      | SVR_PORT |
+---------+-----------+---------------+--------------+-------------+----------+
|    1004 |      1004 |          1003 | sa128_obv4_1 | xx.xx.xx.47 |     2882 |
|    1005 |      1004 |          1003 | sa128_obv4_2 | xx.xx.xx.81 |     2882 |
|    1006 |      1004 |          1003 | sa128_obv4_3 | xx.xx.xx.19 |     2882 |
+---------+-----------+---------------+--------------+-------------+----------+
3 rows in set
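To make the "one unit per zone in each unit group" property concrete, the following is a minimal sketch that groups the rows shown above by UNIT_GROUP_ID and checks that every unit group covers every zone exactly once. The function names are hypothetical and the rows are copied by hand from the example output; this is an illustration of the rule, not a tool shipped with OceanBase Database.

```python
from collections import defaultdict

# Rows copied from the DBA_OB_UNITS output above: (UNIT_ID, UNIT_GROUP_ID, ZONE).
units = [
    (1004, 1003, "sa128_obv4_1"),
    (1005, 1003, "sa128_obv4_2"),
    (1006, 1003, "sa128_obv4_3"),
]


def zones_by_unit_group(rows):
    """Map each unit group ID to the sorted list of zones hosting its units."""
    groups = defaultdict(list)
    for _unit_id, unit_group_id, zone in rows:
        groups[unit_group_id].append(zone)
    return {gid: sorted(zones) for gid, zones in groups.items()}


def is_homogeneous(rows):
    """Return True if every unit group has exactly one unit in every zone."""
    all_zones = sorted({zone for _, _, zone in rows})
    return all(zones == all_zones for zones in zones_by_unit_group(rows).values())


print(zones_by_unit_group(units))
print(is_homogeneous(units))
```

For the three rows in the example, unit group 1003 has one unit in each of the three zones, so the check passes. Adding a unit group that exists in only one zone would make it fail, which mirrors the rule that units are added or removed only by unit group.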
Log stream group
The concept of a log stream group is introduced to support primary zone settings that span multiple zones. When the primary zone setting contains a single zone, you need to create only one log stream in a unit group. When the primary zone setting contains multiple zones, you must create multiple log streams in a unit group so that service capabilities are spread across those zones. The log streams are distributed in the same way and form a log stream group. The number of log streams in the log stream group is equal to the number of zones contained in the primary zone setting.
Each log stream belongs to exactly one log stream group, and this membership cannot be modified. One log stream group corresponds to one unit group. All log streams in a log stream group are distributed in the corresponding unit group. Leaders of log streams are distributed in the primary zones.
The number of log streams in a log stream group dynamically changes with the primary zone setting of the tenant. The life cycle of a log stream group is bound to that of the corresponding unit group.
You can query the log streams of all tenants in a cluster and corresponding log stream groups from the oceanbase.CDB_OB_LS view. Here is an example:
obclient> select TENANT_ID,LS_ID,STATUS,PRIMARY_ZONE,UNIT_GROUP_ID,LS_GROUP_ID from CDB_OB_LS where TENANT_ID=1004;
+-----------+-------+--------+----------------------------------------+---------------+-------------+
| TENANT_ID | LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
|      1004 |     1 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |             0 |           0 |
|      1004 |  1001 | NORMAL | sa128_obv4_1;sa128_obv4_2,sa128_obv4_3 |          1003 |        1001 |
+-----------+-------+--------+----------------------------------------+---------------+-------------+
2 rows in set
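The output above lists three zones in PRIMARY_ZONE but only one log stream (1001) in log stream group 1001. A plausible reading, stated here as an assumption, is that a semicolon separates priority levels, a comma separates zones of equal priority, and only the zones in the highest priority level receive a log stream leader. Log stream 1 has UNIT_GROUP_ID 0 and LS_GROUP_ID 0 in this output and is left out of the count. The sketch below uses hypothetical helper names to work through that reading.

```python
def first_priority_zones(primary_zone):
    """Return the zones in the highest-priority level of a primary zone string.

    Assumption: ';' separates priority levels and ',' separates zones that
    share the same priority.
    """
    top_level = primary_zone.split(";")[0]
    return [zone.strip() for zone in top_level.split(",") if zone.strip()]


def log_streams_per_group(primary_zone):
    """Number of log streams expected in each log stream group (assumed rule)."""
    return len(first_priority_zones(primary_zone))


setting = "sa128_obv4_1;sa128_obv4_2,sa128_obv4_3"
print(first_priority_zones(setting))
print(log_streams_per_group(setting))
```

Under this reading, the example setting yields one log stream per log stream group, which is consistent with the single row for group 1001, while a setting such as 'Z1,Z2,Z3' would yield three.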
Summary
This section summarizes the concepts involved in this topic.
Units are abstracted from physical resources. Each unit occupies specific physical resources on an OBServer node, including CPU, memory, and storage resources. Resources are scheduled by unit. You can adjust the distribution of units across OBServer nodes in a zone to achieve load balancing and disaster recovery across OBServer nodes.
Each tenant contains several units. You can specify the number of units and the primary zone setting for a tenant to define a series of unit sets for distributing business traffic. Each unit is deployed on an OBServer node to facilitate horizontal scaling of the tenant.
A log stream defines a group of data, including several data partitions and ordered redo log streams. It uses the Paxos protocol to synchronize logs between replicas to ensure data consistency between the replicas and thereby implement high availability of data. Transactions are committed by log stream. If the modification in a transaction is completed within a single log stream, the transaction can be committed by using the one-phase atomic commit logic. If the modification in the transaction is completed across multiple log streams, the transaction can be committed by using the two-phase atomic commit protocol of OceanBase Database. Log streams are participants of distributed transactions. A log stream has a location attribute and a role attribute. All data partitions in the log stream inherit its attributes.
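The commit rule described above depends only on how many log streams a transaction touches. The following minimal sketch shows that decision, assuming a hypothetical mapping from partition names to log stream IDs; it models the rule and does not reflect the actual transaction code of OceanBase Database.

```python
def choose_commit_protocol(modified_partitions, partition_to_log_stream):
    """Pick the commit path from the set of log streams a transaction modified.

    modified_partitions: iterable of partition names touched by the transaction.
    partition_to_log_stream: dict mapping each partition name to its log stream ID.
    """
    log_streams = {partition_to_log_stream[p] for p in modified_partitions}
    if not log_streams:
        raise ValueError("the transaction modified no partitions")
    # A single participant can commit in one phase; several participants
    # need the two-phase protocol to stay atomic.
    return "one-phase" if len(log_streams) == 1 else "two-phase"


# Hypothetical placement: p0 and p1 share log stream 1001, p2 lives in 1002.
placement = {"p0": 1001, "p1": 1001, "p2": 1002}

print(choose_commit_protocol(["p0", "p1"], placement))
print(choose_commit_protocol(["p0", "p2"], placement))
```

Because partitions inherit the location of their log stream, placing partitions that are modified together in the same log stream keeps more transactions on the one-phase path.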
The system assigns an ID to each unit in each zone. Units with the same ID form a unit group. A unit group corresponds to a set of log streams. The log streams are distributed only on units in the unit group.
One log stream group corresponds to one unit group. The number of log streams in a log stream group is determined by the number of zones contained in the primary zone setting. Therefore, each zone contained in the primary zone setting accommodates the leader of a log stream in the log stream group.
For example, if you set the value of the unit_num parameter to 2 and the value of the primary_zone parameter to 'Z1,Z2,Z3', two unit groups, two log stream groups, and six log streams are defined for the tenant. The following figure shows an example.
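The counts in this example can also be worked out with a short sketch. It assumes one log stream group per unit group and one log stream per primary zone in each group, as described above. The group IDs 1 and 2 and the function name are hypothetical placeholders, not IDs that the system would assign.

```python
from collections import Counter
from itertools import product


def plan_log_streams(unit_num, primary_zone):
    """Enumerate the log streams implied by unit_num and an equal-priority primary zone list."""
    zones = [zone.strip() for zone in primary_zone.split(",")]
    unit_groups = range(1, unit_num + 1)  # hypothetical unit group IDs
    # One log stream group per unit group, one log stream per primary zone in it.
    return [
        {"unit_group": group, "ls_group": group, "leader_zone": zone}
        for group, zone in product(unit_groups, zones)
    ]


plan = plan_log_streams(2, "Z1,Z2,Z3")
leaders_per_zone = Counter(ls["leader_zone"] for ls in plan)

print(len({ls["unit_group"] for ls in plan}))  # unit groups
print(len({ls["ls_group"] for ls in plan}))    # log stream groups
print(len(plan))                               # log streams
print(dict(leaders_per_zone))                  # leaders hosted by each zone
```

With unit_num set to 2 and three primary zones, the sketch yields two unit groups, two log stream groups, and six log streams, with each zone hosting two leaders.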
In OceanBase Database, data and traffic are flexibly distributed on multiple OBServer nodes in multiple dimensions. You can migrate units between OBServer nodes in a zone to achieve load balancing and disaster recovery across OBServer nodes.