Intra-tenant balancing|V4.4.2|OceanBase Database| docs|Distributed Database

Intra-tenant balancing

Last Updated：2026-06-10 08:33:57 Updated

In OceanBase Database V3.x, replicas are distributed at the partition level across multiple nodes. The load balancing module can achieve intra-tenant partition balancing by switching the primary partitions and migrating replica partitions. After the upgrade to the single-machine log stream architecture in V4.x, replicas are distributed at the log stream level. Each log stream carries a large number of partitions. Therefore, in V4.2.x, the load balancing module cannot directly control partition distribution. Instead, it must first balance the number of log streams (LS) and their leaders before performing partition balancing based on the log stream distribution.

Notice

Partition balancing applies only to user tables.

The priority order for log stream balancing and partition balancing is as follows:

Log stream balancing > Partition balancing

LS balancing (homogeneous zone mode)

LS number balancing

When you modify the UNIT_NUM parameter, change the priority of zones in PRIMARY_ZONE, or modify the locality (which affects PRIMARY_ZONE) for a tenant, the background thread of the load balancing module will immediately change the LS quantity and location through LS splitting, merging, regrouping, or other actions to achieve the desired tenant status. That is, after the balancing is complete, each tenant unit has an LS group, the number of LSs in an LS group is equal to the number of top priority zones in PRIMARY_ZONE, and leaders in an LS group are evenly distributed to primary zones of the top priority.

Note

An LS group is an attribute of an LS. LSs with the same LS group ID are gathered together.
In the homogeneous zone mode of OceanBase Database V4.x, all zones of a tenant must have the same number of units. To facilitate unit management across zones, the system introduces the unit group mechanism. Units with the same group ID (UNIT_GROUP_ID) in different zones belong to the same unit group. All units in the same unit group have exactly the same data distribution and the same LS replicas, and serve the same partition data. Unit groups are used for data grouping in a resource container. Each group of data is served by one or more LSs. The read/write service capabilities can be extended to multiple units in a unit group.
In the homogeneous zone mode, one LS group corresponds to one unit group. That is, all LSs with the same LS group ID are distributed in the same unit group.

In the homogeneous zone mode, the number of LSs is calculated as follows:

LS number = UNIT_NUM * first_level_primary_zone_num

Note

The LS number mentioned here refers to the number of user log streams, excluding system log streams and broadcast log streams.

The following table describes scenarios of LS number balancing.

Triggering scenario	Balancing condition	Balancing algorithm	LS balancing strategy
Top priority zones in `PRIMARY_ZONE` are changed from `Z1,Z2` to `Z1` and `UNIT_NUM` is changed from `1` to `2`	Some LS groups are short of LSs and some have redundant LSs, but the total number of LSs meets final status requirements.	Migrate redundant LSs to LS groups with insufficient LSs.	LS_BALANCE_BY_MIGRATE
Top priority zones in `PRIMARY_ZONE` are changed from `Z1` to `Z1,Z2` or `UNIT_NUM` is changed from `2` to `3`	Only some LS groups are short of LSs.	Assume that the current LS count is M and the LS count after scale-out is N (M < N). Each missing LS must receive `M/N` tablets.	LS_BALANCE_BY_EXPAND
Top priority zones in `PRIMARY_ZONE` are changed from `Z1,Z2` to `Z1` or `UNIT_NUM` is changed from `3` to `2`	Only some LS groups have redundant LSs.	Assume that the LS quantity is changed from M to N (M > N) after the scale-in. Each of the N remaining LSs is assigned with `(M-N)/N` of the tablets originally served by the reduced LSs.	LS_BALANCE_BY_SHRINK

Example 1: As shown in the figure below, top priority zones in PRIMARY_ZONE are changed from Z1,Z2 to Z1 and UNIT_NUM is changed from 1 to 2. The total LS quantity does not change. LS2 is migrated directly to the new unit, and the leader of LS2 is switched to Z1 according to the PRIMARY_ZONE change, completing LS balancing.

Example 2: As shown in the figure below, PRIMARY_ZONE = Z1 is configured, and UNIT_NUM is changed from 1 to 2. To ensure that every unit has an LS, LS2 is split out, carries 1/2 of the tablets, and is migrated to the new unit.

Example 3: As shown in the following figure, UNIT_NUM = 1, and PRIMARY_ZONE changes from RANDOM to Z1,Z2. To ensure that the leader is only located in the zones with the first priority of PRIMARY_ZONE, the system also needs to ensure that the number of tablets on each log stream is balanced. Since LS3 is in the same LS group as the other LSs, the system can transfer 1/2 of the tablets from LS3 to LS1, and then merge LS3 into LS2 so that LS2 can serve the remaining 1/2 of the tablets. This way, the number of LSs is reduced while the number of tablets is balanced.

LS leader balancing

LS leader balancing evenly distributes the LS leaders across the zones with the first priority of the PRIMARY_ZONE based on the LS number balancing. For example, after you change the PRIMARY_ZONE of a tenant, the background thread of the load balancing module dynamically adjusts the number of LSs based on the first priority of the PRIMARY_ZONE, and then switches the leader of each LS to ensure that one LS leader is located in each zone with the first priority of the PRIMARY_ZONE.

The automatic LS leader balancing is controlled by the tenant-level parameter enable_ls_leader_balance, which is enabled by default.

Example 1: As shown in the following figure, UNIT_NUM = 1, and PRIMARY_ZONE changes from Z1 to Z1,Z2. To ensure that the leader is located in each zone with the first priority of PRIMARY_ZONE, the system splits a new LS in the same unit, and then switches the leader of the new LS to Z2 through LS leader balancing.

Example 2: As shown in the following figure, when UNIT_NUM = 1 and PRIMARY_ZONE changes from RANDOM to Z1,Z2, the system merges LS3 into other LSs through LS number balancing. The remaining LS leaders are evenly distributed across Z1 and Z2 without the need for additional LS leader balancing.

LS balancing (heterogeneous zone mode)

In the current version, OceanBase Database also supports the heterogeneous zone mode for tenants. In this mode, the number of units in each zone of a tenant can be different, but a tenant can have at most two types of UNIT_NUM.

In the heterogeneous zone mode, the load balancing algorithm controls the distribution of log streams at the unit level:

Within each zone of a tenant, the number of log streams per unit is equal.
In zones with the same UNIT_NUM, log streams are aggregated in a homogeneous manner.

The formula for calculating the number of LSs in the heterogeneous zone mode is as follows:

LS number = (least common multiple of all UNIT_NUMs of the tenant) * first_level_primary_zone_num

Partition distribution during table creation

When a user creates a user table, OceanBase Database adopts a balanced distribution strategy to scatter or aggregate partitions across different user log streams, ensuring a relatively balanced distribution of partitions across these streams.

User tables in a table group

You can specify a table group to flexibly configure partition aggregation and scattering rules for different tables. If you specify a table group with the corresponding sharding attribute and scope attribute (available in V4.4.2 BP1 and later) when you create a user table, the partitions of the new table are distributed to LSs of the corresponding user based on the alignment rules of the table group. The following table describes the rules.

Versions before V4.4.2 BP1 (excluding)

Sharding attribute of a table group	Description	Partitioning method requirement	Alignment rule
NONE	All partitions of all tables are aggregated to the same server.	Unlimited	All partitions are aggregated to the same LS as the partitions of the table with the smallest `table_id` in the table group.
PARTITION	Scattering is performed at the partition granularity. All subpartitions under the same partition in a subpartitioned table are aggregated.	All tables use the same partitioning method. For a subpartitioned table, only its partitioning method is verified. Therefore, a table group with this sharding attribute may contain both partitioned and subpartitioned tables, as long as their partitioning methods are the same.	Partitions with the same partition value are aggregated, including partitions of partitioned tables and all subpartitions under corresponding partitions of subpartitioned tables.
ADAPTIVE	Scattering is performed in an adaptive way. If the tables in a table group with this sharding attribute are partitioned tables, scattering is performed at the partition granularity. If the tables are subpartitioned tables, scattering is performed at the subpartition granularity.	All tables are partitioned tables or subpartitioned tables. If all tables are partitioned tables, the tables must use the same partitioning method. If all tables are subpartitioned tables, the tables must use the same partitioning and subpartitioning methods.	For partitioned tables, partitions with the same partition value are aggregated. For subpartitioned tables, subpartitions with the same partition value and subpartition value are aggregated.

V4.4.2 BP1 and later versions

SHARDING/SCOPE attribute value	SERVER	ZONE	CLUSTER
NONE	All partitions in the table group are aggregated on the same node. (The partition distribution method is equivalent to `SHARDING = NONE` in versions before the `SCOPE` attribute was introduced.)	All partitions in the table group are evenly scattered within a zone.	All partitions in the table group are evenly scattered across all nodes.
PARTITION	Not supported.	Not supported.	Partitions are clustered at partition granularity, and each partition group is randomly scattered. (The partition distribution method is equivalent to `SHARDING = PARTITION` in versions before the `SCOPE` attribute was introduced.)
ADAPTIVE	Not supported.	Not supported.	For partitioned tables: Partitions are clustered at partition granularity, and each partition group is scattered across the entire cluster. For subpartitioned tables: Under each partition, subpartitions are clustered at subpartition granularity, and each partition group is scattered across the entire cluster. The partition distribution method is equivalent to `SHARDING = ADAPTIVE` in versions before the `SCOPE` attribute was introduced.

SHARDING/SCOPE attribute value

SERVER

ZONE

CLUSTER

NONE All partitions in the table group are aggregated on the same node. (The partition distribution method is equivalent to SHARDING = NONE in versions before the SCOPE attribute was introduced.) All partitions in the table group are evenly scattered within a zone. All partitions in the table group are evenly scattered across all nodes.

PARTITION Not supported. Not supported. Partitions are clustered at partition granularity, and each partition group is randomly scattered. (The partition distribution method is equivalent to SHARDING = PARTITION in versions before the SCOPE attribute was introduced.)

ADAPTIVE

Not supported.

For partitioned tables: Partitions are clustered at partition granularity, and each partition group is scattered across the entire cluster.
For subpartitioned tables: Under each partition, subpartitions are clustered at subpartition granularity, and each partition group is scattered across the entire cluster.

The partition distribution method is equivalent to SHARDING = ADAPTIVE in versions before the SCOPE attribute was introduced.

Notice

If the table group already contains user tables when you create a new table, the new table's partitions are aligned to those of the user table with the smallest table_id in the table group.

For more information about table groups, see Table groups.

User tables

When you create a user table without specifying a table group, OceanBase Database automatically distributes the new partitions to all user LSs. The specific rules are as follows:

For non-partitioned tables: The new partition is allocated to the user LS with the fewest partitions.
For partitioned tables, partitions are evenly distributed to all user log streams based on the round-robin algorithm.
For subpartitioned tables: The subpartitions are evenly distributed to all user LSs using the Round Robin algorithm.

The following examples illustrate these rules.

Example 1: In MySQL mode, create four non-partitioned tables named tt1, tt2, tt3, and tt4.

obclient [test]> CREATE TABLE tt1(c1 int);

obclient [test]> CREATE TABLE tt2(c1 int);

obclient [test]> CREATE TABLE tt3(c1 int);

obclient [test]> CREATE TABLE tt4(c1 int);

Query the partition distribution of these four tables.

obclient [test]> SELECT table_name,partition_name,subpartition_name,ls_id,zone FROM oceanbase.DBA_OB_TABLE_LOCATIONS WHERE table_name in('tt1','tt2','tt3','tt4') AND role='LEADER';

The query result is as follows.

+------------+----------------+-------------------+-------+------+
| table_name | partition_name | subpartition_name | ls_id | zone |
+------------+----------------+-------------------+-------+------+
| tt1        | NULL           | NULL              |  1001 | z1   |
| tt2        | NULL           | NULL              |  1002 | z2   |
| tt3        | NULL           | NULL              |  1003 | z3   |
| tt4        | NULL           | NULL              |  1001 | z1   |
+------------+----------------+-------------------+-------+------+
4 rows in set

Example 2: Create a partitioned table named tt5.

obclient [test]> CREATE TABLE tt5(c1 int) PARTITION BY HASH(c1) PARTITIONS 6;

Query the partition distribution of this table.

obclient [test]> SELECT table_name,partition_name,subpartition_name,ls_id,zone FROM oceanbase.DBA_OB_TABLE_LOCATIONS WHERE table_name ='tt5' AND role='LEADER';

The query result is as follows. The partitions of the table are evenly distributed to 1001, 1002, and 1003 based on the round-robin algorithm.

+------------+----------------+-------------------+-------+------+
| table_name | partition_name | subpartition_name | ls_id | zone |
+------------+----------------+-------------------+-------+------+
| tt5        | p0             | NULL              |  1003 | z3   |
| tt5        | p1             | NULL              |  1001 | z1   |
| tt5        | p2             | NULL              |  1002 | z2   |
| tt5        | p3             | NULL              |  1003 | z3   |
| tt5        | p4             | NULL              |  1001 | z1   |
| tt5        | p5             | NULL              |  1002 | z2   |
+------------+----------------+-------------------+-------+------+
6 rows in set

Example 3: Create a subpartitioned table named tt8.

obclient [test]> CREATE TABLE tt8 (c1 int, c2 int, PRIMARY KEY(c1, c2)) 
    PARTITION BY HASH(c1) SUBPARTITION BY RANGE(c2) SUBPARTITION TEMPLATE 
    (SUBPARTITION p0 VALUES LESS THAN (1990), 
     SUBPARTITION p1 VALUES LESS THAN (2000),
     SUBPARTITION p2 VALUES LESS THAN (3000),
     SUBPARTITION p3 VALUES LESS THAN (4000),
     SUBPARTITION p4 VALUES LESS THAN (5000),
     SUBPARTITION p5 VALUES LESS THAN (MAXVALUE)) 
     PARTITIONS 2;

Query the partition distribution of this table.

obclient [test]> SELECT table_name,partition_name,subpartition_name,ls_id,zone FROM oceanbase.DBA_OB_TABLE_LOCATIONS WHERE table_name ='tt8' AND role='LEADER';

The query result is as follows. The subpartitions under each partition of the table are evenly distributed to all user log streams based on the round-robin algorithm.

+------------+----------------+-------------------+-------+------+
| table_name | partition_name | subpartition_name | ls_id | zone |
+------------+----------------+-------------------+-------+------+
| tt8        | p0             | p0sp0             |  1001 | z1   |
| tt8        | p0             | p0sp1             |  1002 | z2   |
| tt8        | p0             | p0sp2             |  1003 | z3   |
| tt8        | p0             | p0sp3             |  1001 | z1   |
| tt8        | p0             | p0sp4             |  1002 | z2   |
| tt8        | p0             | p0sp5             |  1003 | z3   |
| tt8        | p1             | p1sp0             |  1002 | z2   |
| tt8        | p1             | p1sp1             |  1003 | z3   |
| tt8        | p1             | p1sp2             |  1001 | z1   |
| tt8        | p1             | p1sp3             |  1002 | z2   |
| tt8        | p1             | p1sp4             |  1003 | z3   |
| tt8        | p1             | p1sp5             |  1001 | z1   |
+------------+----------------+-------------------+-------+------+
12 rows in set

Special user tables

In addition to regular user tables, there are special user tables such as local index tables, global index tables, and replicated tables, each with their own specific allocation rules. The details are as follows:

Local index tables: The partitioning rules are the same as those of the primary table. Each partition is bound to the corresponding partition of the primary table.
Global index tables: By default, they are non-partitioned tables. When created, they are allocated to the user LS with the fewest partitions.
Replicated tables: Replicated tables exist only on broadcast log streams.

Partition balancing

On the basis of LS balancing, the load balancing module uses the transfer feature to scatter and aggregate partition tablets to LSs, thus achieving partition balancing within a tenant. The partition balancing task SCHEDULED_TRIGGER_PARTITION_BALANCE is controlled by the DBMS_BALANCE.TRIGGER_PARTITION_BALANCE subprogram. By default, partition balancing is triggered once daily at 00:00. For more information about partition balancing operations, see Configure a scheduled partition balancing task.

The priority of partition balancing strategies is as follows:

Partition attribute alignment > Table group and partition weight balancing > Partition count balancing > Partition disk balancing

Partition alignment

Table Group alignment

When you use an SQL statement to modify the table group attribute of a table or the sharding attribute of a table group, or the SCOPE attribute (supported in V4.4.2 BP1 and later) of a table group, you must wait for the background partition balancing task to complete before the partition distribution meets your expectations. You can manually call the DBMS_BALANCE.TRIGGER_PARTITION_BALANCE subprogram to trigger partition balancing for quick alignment.

For more information about the partition distribution rules of tables in a table group, see the Specify user tables for a table group section.

For more information about table groups, see Table groups.

duplicate_scope alignment

In the current version of OceanBase Database, changes to the attributes of replicated tables are supported. Replicated tables can only exist on broadcast log streams. After modifying the duplicate_scope attribute, you must wait for partition balancing to complete before using the attribute as expected. You can manually call the DBMS_BALANCE.TRIGGER_PARTITION_BALANCE subprogram to trigger partition balancing for quick alignment.

For more information about how to modify the attributes of a replicated table, see Change the attributes of a replicated table (MySQL mode) and Change the attributes of a replicated table (Oracle mode).

Table group weight balancing

In MySQL mode of OceanBase Database, user tables can be aggregated by binding a database to a table group with Sharding = 'NONE'. Currently, automatic aggregation of user tables within the same database and across multiple databases is supported.

By setting weights for table groups with Sharding = 'NONE', table group weight balancing helps distribute tables in the table group according to their weights.

Table group weight balancing is a process in the partition balancing (PARTITION_BALANCE) task. You can trigger it manually or wait for the scheduled partition balancing task to run.

Partition weight balancing (non-table group)

Partition weight is the relative proportion of resources such as CPU, memory, and disk occupied by different partitions set by users. Only integer values are supported, and the value range is [1, +∞). By default, partitions do not have weights.

Partition weight balancing is a process in partition balancing tasks. After users set partition weights for a table, they can manually trigger a partition balancing task or wait for a scheduled partition balancing task to be triggered. The partition weight balancing algorithm uses partition movement and exchange to minimize the variance of the sum of partition weights across different user log streams.

Only tables with partition weights set will participate in partition weight balancing. For partitions without weights set, balancing will still be performed based on the number of partitions.

Use cases of partition weight balancing

Partition weight balancing is mainly used for hot partition distribution in non-partitioned tables. It supports the following two scenarios:

Hotspot partition scattering within a partitioned table
Hot partition distribution across non-partitioned tables

Note

Partition weight balancing is not supported for subpartitioned tables.
Partition weight balancing across a partitioned table and a non-partitioned table is considered as inter-table weight balancing.

When you set partition weights, we recommend that you set at most three weight levels:

High weight = 100% × number of partitions
Medium weight = 50% × number of partitions
Low weight = 1

The following examples describe how to use partition weight balancing.

Hotspot partition scattering within a partitioned table

For example, assume a partitioned table t1 with 6 partitions. After the table is created, the partition distribution is as follows.

Initial distribution of a partitioned table

The following scenarios are described:

Scenario 1: Disperse a few hot partitions

Assume that only p0 and p3 in table t1 are hotspot partitions that need to be scattered. To minimize the number of weight tiers, you can use only one tier: "small weight = 1".

Set the weights of p0 and p3 to 1. During partition weight balancing, only these two partitions are balanced. The other partitions are not involved in the balancing. After partition weight balancing (manual or automatic), the partitions are distributed as shown in the following figure.
Scenario 2: Distribute weights to all partitions

Assume that p0 in the partitioned table t1 has the highest traffic, p1, p2, and p3 have medium traffic, and the rest have low traffic. To scatter traffic, use the following three weight tiers:
- Large weight = 100% × partition count of the partitioned table = 6
- Medium weight = 50% × partition count of the partitioned table = 3
- Low weight = 1
Set the table-level partition weight to 1 to define the scope of partition weight balancing for the entire partitioned table, and then set the partition weights to p0 = 6, p1 = 3, p2 = 3, and p3 = 3. After partition weight balancing, the partitions are distributed as shown in the following figure.
Scenario 3: Isolate hot partitions

Assume that the 6 partitions in table t1 have highly uneven traffic, with two oversized partitions p0 and p3. You want other partitions in the table to not affect the read and write performance of these two. Use the following two weight tiers:
- Large weight = 100% × partition count of the partitioned table = 6
- Low weight = 1
Set the table-level partition weight to 1 to define the scope of partition weight balancing for the entire partitioned table, and then set the partition weights to p0 = 6 and p3 = 6. After partition weight balancing, the partitions are distributed as shown in the following figure.

Hot partition distribution in non-partitioned tables

For example, assume that you have six non-partitioned tables. After the tables are created, the tables are distributed as shown in the following figure.

Initial distribution of non-partitioned tables

The following scenarios are described:

Scenario 1: Disperse a few hot partitions

Assume that the non-partitioned tables non_part_t1 and non_part_t4 are hot. To disperse the hot partitions, you can set the weight of these two tables to 1.

Set the table-level weights of non_part_t1 and non_part_t4 to 1. After partition weight balancing, the tables are distributed as shown in the following figure.
Scenario 2: Distribute weights to all tables

Assume that the traffic of each non-partitioned table is known: non_part_t1 is highest, non_part_t2 and non_part_t3 are medium, and non_part_t4, non_part_t5, and non_part_t6 are lowest. To scatter by traffic, use the following three tiers:
- High weight = 100% × number of partitions = 6
- Medium weight = 50% × number of partitions = 3
- Low weight = 1
Set the table-level partition weights to non_part_t1 = 6, non_part_t2 = 3, non_part_t3 = 3, non_part_t4 = 1, non_part_t5 = 1, and non_part_t6 = 1. After partition weight balancing, the tables are distributed as shown in the following figure.

Impact of partition weights on a table group

A table with partition weights can be added to a table group. For a table group, if a table with partition weights is added to the table group, partition balancing is performed based on the weighted sum in the partition group. The following table describes the impact of adding a table with partition weights to a table group.

V4.4.2 BP1 (excluding) and earlier

Sharding attribute of the table group	Original partition distribution	Impact after adding partition weights
NONE	All aggregated	No impact
PARTITION	Partitions with the same partition value are aggregated (same partition group); partitions are scattered.	A few weighted partitions affect the overall distribution of the table group.
ADAPTIVE	When every table in the group is a partitioned table (without subpartitions), partitions with the same partition value are aggregated (same partition group); partitions are scattered. When every table in the group is subpartitioned, subpartitions with the same partition and subpartition values are aggregated (same partition group); subpartitions are scattered.	When every table in the group is a partitioned table (without subpartitions), a few weighted partitions affect the overall distribution. Partition weights for subpartitioned tables are not supported in the current version.

The following example illustrates the impact of setting partition weights for a table in a table group on the overall distribution.

Assume a table group with SHARDING = PARTITION that contains two partitioned tables t1 and t2, each with 6 partitions. The partitions of each table form a partition group. The initial distribution is as follows.

Initial distribution of partitions in a table group

Set the weight of all partitions in table t1 to 1, then set t1_p0 = 6 and t1_p1 = 6. After partition balancing, the partition distribution in the table group is as follows.

Distribution after adding partition weights to a table group

V4.4.2 BP1 and later

Sharding attribute and scope of the table group	Original partition distribution	Impact after adding partition weights
`SHARDING = 'NONE'` + `SCOPE = 'SERVER'`	All aggregated	No impact
`SHARDING = 'PARTITION'` + `SCOPE = 'CLUSTER'`	Partitions with the same partition value are aggregated (same partition group); partitions are scattered.	A few weighted partitions affect the overall distribution of the table group.
`SHARDING = 'ADAPTIVE'` + `SCOPE = 'CLUSTER'`	When every table in the group is a partitioned table (without subpartitions), partitions with the same partition value are aggregated (same partition group); partitions are scattered. When every table in the group is subpartitioned, subpartitions with the same partition and subpartition values are aggregated (same partition group); subpartitions are scattered.	When every table in the group is a partitioned table (without subpartitions), a few weighted partitions affect the overall distribution. Partition weights for subpartitioned tables are not supported in the current version.

Note

SHARDING = 'NONE' + SCOPE = 'ZONE' and SHARDING = 'NONE' + SCOPE = 'CLUSTER' do not support partition or table weights.

Partition quantity balancing

The goal of partition quantity balancing is to ensure that the number of primary table partitions on each user LS is as balanced as possible (with a deviation of no more than 1).

In OceanBase Database, the concept of "balance groups" is used to describe the distribution of partitions. Partitions that need to be distributed are grouped into the same balance group. Then, partition count balance is achieved through intra-group and inter-group balancing.

In the scenario without table groups, the balance groups are divided as follows:

Table type	Balance group division	Distribution method (without table groups)
Non-partitioned table	All non-partitioned tables under the tenant are in one balance group.	All non-partitioned tables under the tenant are evenly distributed across all LSs of the users (with a deviation of no more than 1).
Partitioned table	All partitions of each table are in one balance group.	The partitions of each table are evenly distributed across all LSs of the users.
Subpartitioned table	All subpartitions under each partition of a single table are in one balance group.	All subpartitions under each partition of a single table are evenly distributed across all LSs of the users.

During the balancing phase, the system first performs intra-group balancing, which evenly distributes the partitions within each balance group. Then, based on the intra-group balancing results, the system transfers some partitions from the LS with the most partitions to the LS with the fewest partitions to achieve inter-group balancing, thereby achieving overall partition count balance.

For example, suppose there are four tables with all their partitions located on LS1, divided into three balance groups:

Balance group 1: non_part_t1 and non_part_t2 (non-partitioned tables)
Balance group 2: part_one_t3_p0 and part_one_t3_p1 (partitions of a partitioned table)
Balance group 3: part_two_t4_p0s0, part_two_t4_p0s1, part_two_t4_p1s0, and part_two_t4_p1s1 (subpartitions of a subpartitioned table)

Initially, the partitions are distributed as 8-0-0.

Initial distribution

Upon intra-group balancing, the distribution of partitions is 4-4-0. Each balancing group is balanced, but the tenant overall is not.

Distribution upon intra-group balancing

After inter-group balancing, the partitions are distributed as 3-3-2. Partitions are transferred from LS1 and LS2, which have the most partitions, to LS3, which has the fewest partitions, achieving overall balance.

Distribution upon inter-group balancing

When table groups exist, each table group is an independent balancing group. The system binds the partitions to be aggregated in a table group and takes them as one partition. Then the system adopts the same intra-group balancing and inter-group balancing methods to achieve partition quantity balancing. When table groups exist, partition quantity balancing may not be fully achieved in a tenant.

When table groups exist, balance groups are divided as follows:

Table type	Balance group division	Distribution method
Table with `NONE` as the Sharding attribute	Forms its own balance group	All partitions of the table are distributed on one log stream.
Partitioned table with `PARTITION` or `ADAPTIVE` as the Sharding attribute	All partitions of all tables are in one balance group	The partitions of the first table are evenly distributed across all log streams. The positions of the partitions of the subsequent tables are aligned with those of the first table.
Subpartitioned table with `ADAPTIVE` as the Sharding attribute	All subpartitions under all partitions of all tables are in one balance group	The subpartitions under each partition of the first table are evenly distributed across all log streams. The positions of the subpartitions of the subsequent tables are aligned with those of the first table.

For example, suppose we add a new table group tg1 with sharding = 'NONE' containing five non-partitioned tables: non_part_t5_in_tg1, non_part_t6_in_tg1, non_part_t7_in_tg1, non_part_t8_in_tg1, and non_part_t9_in_tg1. Since all tables in tg1 must be bound together, the partitions of these tables are distributed as 4-4-5 after balancing.

Distribution with table groups

`SCOPE = 'ZONE'` table groups

For OceanBase Database V4.4.2, starting from V4.4.2 BP1, table groups support the SCOPE attribute. The distribution of partitions within a table group is determined by both the SHARDING and SCOPE attributes. The SHARDING attribute specifies the aggregation method for partitions across tables within the table group. A group of aggregated partitions is called a partition group (Partition Group), which is the smallest unit for distribution. The SCOPE attribute determines the distribution range of all aggregated Partition Groups within the table group.

Quantity balancing

For SCOPE = 'ZONE' (i.e., SHARDING = 'NONE' + SCOPE = 'ZONE'), the number of table groups in each zone is balanced as much as possible. All partitions in a table group are evenly distributed across log streams by round robin among log streams that share the same Primary Zone configuration.

Below is a simple example to demonstrate the distribution effect of SCOPE = 'ZONE' table groups.

Assume a tenant has a locality of FULL{1}@z1, FULL{1}@z2 (2F1A), with UNIT_NUM = 2 and PRIMARY_ZONE = 'z1,z2'. The tenant has the following tables:

Non-partitioned tables: nonpart_1, nonpart_2, nonpart_3, nonpart_4
Partitioned tables: part_1, part_2
Subpartitioned tables: subpart_1, subpart_2
Global index tables: global_index_1, global_index_2, global_index_3, global_index_4

Create two table groups, tablegroup1 and tablegroup2, in the tenant.

obclient> CREATE TABLEGROUP tablegroup1 SHARDING = 'NONE', SCOPE = 'ZONE';

obclient> CREATE TABLEGROUP tablegroup2 SHARDING = 'NONE', SCOPE = 'ZONE';

Then, add the corresponding tables to these two table groups.

obclient> ALTER TABLEGROUP tablegroup1 ADD TABLE nonpart_1,nonpart_2,part_1,subpart_1,global_index_1, global_index_2;

obclient> ALTER TABLEGROUP tablegroup2 ADD TABLE nonpart_3,nonpart_4,part_2,subpart_2,global_index_3, global_index_4;

After partition balancing, the distribution effect is as follows:

Distribution effect of SCOPE = 'ZONE' table groups

Weight balancing

For SCOPE = 'ZONE' table groups, you can set weights (supported only in MySQL-compatible mode) to disperse hot table groups across different zones.

Below is an example. Assume the tenant has six SHARDING = 'NONE' and SCOPE = 'ZONE' table groups without weights. The distribution is as follows.

Distribution without weights

Scenario 1: Disperse a small number of hot table groups

Assume TG1 and TG4 are two hot table groups in the same zone and need to be dispersed. To minimize the number of weight levels, you can use only one weight level "Small Weight = 1" to solve the issue.

After setting the weights of TG1 and TG4 to 1, the distribution is as follows.
Scenario 2: Spread by traffic

Assume the user knows the traffic of each table group. TG1 has the highest traffic, followed by TG2 and TG3, and TG4, TG5, and TG6 have the lowest traffic. The user wants table groups spread according to traffic.

You can use three weight levels:
1. Large Weight = 100% * Expected Number of Weighted Table Groups = 6
2. Medium Weight = 50% * Expected Number of Weighted Table Groups = 3
3. Small Weight = 1
After setting the weights of TG1 to 6, TG2 to 3, TG3 to 3, TG4 to 1, TG5 to 1, and TG6 to 1, the distribution after partition balancing is as follows.

Disk balancing

For SCOPE = 'ZONE' table groups, the system automatically attempts to exchange table groups between zones to balance the total disk usage across zones, while maintaining the quantity and weight balance of SCOPE = 'ZONE' table groups in each zone.

Table group exchange between zones is initiated only when the total disk usage difference between zones meets certain conditions. The conditions are as follows:

The total disk usage difference of the table groups exceeds the total disk usage of the largest zone multiplied by zone_disk_balance_tolerance_percentage/100, with a default value of 10%.

The tenant-level parameter zone_disk_balance_tolerance_percentage controls the tolerance level of the disk balancing algorithm for imbalance between zones. The value range is [0, 100]. When the value is 100, zone disk balancing is disabled.
The total disk usage of the largest zone exceeds 50 GB * Number of Log Stream Groups.

For the table groups to be exchanged, the following conditions must also be met:

The weights of the table groups are the same.
The exchange reduces the disk usage difference between the two zones.

Below is an example. Assume there are six SCOPE = 'ZONE' table groups without weights. After table group quantity balancing, the distribution is as follows. At this point, ZONE3 has the highest disk usage.

Before table group disk balancing

After triggering disk balancing, the system first exchanges the table groups TG3 and TG2 in ZONE3 and ZONE2, which have the highest and lowest disk usage, respectively. The distribution after the exchange is as follows.

After balancing

Then, the system exchanges the table groups TG2 and TG4 in ZONE1 and ZONE3. The distribution after the exchange is as follows.

After balancing again

At this point, no further exchanges can be made, and the table group disk balancing process ends.

Partition disk balancing

Based on the number or weight of partitions, the load balancing module exchanges partitions as much as possible to ensure that the disk usage difference between log streams does not exceed the percentage specified by the cluster-level parameter balancer_tolerance_percentage. If a single partition's disk usage is too large, this balancing effect may not be achieved.

To avoid excessive balancing in scenarios with small data volumes, the current version triggers partition disk balancing only when the disk usage of a single log stream exceeds "50GB". If the disk usage is below this threshold, balancing is not triggered.

How to verify partition balancing

Partition balance refers to the balance of user tables. When querying, you must specify table_type='USER TABLE'. Partition disk balance is mainly determined by the data_size field in the CDB_OB_TABLET_REPLICAS view.

Query method for partition count balance

obclient [test]> SELECT svr_ip,svr_port,ls_id,count(*) FROM oceanbase.CDB_OB_TABLE_LOCATIONS WHERE tenant_id=xxx AND role='leader' AND table_type='USER TABLE' GROUP BY svr_ip,svr_port,ls_id;

Query method for partition disk balance

obclient [test]> SELECT a.svr_ip,a.svr_port,b.ls_id,sum(data_size)/1024/1024/1024 as total_data_size FROM oceanbase.CDB_OB_TABLET_REPLICAS a, oceanbase.CDB_OB_TABLE_LOCATIONS b WHERE a.tenant_id=b.tenant_id AND a.svr_ip=b.svr_ip AND a.svr_port=b.svr_port AND a.tablet_id=b.tablet_id AND b.role='leader' AND b.table_type='USER TABLE' AND a.tenant_id=xxxx GROUP BY svr_ip,svr_port,ls_id;

Scenarios after sharding reorganization

Partition balancing is a continuous process to dynamically maintain system load balance. In actual business operations, the system will aggregate and reorganize partitions based on real-time load status to achieve the optimal balance between resource utilization efficiency and performance. Common scenarios after sharding reorganization include:

Horizontal scaling in and out of a tenant. For example, changing the number of UNIT_NUM from N to 1 and then back to N.
Changing the number of primary zones with the highest priority for a tenant. For example, changing the PRIMARY_ZONE from RANDOM to zone1 and then back to RANDOM.
Adding a large number of tables with the Sharding attribute set to NONE to a table group and then removing them from the table group.

Based on the existing log stream balancing and partition balancing algorithms, after sharding reorganization, there may be situations where consecutive partitions are aggregated on the same log stream or non-partitioned tables in the same database are aggregated on the same log stream. To address this issue, OceanBase Database has optimized the partition balancing algorithm:

For non-partitioned tables, after sharding reorganization, multiple non-partitioned tables in the same database are evenly distributed across different user log streams based on the number of partitions, avoiding consecutive non-partitioned tables in the same log stream.
For partitioned tables, after sharding reorganization, consecutive partitions of the same partitioned table are evenly distributed across different user log streams in a round-robin manner, avoiding consecutive partitions in the same log stream.

OceanBase

Customer Stories

Documentation

Intra-tenant balancing

Notice

LS balancing (homogeneous zone mode)

LS number balancing

Note

Note

LS leader balancing

LS balancing (heterogeneous zone mode)

Partition distribution during table creation

User tables in a table group

Notice

User tables

Special user tables

Partition balancing

Partition alignment

Table group weight balancing

Partition weight balancing (non-table group)

Use cases of partition weight balancing

Note

Impact of partition weights on a table group

Note

Partition quantity balancing

`SCOPE = 'ZONE'` table groups

Quantity balancing

Weight balancing

Disk balancing

Partition disk balancing

How to verify partition balancing

Scenarios after sharding reorganization