With the rapid development of big data, databases, as the core components of data storage and management, face unprecedented challenges. As enterprise business grows in scale and technology architectures become more complex, efficiently managing and optimizing vast amounts of data has become a pressing issue. OceanBase Database, a distributed relational database system independently developed by the OceanBase team, has demonstrated exceptional capability in tackling these challenges. Since its launch, it has gained wide recognition for its high availability, high performance, and high scalability.
Starting from V4.0, OceanBase Database has adopted an integrated architecture that supports both standalone and distributed deployments. Additionally, it introduced the concept of adaptive log streams to meet diverse data load balancing requirements, thereby enhancing the system's overall flexibility and efficiency.
Building on this foundation, OceanBase Database has designed and implemented a series of load balancing strategies to address the varying needs for data distribution and access speed across different scenarios. These strategies are primarily categorized into two types: inter-tenant balancing strategies and intra-tenant balancing strategies. Inter-tenant strategies focus on redistributing data across multiple tenants to ensure resources are utilized efficiently. Intra-tenant strategies, on the other hand, aim to optimize data layout within a single tenant, improving the performance of specific applications or services.
This topic will delve into how these two types of load balancing strategies are applied in practice and examine their specific impacts on related business modules. Additionally, we will share practical performance tuning recommendations to help you better understand and configure the most suitable load balancing solution for your data management and usage needs. Through this, we aim to provide valuable insights for developers and technical support engineers, enabling them to make informed decisions when facing complex database environments.
Inter-tenant balancing
Scenario: tenant creation
OceanBase Database adopts a multi-tenant architecture in which each tenant is allocated resource units of consistent specifications, covering CPU, memory, and disk resources. Resource load balancing is performed across tenants. When a resource unit is created before tenant creation, the system evaluates the available resources on each server by calculating weights based on CPU, memory, and disk usage, and then allocates the new resource unit to the server with the most remaining resources.
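To observe this placement yourself, you can query the system views GV$OB_SERVERS and DBA_OB_UNITS in the sys tenant, which show per-server resource capacity and the current distribution of resource units. The column selection below is illustrative; the exact columns may vary slightly by version:

obclient> SELECT SVR_IP, ZONE, CPU_CAPACITY, CPU_ASSIGNED, MEM_CAPACITY, MEM_ASSIGNED, DATA_DISK_CAPACITY, DATA_DISK_IN_USE FROM oceanbase.GV$OB_SERVERS;

obclient> SELECT * FROM oceanbase.DBA_OB_UNITS;

The first query shows how much CPU, memory, and disk each server has and has assigned; the second shows which server hosts each tenant's resource units.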
Dynamic load balancing is also applied to tenant resource units. For example, if the disk usage of a server exceeds the threshold defined by the cluster-level parameter server_balance_critical_disk_waterlevel, the system automatically migrates resource units from the overloaded server to one with lower disk usage, ensuring disk space is balanced. Similarly, when overall resource usage is low, the system calculates the load on each server based on CPU and memory weights. It then migrates resource units from high-load servers to low-load servers to achieve CPU and memory load balancing.
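For example, to review or tune the disk watermark that triggers this migration, you can query and adjust server_balance_critical_disk_waterlevel in the sys tenant. The value is a percentage; the 85 below is only an illustrative value, and you should confirm the default for your version before changing it:

obclient> SHOW PARAMETERS LIKE 'server_balance_critical_disk_waterlevel';

obclient> ALTER SYSTEM SET server_balance_critical_disk_waterlevel = 85;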
Enable or disable inter-tenant balancing
You can enable or disable inter-tenant balancing by configuring the enable_rebalance parameter in the sys tenant. By default, inter-tenant balancing is enabled.
Follow these steps to check whether inter-tenant balancing is enabled and to enable it if it is currently disabled:
1. Run the following command in the sys tenant to check the value of the enable_rebalance parameter:

   obclient> SHOW PARAMETERS LIKE '%enable_rebalance%';

2. If inter-tenant balancing is disabled, execute the following command to enable it:

   obclient> ALTER SYSTEM SET enable_rebalance = true;
Impact of inter-tenant balancing on business
Inter-tenant balancing is typically performed during the initialization of clusters or tenants and is rarely triggered in OceanBase Database V4.x. As such, its impact on ongoing business operations is minimal.
Intra-tenant balancing
Intra-tenant balancing includes log stream (LS) balancing and partition balancing.
LS balancing
Scenario 1: scaling
OceanBase Database offers highly flexible scaling capabilities. By increasing or decreasing the number of resource units or adjusting the number of primary zones with top priority for a tenant, you can modify the number of service nodes. Any change in the number of service nodes triggers LS balancing. Through actions such as LS splitting, LS merging, LS replica migration, LS leader switching, and partition balancing, OceanBase Database redistributes LSs and leaders to achieve an optimal state for the tenant. In this ideal state, each resource unit within the tenant has an LS replica, and each top-priority primary zone contains exactly one LS leader.
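As an illustration of such a scaling operation, the statements below (the tenant name tenant1 and zone names are hypothetical placeholders) change the unit number and primary zones of a tenant, and then check the resulting LS distribution from within the user tenant; verify the exact syntax against your version's documentation:

obclient> ALTER RESOURCE TENANT tenant1 UNIT_NUM = 2;

obclient> ALTER TENANT tenant1 PRIMARY_ZONE = 'zone1,zone2';

-- In the user tenant, check the LSs after balancing completes:
obclient> SELECT * FROM oceanbase.DBA_OB_LS;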
Scenario 2: disaster recovery
OceanBase Database ensures high reliability and availability through disaster recovery operations, including permanently taking a node offline, aligning locality, migrating resource units, and scaling in resource units. These operations also trigger LS balancing to maintain system stability and availability.
Partition balancing
Scenario 1: scaling
After a scaling operation is successfully completed, the system performs LS balancing. Based on this, the load balancing module redistributes partitions by either scattering or consolidating partition tablets across different LSs. This process ensures partition balancing within the tenant.
Scenario 2: dynamic table change (table creation or dropping)
When tables and partitions are dynamically created and dropped, the number of partitions on each server node may vary drastically. In this case, partition balancing is required.
When you create a user table, OceanBase Database selects a balancing strategy based on the table type to scatter or aggregate partitions to LSs, so as to ensure partition balancing among LSs. For more information about table types and corresponding balancing strategies, see Intra-tenant balancing.
By dividing partitions to be scattered into balancing groups, OceanBase Database implements partition quantity balancing within a group and across groups, and exchanges partitions to achieve partition disk balancing. If you want to aggregate or scatter certain partitions in actual business scenarios, you need to manually adjust the distribution of partitions. For more information, see Transfer a partition.
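If you need to place a specific partition on a given LS manually, OceanBase Database V4.2.1 and later provide a transfer statement for this purpose. The identifiers below (table ID 500001, object ID 500002, LS 1001) are placeholders, and the precise syntax and how to look up these IDs are documented in Transfer a partition:

obclient> ALTER SYSTEM TRANSFER PARTITION TABLE_ID=500001,OBJECT_ID=500002 TO LS 1001;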
Scenario 3: attribute change of a replicated table
OceanBase Database has supported replicated tables since V4.2.0. Replicated tables exist only on broadcast LSs. OceanBase Database V4.2.3 and later (excluding V4.3.x) allow you to change the replicated table attribute. If you change this attribute, the table can serve requests based on the new attribute value only after partition balancing is completed.
Impact of intra-tenant balancing on business
The impact of intra-tenant balancing on business is described as follows:
Frequent scaling creates and deletes LSs, which can leave many LSs in the system, including LSs that have not yet been garbage-collected (GC). The log disk of the system must have sufficient space; otherwise, the scaling or load balancing process will get stuck.
If you increase the UNIT_NUM value of a tenant from 1 to 2, the transactions per second (TPS) may decrease by about 50% for roughly 1 minute when the change overlaps with a partition transfer task. If you decrease the UNIT_NUM value of a tenant from 2 to 1, the TPS remains stable and fluctuates within 5% upon each performance jitter.

If disk I/O usage becomes very high due to factors such as major compactions, which can reach 99% according to tests, the execution and scheduling of transfer tasks become very slow. Because LSs are blocked during partition transfer, performance jitters may occur. The duration of each performance jitter equals the value of the hidden parameter _transfer_start_trans_timeout. For jitter-sensitive business modules, you can adjust the transfer scheduling cycle as needed.

The _transfer_start_trans_timeout parameter specifies the timeout period for starting a transaction when a transfer task starts. The value range is [1ms, 600s] and the default value is 1s. Perform the following steps to query and configure this parameter:

1. Log in to a MySQL or Oracle tenant of the cluster as the administrator of the tenant. By default, the administrator is the root user in MySQL mode and the SYS user in Oracle mode.

2. Check the value of the _transfer_start_trans_timeout parameter.

   MySQL mode:

   obclient> SELECT * FROM oceanbase.GV$OB_PARAMETERS WHERE NAME LIKE '%_transfer_start_trans_timeout%';

   Oracle mode:

   obclient> SELECT * FROM SYS.GV$OB_PARAMETERS WHERE NAME LIKE '%_transfer_start_trans_timeout%';

3. Modify the value of the _transfer_start_trans_timeout parameter.

   MySQL mode:

   obclient> ALTER SYSTEM SET _transfer_start_trans_timeout = '1s';

   Oracle mode:

   obclient> ALTER SYSTEM SET "_transfer_start_trans_timeout" = '1s';
Strategies for processing active transactions during load balancing
OceanBase Database does not support active transactions during a transfer task for load balancing. It provides two strategies for processing active transactions:
The transfer task is performed after the active transactions are completed.
This is the default processing strategy. In peak business hours, the transfer task will wait for business transactions to complete. The wait time is long and subject to the business pressure, number of partitions, and number of LSs. It is expected that transfer tasks are imperceptible to business.
The transfer task proactively kills the active transactions in the corresponding LS.
You need to set _enable_balance_kill_transaction to true to push forward the transfer task. However, business transactions may be killed and rolled back.

The hidden parameter _enable_balance_kill_transaction specifies whether a load balancing task proactively kills active transactions in the remote LS. The default value is False. Perform the following steps to query and configure this parameter:

1. Log in to a MySQL or Oracle tenant of the cluster as the administrator of the tenant. By default, the administrator is the root user in MySQL mode and the SYS user in Oracle mode.

2. Check the value of the _enable_balance_kill_transaction parameter.

   MySQL mode:

   obclient> SELECT * FROM oceanbase.GV$OB_PARAMETERS WHERE NAME LIKE '%_enable_balance_kill_transaction%';

   Oracle mode:

   obclient> SELECT * FROM SYS.GV$OB_PARAMETERS WHERE NAME LIKE '%_enable_balance_kill_transaction%';

3. Modify the value of the _enable_balance_kill_transaction parameter.

   MySQL mode:

   obclient> ALTER SYSTEM SET _enable_balance_kill_transaction = true;

   Oracle mode:

   obclient> ALTER SYSTEM SET "_enable_balance_kill_transaction" = true;
You need to select a strategy for active transaction processing based on your business needs and load balancing requirements. Proceed with caution.
Recommended configurations
If you perform scaling or disaster recovery operations in off-peak business hours and want the tenant to quickly reach a balanced state, the recommended configurations are as follows:
- For OceanBase Database V4.2.1 to V4.2.3, set both enable_balance and enable_transfer of the current user tenant to true, and decrease the value of the partition_balance_schedule_interval parameter.

- For OceanBase Database V4.2.4 and later (excluding V4.3.x), set both enable_balance and enable_transfer of the current user tenant to true, and manually trigger a partition balancing task.
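In a user tenant, the off-peak configuration above might be applied as follows. The 10s interval is only an illustrative value, and the DBMS_BALANCE package name assumed for the TRIGGER_PARTITION_BALANCE subprogram should be verified against your version's documentation:

obclient> ALTER SYSTEM SET enable_balance = true;

obclient> ALTER SYSTEM SET enable_transfer = true;

-- V4.2.1 to V4.2.3: shorten the balancing schedule interval
obclient> ALTER SYSTEM SET partition_balance_schedule_interval = '10s';

-- V4.2.4 and later (excluding V4.3.x): trigger a partition balancing task manually
obclient> CALL DBMS_BALANCE.TRIGGER_PARTITION_BALANCE();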
If you want to reduce the impact of load balancing on business in peak hours, the recommended configurations are as follows:
- To disable LS balancing and partition balancing, set both enable_balance and enable_transfer of the current user tenant to false.

- To enable load balancing with a low scheduling frequency:

  - For OceanBase Database V4.2.1 to V4.2.3, set both enable_balance and enable_transfer of the current user tenant to true, and increase the value of the partition_balance_schedule_interval parameter.

  - For OceanBase Database V4.2.4 and later (excluding V4.3.x), set both enable_balance and enable_transfer of the current user tenant to true, and use the TRIGGER_PARTITION_BALANCE subprogram to trigger scheduled partition balancing tasks.

    Notice: For a user tenant upgraded from OceanBase Database V4.2.3 or earlier to V4.2.4 or later, the scheduled partition balancing task is disabled by default and needs to be manually enabled.

- To transfer only specific hotspot partitions, set enable_balance of the current user tenant to false and enable_transfer of the tenant to true, and then manually transfer the partitions. For more information about how to manually transfer a partition, see Transfer a partition.
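For the hotspot-only case, the tenant-level settings might be applied as follows before the manual transfer (an illustrative sketch; run these in the user tenant):

obclient> ALTER SYSTEM SET enable_balance = false;

obclient> ALTER SYSTEM SET enable_transfer = true;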