Load balancing is a key consideration in performance optimization and involves two aspects: load balancing within the cluster's physical servers and load balancing for business traffic. An optimal load balancing state ensures that both software and hardware resources are fully utilized, leading to peak performance. During stress testing, we need to monitor the resource usage of all OBServer nodes in the cluster, including parameters such as CPU, I/O, and load. This article covers the deployment of clusters and resource distribution.
Cluster deployment
For cluster deployment, you need to collect information about location, latency, and bandwidth.
Location
Location information is crucial. Key information such as IDC and deployment method affects SQL routing, transaction model, and performance. Here are the key points:
Deployment option: deployment within the same city across three IDCs, deployment between two cities with three data centers each, deployment across three cities with five data centers in total, or other deployment options.
Location of OBProxy and other middleware: Do you deploy the OBProxy and other middlewares on the client side or in the same physical environment as the database server? The deployment methods have different effects on the performance.
The location of the application servers and other middleware.
You can execute the following SQL query to view the IDCs corresponding to each zone of the cluster, the cities where these IDCs are located, and their configurations:
obclient(root@sys)[oceanbase]> SELECT * FROM oceanbase.DBA_OB_ZONES;
The query result is as follows:
+-------+----------------------------+----------------------------+--------+------+----------------+-----------+
| ZONE | CREATE_TIME | MODIFY_TIME | STATUS | IDC | REGION | TYPE |
+-------+----------------------------+----------------------------+--------+------+----------------+-----------+
| zone1 | 2025-12-29 15:43:43.200680 | 2025-12-29 15:43:43.200680 | ACTIVE | | default_region | ReadWrite |
+-------+----------------------------+----------------------------+--------+------+----------------+-----------+
1 row in set
Latency
You can evaluate whether the response time (RT) of a single SQL statement conforms to the expectation based on the latency information. The specific delay information of the entire cluster is as follows:
Inter-IDC latency;
Inter-zone latency;
OBProxy to OBServer latency;
Client to OBProxy latency.
Bandwidth
Verify the bandwidth of the following components:
Bandwidth of the network card of the machine where OBProxy is located.
Application server NIC bandwidth.
NICs and disk I/O bandwidth of OBServer nodes.
You can use the following commands to query these values: ping, tsar, ethtool, and ifconfig. This example uses three IDCs and three replicas in a three-node FFF cluster.
Resource distribution
Understand the distribution of tenant resources to prepare for performance diagnostics.
Basic information for a tenant
They include Primary Zone, Locality, and so on. The related SQL statements are as follows:
obclient(root@sys)[oceanbase]> SELECT * FROM oceanbase.DBA_OB_TENANTS LIMIT 1\G
The query result is as follows:
*************************** 1. row ***************************
TENANT_ID: 1
TENANT_NAME: sys
TENANT_TYPE: SYS
CREATE_TIME: 2025-12-29 15:43:42.930290
MODIFY_TIME: 2025-12-29 15:43:42.930290
PRIMARY_ZONE: RANDOM
LOCALITY: FULL{1}@zone1
PREVIOUS_LOCALITY: NULL
COMPATIBILITY_MODE: MYSQL
STATUS: NORMAL
IN_RECYCLEBIN: NO
LOCKED: NO
TENANT_ROLE: PRIMARY
SWITCHOVER_STATUS: NORMAL
SWITCHOVER_EPOCH: 0
SYNC_SCN: NULL
REPLAYABLE_SCN: NULL
READABLE_SCN: NULL
RECOVERY_UNTIL_SCN: NULL
LOG_MODE: NOARCHIVELOG
ARBITRATION_SERVICE_STATUS: DISABLED
UNIT_NUM: 1
ZONE_UNIT_NUM_LIST: zone1:1
COMPATIBLE: 4.4.2.0
MAX_LS_ID: 1001
RESTORE_DATA_MODE: NORMAL
FLASHBACK_LOG_SCN: NULL
COMMENT: system tenant
1 row in set
Resource allocation information
The related SQL is as follows:
obclient(root@sys)[oceanbase]> SELECT * FROM oceanbase.GV$OB_UNITS LIMIT 1\G
The query results are as follows:
*************************** 1. row ***************************
SVR_IP: 172.xx.xxx.xxx
SVR_PORT: 2882
UNIT_ID: 1
TENANT_ID: 1
ZONE: zone1
ZONE_TYPE: ReadWrite
REGION: default_region
MAX_CPU: 4
MIN_CPU: 4
MEMORY_SIZE: 5368709120
MAX_IOPS: 9223372036854775807
MIN_IOPS: 9223372036854775807
IOPS_WEIGHT: 4
MAX_NET_BANDWIDTH: 9223372036854775807
NET_BANDWIDTH_WEIGHT: 4
LOG_DISK_SIZE: 9663676416
LOG_DISK_IN_USE: 3497610993
DATA_DISK_SIZE: NULL
DATA_DISK_IN_USE: 216748032
STATUS: NORMAL
CREATE_TIME: 2025-12-29 15:43:38.947558
REPLICA_TYPE: FULL
1 row in set
Total user partitions and Leader distribution on single server
obclient [oceanbase]> SELECT svr_ip,count(1) FROM oceanbase.__all_virtual_ls_meta_table WHERE tenant_id=1002 GROUP BY svr_ip;
The query result is as follows:
+---------------+----------+
| svr_ip | count(1) |
+---------------+----------+
| 10.10.10.1 | 1 |
| 10.10.10.2 | 1 |
| 10.10.10.3 | 1 |
+---------------+----------+
3 rows in set
The leader distribution:
obclient [oceanbase]> SELECT svr_ip,count(1) FROM oceanbase.__all_virtual_ls_meta_table WHERE tenant_id=1001 and role=1 GROUP BY svr_ip;
The query results are as follows:
+---------------+----------+
| svr_ip | count(1) |
+---------------+----------+
| 10.10.10.1 | 5 |
+---------------+----------+
1 row in set
Other
Any component in the request path from the application server to the OBServer node can become a performance bottleneck. Ensure that:
Physical resources: The resources of each component in the intermediate chain are within the limits. For example: the JVM memory, CPU usage of application servers and OBProxy, and soft interrupts.
Request routing: Check whether OBProxy correctly routes SQL requests and avoids improper forwarding.
Connection Pool: the number of long and short connections, and SocketTimeout.
Traffic balance: whether the number of SQL statements processed by each OBServer is highly imbalanced.