Load balancing is an important part of performance tuning. It encompasses load balancing among physical servers in a cluster and load balancing of business traffic in a cluster. A sound load balancing status ensures optimal performance by fully utilizing software and hardware resources. During a stress test, pay attention to the resource usage of each OBServer node in the cluster, including the CPU, I/O, and load. This topic describes load balancing in terms of cluster deployment and resource distribution.
Cluster deployment
Collect locations, latency, and bandwidth information for cluster deployment.
Location
Location information, such as the IDC and deployment methods, is essential to SQL route forwarding, transaction models, and performance. Location information includes the following parts:
Deployment method: three IDCs in the same city, three IDCs across two regions, five IDCs across three regions, or other deployment methods.
Location of OceanBase Database Proxy (ODP) and other middleware: on the client or on the same server as the OBServer node. Different deployment methods have some effect on performance.
Location of application servers and other middleware.
Run the following SQL statement to check the IDC and region where each zone in the cluster is located:
MySQL [oceanbase]> select zone, name, info from __all_zone where name in ('region', 'idc') ;
+------+--------+------+
| zone | name | info |
+------+--------+------+
| z1 | idc | |
| z1 | region | SZ |
| z2 | idc | |
| z2 | region | SZ |
| z3 | idc | |
| z3 | region | SZ |
+------+--------+------+
6 rows in set (0.00 sec)
Latency
You can use the latency information to evaluate whether the response time of a single SQL statement meets your expectations. The latency information of a cluster includes:
The latency between IDCs
The latency between zones
The latency between ODP and an OBServer node
The latency between a client and ODP
Bandwidth
Check the bandwidths of the following components:
NIC bandwidth on the server where ODP is deployed
NIC bandwidth on the application server
NIC bandwidth and disk I/O bandwidth on the OBServer node
You can use commands such as ping, tsar, ethtool xxx, and ifconfig to obtain the information. The following section uses FFF deployment with three replicas in three IDCs in the same city as an example.
Resource distribution
Obtain information about tenant resource distribution for subsequent performance diagnostics.
Basic information about a tenant
The information includes the primary zone and locality. You can run the following SQL statement to obtain the information:
MySQL [oceanbase]> select * from __all_tenant limit 1\G;
*************************** 1. row ***************************
gmt_create: 2021-08-25 20:38:18.699528
gmt_modified: 2021-08-25 20:38:18.699528
tenant_id: 1
tenant_name: sys
replica_num: -1
zone_list: z1;z2;z3
primary_zone: z1;z2,z3
locked: 0
collation_type: 0
info: system tenant
read_only: 0
rewrite_merge_version: 0
locality: FULL{1}@z1, FULL{1}@z2, FULL{1}@z3
logonly_replica_num: 0
previous_locality:
storage_format_version: 0
storage_format_work_version: 0
default_tablegroup_id: -1
compatibility_mode: 0
drop_tenant_time: -1
status: TENANT_STATUS_NORMAL
in_recyclebin: 0
1 row in set (0.01 sec)
Resource allocation information
Sample SQL statements:
MySQL [oceanbase]> select * from gv$unit where tenant_id=1002 limit 1 \G;
*************************** 1. row ***************************
unit_id: 1004
unit_config_id: 1002
unit_config_name: box1
resource_pool_id: 1002
resource_pool_name: fpool
zone: z1
tenant_id: 1002
tenant_name: tt1
svr_ip: xx.xx.xx.65
svr_port: 40000
migrate_from_svr_ip:
migrate_from_svr_port: 0
max_cpu: 4
min_cpu: 4
max_memory: 24604378624
min_memory: 24604378624
max_iops: 128
min_iops: 128
max_disk_size: 536870912
max_session_num: 64
1 row in set (0.01 sec)
Total number of user partitions on a server and leader distribution
MySQL [oceanbase]> select svr_ip,count(1) from __all_virtual_meta_table where tenant_id=1002 group by svr_ip;
+---------------+----------+
| svr_ip | count(1) |
+---------------+----------+
| xx.xx.xx.65 | 1 |
| xx.xx.xx.66 | 1 |
| xx.xx.xx.67 | 1 |
+---------------+----------+
3 rows in set (0.01 sec)
MySQL [oceanbase]> select svr_ip,count(1) from __all_virtual_meta_table where tenant_id=1001 and role=1 group by svr_ip;
+---------------+----------+
| svr_ip | count(1) |
+---------------+----------+
| xx.xx.xx.65 | 5 |
+---------------+----------+
1 row in set (0.00 sec)
Other parameters
Throughout the process starting from the application server that initiates a request to the OBServer application server, any involved component deserves our attention. Any bottleneck on a component can affect the performance. Therefore, pay attention to:
Physical resources. Check for resource bottlenecks on each component in the intermediate link, such as the JVM memory, CPU utilization of the application server and ODP, and software interrupts.
Request routing. Check whether ODP can correctly route SQL requests and whether unauthorized forwarding occurs.
Connection pool. Check the numbers of long and short connections and the SocketTimeout parameter.
Traffic balancing. Check whether the number of SQL statements received and processed by each OBServer node is seriously unbalanced.