Alert description
This alert monitors whether the number of connections on an OBServer node is approaching the connection limit. The alert is triggered when the ratio of the current number of connections on a single OBServer node to the fixed limit of 256K (262144) exceeds the threshold.
Alerting principle
The following table describes the key parameters used in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_host_connection_percent |
| Metric source | SQL: select/*+ READ_CONSISTENCY(WEAK) */ case when cnt is null then 0 else cnt end as cnt, tenant_name from (select __all_tenant.tenant_name, cnt from __all_tenant left join (select count(*) as cnt, tenant as tenant_name from __all_virtual_processlist where svr_ip = @svr_ip and svr_port=rpc_port() group by tenant) t1 on __all_tenant.tenant_name = t1.tenant_name) t2; active_sessions is the sum of the cnt field values. |
| Collected metric | active_sessions |
| Monitoring expression | 100 * max(active_sessions{metric_group="all_virtual_processlist",@LABELS} / 262144) by (@GBLABELS) |
| Collection interval | 1 second |
The value of the monitoring metric ob_host_connection_percent indicates the percentage of connections on the OBServer node.
An alert is triggered when the connection percentage exceeds the threshold (80% by default). With the default threshold, this corresponds to the value of active_sessions exceeding 209715 (80% of 262144 is 209715.2).
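The arithmetic behind the threshold can be sketched as follows. This is a minimal illustration, not the monitoring system's implementation: the function names are hypothetical, and it assumes that active_sessions is the sum of the per-tenant cnt values and that the limit is the constant 262144 from the monitoring expression.

```python
# Constant from the monitoring expression (256K connections per OBServer node).
MAX_CONNECTIONS = 262144
# Default threshold from the rule information, in percent.
DEFAULT_THRESHOLD_PERCENT = 80.0


def connection_percent(tenant_session_counts):
    """Sum the per-tenant session counts and express the total as a percentage of the limit."""
    active_sessions = sum(tenant_session_counts)
    return 100 * active_sessions / MAX_CONNECTIONS


def exceeds_threshold(tenant_session_counts, threshold_percent=DEFAULT_THRESHOLD_PERCENT):
    """Return True when the connection percentage is strictly above the threshold."""
    return connection_percent(tenant_session_counts) > threshold_percent


# Hypothetical per-tenant counts on one OBServer node.
print(exceeds_threshold([209715]))        # just under 80% of 262144
print(exceeds_threshold([200000, 9716]))  # 209716 sessions in total, just over 80%
```

With the default threshold, 209715 sessions stay below the alert line and 209716 sessions cross it, which matches the value quoted above.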
Rule information
| Monitoring metric | Default threshold (unit: %) | Detection cycle | Elimination cycle |
|---|---|---|---|
| ob_host_connection_percent | 80 | 60 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Expression based on the monitoring metric | Critical | Server |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.1 OceanBase server connection percentage exceeded
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: Connection percentage ${value_shown} exceeds ${alarm_threshold} %.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.1, Alert: Connection percentage 81.0 % exceeds 80.0 %.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer connection percentage: ${value_shown}
- Example: Alert: OceanBase server connection percentage exceeded, OBServer connection percentage: 79 %
Impact on the system
After the number of connections reaches the upper limit, new connections cannot be established.
Possible causes
This alert typically occurs when a single OBServer node holds a large number of client connections.
Solution
Run the following SQL query to find the tenant with an abnormal number of connections.
-- Connect to the sys tenant and view the information.
-- Count the connections for each tenant on each OBServer node and list the top 5.
SELECT tenant, svr_ip, COUNT(*) AS session_num FROM __all_virtual_processlist GROUP BY tenant, svr_ip ORDER BY session_num DESC LIMIT 5;
-- Sample return result
-- Note that the sys tenant is the system tenant of the OceanBase cluster, not a business tenant.
+------------+----------------+-------------+
| tenant | svr_ip | session_num |
+------------+----------------+-------------+
| test1 | xxx.xxx.xxx.1 | 66664 |
| test1 | xxx.xxx.xxx.2 | 66560 |
| test1 | xxx.xxx.xxx.3 | 559 |
| test2 | xxx.xxx.xxx.4 | 78 |
| test2 | xxx.xxx.xxx.5 | 57 |
+------------+----------------+-------------+
5 rows in set (0.02 sec)
Contact the owner of the affected business application to check whether any of the connections are unnecessary.
If the connections are necessary and the node is about to reach its connection limit, consider scaling out the cluster to resolve the issue.
To scale out the cluster, add an OBServer node to the OceanBase cluster. For more information, see Add an OBServer node.
If the connections are not necessary, check the __all_virtual_processlist table to identify the source of the unnecessary connections.
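One way to narrow down the source is to group the sessions by tenant, user, and client address. The sketch below does this in Python over rows that have already been fetched from the process list. It is an illustration under assumptions: the function name is hypothetical, the rows are made-up sample data, and it assumes that each row exposes tenant, user, and host values and that host has the form ip:port (IPv4).

```python
from collections import Counter


def top_connection_sources(rows, limit=5):
    """Count sessions per (tenant, user, client IP) and return the largest groups."""
    counts = Counter()
    for row in rows:
        # Assumption: host looks like "ip:port"; keep only the IP part.
        client_ip = row["host"].rsplit(":", 1)[0]
        counts[(row["tenant"], row["user"], client_ip)] += 1
    return counts.most_common(limit)


# Hypothetical rows for illustration only, not real query output.
sample_rows = [
    {"tenant": "test1", "user": "app_a", "host": "192.0.2.10:51234"},
    {"tenant": "test1", "user": "app_a", "host": "192.0.2.10:51240"},
    {"tenant": "test2", "user": "app_b", "host": "192.0.2.20:40010"},
]

for (tenant, user, client_ip), session_num in top_connection_sources(sample_rows):
    print(tenant, user, client_ip, session_num)
```

A client address or user that accounts for a disproportionate share of the sessions is a reasonable starting point when asking the application owner which connections can be closed.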