Description
This alert is triggered when the ratio of current connections of an OBServer to the constant value 256000 exceeds the threshold.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_host_connection_percent |
| Source | SQL: select /*+ READ_CONSISTENCY(WEAK) */ case when cnt is null then 0 else cnt end as cnt, tenant_name from (select __all_tenant.tenant_name, cnt from __all_tenant left join (select count(*) as cnt, tenant as tenant_name from __all_virtual_processlist where svr_ip = @svr_ip and svr_port=rpc_port() group by tenant) t1 on __all_tenant.tenant_name = t1.tenant_name) t2; |
| Collected metric | active_sessions |
| Metric expression | 100 * max(active_sessions{metric_group="all_virtual_processlist",@LABELS} / 262144) by (@GBLABELS) |
| Collection cycle | 1 second |
The value of the metric ob_host_connection_percent indicates the ratio of connections of the OBServer to the constant value 256000. Note that the metric expression uses 262144 (256 × 1024) as the divisor.
When this value exceeds the threshold, this alert is triggered. The default threshold is 80%, which corresponds to an active_sessions value of about 209715 (80% of 262144 is 209715.2).
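The arithmetic behind the metric expression can be sketched as follows. This is a minimal illustration, not OCP code: the function names are hypothetical, the divisor 262144 is taken from the metric expression in the table above, and the `max(...) by (@GBLABELS)` aggregation is omitted.

```python
# Divisor taken from the metric expression (262144 = 256 * 1024).
CONNECTION_DIVISOR = 262144


def connection_percent(active_sessions: int) -> float:
    """Return 100 * active_sessions / 262144, as in the metric expression."""
    return 100 * active_sessions / CONNECTION_DIVISOR


def sessions_at_threshold(threshold_percent: float) -> int:
    """Return the largest whole session count that does not exceed the threshold."""
    return int(threshold_percent * CONNECTION_DIVISOR // 100)


if __name__ == "__main__":
    limit = sessions_at_threshold(80)
    print("sessions at the 80% threshold:", limit)
    print("percent at that count:", connection_percent(limit))
    print("percent one session later:", connection_percent(limit + 1))
```

Under these assumptions, the first session count that pushes the metric above the default threshold is one more than the value returned by `sessions_at_threshold(80)`.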
Alert rule
| Metric | Default threshold (unit: %) | Detection cycle | Time before clearance |
|---|---|---|---|
| ob_host_connection_percent | 80 | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: Cluster:${ob_cluster_name}, Host:${host}, Alert: The ratio of connections of the OBServer to the constant value 256000 is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The ratio of connections of the OBServer to the constant value 256000 exceeds the threshold.
Details example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The ratio of connections of the OBServer to the constant value 256000 is 81.0%, exceeding the threshold of 80.0%.
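To show how the `${...}` placeholders in the details template are filled in, here is a small sketch that uses Python's `string.Template`, which happens to share the same placeholder syntax. It is only an illustration: how OCP actually renders templates is not described in this document, the function name `render_details` is hypothetical, and the host placeholder is assumed to be written as `${host}`.

```python
from string import Template

# Details template from this section; the host placeholder is assumed to be ${host}.
DETAILS_TEMPLATE = Template(
    "Cluster:${ob_cluster_name}, Host:${host}, Alert: The ratio of connections "
    "of the OBServer to the constant value 256000 is ${value}%, "
    "exceeding the threshold of ${alarm_threshold}%."
)


def render_details(ob_cluster_name: str, host: str, value: float, alarm_threshold: float) -> str:
    """Substitute every placeholder; raises KeyError if one is missing."""
    return DETAILS_TEMPLATE.substitute(
        ob_cluster_name=ob_cluster_name,
        host=host,
        value=value,
        alarm_threshold=alarm_threshold,
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(render_details("C1", "xxx.xxx.xxx.xxx", 81.0, 80.0))
```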
Impact on the system
New connections cannot be created if the ratio of connections of the OBServer to the constant value 256000 exceeds the threshold.
Possible cause
This alert typically occurs when a single OBServer holds too many connections, so that the ratio of its connections to the constant value 256000 exceeds the threshold.
Suggested solutions
Execute the following SQL statement to locate the tenants with the most connections.
-- Connect to the sys tenant.
-- Query the top 5 tenant and OBServer combinations with the most connections.
SELECT tenant, svr_ip, COUNT(*) AS session_num FROM __all_virtual_processlist GROUP BY tenant, svr_ip ORDER BY session_num DESC LIMIT 5;
-- Sample response
-- The sys tenant is a system tenant of the OceanBase cluster instead of a business tenant.
+--------+-----------------+-------------+
| tenant | svr_ip          | session_num |
+--------+-----------------+-------------+
| test1  | xxx.xxx.xxx.xxx |       66664 |
| test1  | xxx.xxx.xxx.xxx |       66560 |
| test1  | xxx.xxx.xxx.xxx |         559 |
| test2  | xxx.xxx.xxx.xxx |          78 |
| test2  | xxx.xxx.xxx.xxx |          57 |
+--------+-----------------+-------------+
5 rows in set (0.02 sec)
Contact the customer of the tenant with abnormal connections and ask them to check for unwanted connections.
If all connections are necessary, scale out the cluster to solve the problem of insufficient connections.
Scaling out a cluster means adding OBServers to the OceanBase cluster. For more information, see Add an OBServer.
If some connections are not necessary, query the __all_virtual_processlist table to find the sources of unnecessary connections.
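Once rows from __all_virtual_processlist have been fetched, grouping them by source can be sketched as below. This is a hedged illustration rather than an official procedure: it assumes each row is available as a dictionary with `tenant`, `user`, and `host` keys, that `host` has the form `client:port`, and the helper name `top_sources` and the sample rows are hypothetical.

```python
from collections import Counter


def top_sources(rows, tenant, limit=5):
    """Count one tenant's connections per (user, client) pair, busiest first."""
    counts = Counter(
        # Drop the port so that all connections from one client are grouped together.
        (row["user"], row["host"].rsplit(":", 1)[0])
        for row in rows
        if row["tenant"] == tenant
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up rows for illustration; real rows would come from the query result.
    sample_rows = [
        {"tenant": "test1", "user": "app", "host": "client-a:40001"},
        {"tenant": "test1", "user": "app", "host": "client-a:40002"},
        {"tenant": "test1", "user": "report", "host": "client-b:51000"},
        {"tenant": "test2", "user": "app", "host": "client-a:40003"},
    ]
    for (user, client), count in top_sources(sample_rows, "test1"):
        print(user, client, count)
```

A source with a disproportionately high count is a natural first candidate when asking the tenant owner which connections can be closed.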