Alert description
Note: This alert takes effect only for OceanBase clusters of version V4.2.1.0 or later.
This alert monitors whether any session in a tenant of OceanBase Database remains in the killed state for an extended period.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring Metrics | ob_tenant_session_kill_delay: The delay before a tenant session is killed. An alert is triggered when this duration exceeds the threshold. |
| Monitoring Expression | max(session_kill_delay_second) |
| Metric Collection | session_kill_delay_second |
| Data Source | SQL collection: |
| Collection Cycle | 10 Minutes |
Rule information
| Monitoring Metrics | Default Threshold (Unit: Seconds) | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| session_kill_delay_second | 600 | 0 Seconds | 60 Seconds | 6 Minutes |
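The threshold check implied by the monitoring expression and the default rule can be sketched as follows. This is a minimal illustration, not OCP's actual evaluator: the function name `should_fire` and the strict "greater than" comparison are assumptions based on the wording "exceeds the threshold". A duration of 0 seconds is modeled as firing on the first evaluation that exceeds the threshold.

```python
from typing import Iterable

# Default threshold from the rule table, in seconds.
DEFAULT_THRESHOLD_SECONDS = 600


def should_fire(
    session_kill_delay_second: Iterable[float],
    threshold: float = DEFAULT_THRESHOLD_SECONDS,
) -> bool:
    """Evaluate max(session_kill_delay_second) against the threshold.

    Hypothetical helper: the input is the set of per-session delay samples
    collected for one tenant in one detection cycle.
    """
    samples = list(session_kill_delay_second)
    if not samples:
        # No session is stuck in the killed state, so nothing to alert on.
        return False
    # Assumption: "exceeds" means strictly greater than the threshold.
    return max(samples) > threshold


fires = should_fire([120, 867])
print(fires)
```

With the values from the alert example on this page (a delay of 867 seconds against a threshold of 600 seconds), the check evaluates to true.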
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expressions | Warning | Tenant |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster_name=obcluster:ob_cluster_id=4:tenant_name=mysql:svr_ip=xx.xx.xx.xx:svr_port=10106:session_id=2147483648 OceanBase Tenant Session Killed Delay
Alert details
- Template: SessionId ${session_id} on port ${svr_port} of host ${svr_ip} in tenant ${tenant_name} of OceanBase cluster ${ob_cluster_name} has not terminated normally. It has exceeded ${value_shown} seconds, which is above the threshold of ${alarm_threshold} seconds.
- Example: In the OceanBase cluster obcluster, the session with ID 2147483648 on host xx.xx.xx.xx, port 10106, of tenant mysql has not terminated normally. It has exceeded 867 seconds, which is beyond the threshold of 600 seconds.
Alert recovery
- Template: Alert: ${alarm_name}, Tenant Session Killed Delay: ${value_shown}
- Example: Alert: Tenant Session Killed Delay, Tenant Session Killed Delay: 10
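To see how the `${...}` placeholders in the templates above are filled in, the following sketch renders the alert details template with Python's standard `string.Template`, which happens to use the same placeholder syntax. This only illustrates variable substitution; it does not describe how OCP renders alerts internally, and the helper name `render_detail` is hypothetical. The label values are taken from the examples on this page.

```python
from string import Template

# Alert details template, copied from the "Template" line above.
DETAIL_TEMPLATE = Template(
    "SessionId ${session_id} on port ${svr_port} of host ${svr_ip} in tenant "
    "${tenant_name} of OceanBase cluster ${ob_cluster_name} has not terminated "
    "normally. It has exceeded ${value_shown} seconds, which is above the "
    "threshold of ${alarm_threshold} seconds."
)


def render_detail(labels: dict) -> str:
    """Substitute every placeholder; raises KeyError if a label is missing."""
    return DETAIL_TEMPLATE.substitute(labels)


labels = {
    "session_id": "2147483648",
    "svr_port": "10106",
    "svr_ip": "xx.xx.xx.xx",
    "tenant_name": "mysql",
    "ob_cluster_name": "obcluster",
    "value_shown": "867",
    "alarm_threshold": "600",
}
message = render_detail(labels)
print(message)
```

Using `substitute` rather than `safe_substitute` makes a missing label fail loudly instead of leaving a raw placeholder in the message.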
Impact on the system
If a session cannot exit for an extended period, it continues to occupy thread resources. If multiple sessions fail to exit, the processing of normal requests may be affected.
Possible causes
- The corresponding session is executing an uninterruptible database task and can only exit after the task reaches the session status checkpoint.
- An exception occurred during database session execution, preventing the session from exiting.
Solution
Confirm the timeout relationship. Check whether the time during which the session could not be killed exceeds the SQL statement timeout (ob_query_timeout) of the corresponding tenant. If it does not, no intervention is needed for now: wait for the statement to time out naturally, then check whether the alert recovers automatically.
Assess the resource impact and make a decision. If the session has been blocked for longer than the SQL timeout, the alert persists, and a large number of sessions cannot be killed, these sessions may keep occupying thread resources and affect the processing of normal requests. In this case, assess the business impact first, then restart the corresponding OBServer node to release the abnormal sessions and related resources. Gather the collected information associated with the alert and contact technical support for assistance.
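The two steps above amount to a small decision procedure, sketched below under stated assumptions. The function `recommend_action`, its return labels, and the `many_sessions` cutoff are hypothetical and only illustrate the reasoning. The sketch assumes ob_query_timeout is expressed in microseconds, so verify the unit in your environment. The "keep_observing" branch covers a case the text above does not spell out (timeout exceeded, but few affected sessions) and is likewise an assumption.

```python
def recommend_action(
    kill_delay_seconds: float,
    ob_query_timeout_us: int,
    stuck_session_count: int,
    many_sessions: int = 10,  # hypothetical cutoff for "a large number"
) -> str:
    """Map the observed state to one of three illustrative recommendations."""
    # Assumption: ob_query_timeout is given in microseconds.
    timeout_seconds = ob_query_timeout_us / 1_000_000

    # Step 1: the statement may still time out on its own.
    if kill_delay_seconds <= timeout_seconds:
        return "wait_for_timeout"

    # Step 2: the timeout has passed and many sessions are stuck.
    if stuck_session_count >= many_sessions:
        return "assess_impact_then_restart_observer"

    # Not covered explicitly by the text: timeout passed, few sessions stuck.
    return "keep_observing"


print(recommend_action(867, 10_000_000, 25))
```

For example, a session that has been unkillable for 300 seconds in a tenant whose timeout is 1,000 seconds falls into the first branch, so the recommendation is to wait.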
