Description
This alert is triggered when the ocp_agentd process stops.
The ocp_agent process monitors the uptime of each component process in OceanBase Database. An uptime of 0 indicates that the process has stopped. This alert indicates that the process stopped just now, not that it has been down for a long time. In the latter case, other status alerts are triggered.
Relevant alerts:
- observer process: observer_process_stop
- obproxy process: obproxy_process_stop
- obproxyd.sh process: obproxyd_process_stop
- ocp_agentd process: agentd_process_stop
- ocp_mgragent process: mgragent_process_stop
- ocp_monagent process: monagent_process_stop
Principle
| Parameter | Value |
|---|---|
| Metric | agentd_uptime_delta_seconds |
| Source | The difference between the current time and the process creation time displayed in the 14th column of the stat file in the /proc/[pid] directory. |
| Collected metric | process_uptime_seconds |
| Metric expression | 0 - min(delta(process_uptime_seconds{name="ocp_agentd",@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Collection cycle | 5 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| agentd_uptime_delta_seconds > 0 | The metric is the negated change in uptime. A value greater than 0 means that the uptime decreased, which indicates that the process has stopped. | 0 seconds | 10 seconds | 5 minutes |
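
The metric expression and the alert rule above can be illustrated with a minimal sketch. It assumes a single time series and approximates delta() as the last sample minus the first sample in the window. Prometheus' delta() also extrapolates to the window boundaries, and the real expression takes the minimum across grouped series; both are ignored here. The function names uptime_delta_metric and should_alert are hypothetical and are not part of OCP.

```shell
#!/bin/sh
# Sketch only: evaluates 0 - delta(process_uptime_seconds[window]) for ONE series.
# Assumption: delta is approximated as (last sample - first sample).
# uptime_delta_metric and should_alert are hypothetical helpers, not OCP tools.
uptime_delta_metric() {
  first=$1
  # Walk the arguments so that $last ends up holding the final sample.
  for last in "$@"; do :; done
  echo $(( 0 - ($last - $first) ))
}

# Alert rule from the table: fire when the metric is greater than 0.
should_alert() {
  [ "$(uptime_delta_metric "$@")" -gt 0 ]
}

# Healthy process: uptime grows by 5 seconds per collection cycle.
uptime_delta_metric 100 105 110      # prints -10
# Uptime dropped from 2048 seconds to 3 seconds within the window.
uptime_delta_metric 2048 2053 3      # prints 2045
```

In the first call the uptime keeps growing, so the metric is negative and no alert fires. In the second call the uptime drops within the window, so the metric is positive and the rule agentd_uptime_delta_seconds > 0 matches.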
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | Host |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The ocp_agentd process stopped.
Details
- Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}.
- Example: Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The ocp_agentd process stopped. The process has been running for 34 minutes 8 seconds.
Impact on the system
The ocp_agentd process is the daemon of the ocp_mgragent and ocp_monagent processes. If the ocp_agentd process is stopped, the ocp_mgragent or ocp_monagent process cannot be restarted after an unexpected exit. For more information about the scope of impact, see obagent_dead.
Possible causes
- A process bug causes an execution error and the process unexpectedly exits.
- An expected O&M action. In this case, you can restart the OCP-Agent in the OceanBase Cloud Platform (OCP) console.
Solutions
If the process exited due to a bug, search the ocp_agentd.error.log file for the keyword panic to locate the error. If the daemon process fails to restart the ocp_agentd process, restart it manually by running the following command:
/home/admin/ocp_agentctl restart
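
The log search described above could be scripted roughly as follows. This is a sketch under stated assumptions: find_panics and agentd_running are hypothetical helper names, the location of ocp_agentd.error.log is not given in this document and must be supplied by the caller, and the check with pgrep assumes that the process name is exactly ocp_agentd.

```shell
#!/bin/sh
# find_panics is a hypothetical helper. Pass the real location of
# ocp_agentd.error.log, which this document does not specify.
find_panics() {
  # -n prefixes each matching line with its line number in the file.
  grep -n 'panic' "$1"
}

# Assumption: the process name reported by the system is exactly "ocp_agentd".
# Returns 0 if at least one such process exists, non-zero otherwise.
agentd_running() {
  pgrep -x ocp_agentd >/dev/null
}
```

For example, call find_panics with the path of the error log to list the matching lines, run the restart command above, and then call agentd_running to confirm that the process is back.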