Description
This alert is triggered when the ocp_mgragent process stops.
The ocp_agent process monitors the uptime of the processes of components in OceanBase Database. When the uptime of a process is 0, the process is stopped. This alert indicates that the process has stopped at the moment rather than for a long time. In the latter case, other status alerts are triggered.
Relevant alerts:
- observer process: observer_process_stop
- obproxy process: obproxy_process_stop
- obproxyd.sh process: obproxyd_process_stop
- ocp_agentd process: agentd_process_stop
- ocp_mgragent process: mgragent_process_stop
- ocp_monagent process: monagent_process_stop
Principle
| Parameter | Value |
|---|---|
| Metric | mgragent_uptime_delta_seconds |
| Source | The difference between the current time and the process creation time displayed in the 14th column of the stat file in the /proc/pid directory. |
| Collected metric | process_uptime_seconds |
| Metric expression | 0 - min(delta(process_uptime_seconds{name="ocp_mgragent",@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Collection cycle | 5 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| mgragent_uptime_delta_seconds > 0 | The negative offset value of uptime is used. A metric value greater than 0 indicates that the process has stopped. | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | Host |
Alert templates
Overview
- Template:
${alarm_target} ${alarm_name} - Example:
alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The ocp_mgragent process stopped.
- Template:
Details
- Template:
Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}. - Example:
Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The ocp_mgragent process stopped. The process has been running for 34 minutes 8 seconds.
- Template:
Impact on the system
OCP-Agent uses the ocp_mgragent process for the O&M of OBServer nodes and OBProxies. The stop of the ocp_mgragent process brings the following results:
The ongoing O&M operations are terminated. The operations may be associated with an O&M task on the OceanBase Cloud Platform (OCP) server and the O&M task will fail. You can retry the task.
If the ocp_mgragent process stops and is not started by the ocp_agentd process, new O&M tasks cannot be executed. In this case, a host_unavailable alert is reported.
Possible causes
- A process bug causes an execution error and the process unexpectedly exits.
- A monitoring configuration error, for example, a YAML syntax error of a custom configuration in the
module_configfile stored in the/home/admin/ocp_agent/conf/directory. - An expected O&M action. In this case, you can restart the OCP-Agent in the OceanBase Cloud Platform (OCP) console.
Solutions
If the process exits due to a bug, you can search the
ocp_mgragent.error.logfile for thepanickeyword to check the bug. In this case, if the daemon process fails to restart the ocp_mgragent process, you can manually restart it./home/admin/ocp_agentctl service start ocp_mgragentIf the process fails to restart due to a configuration error, you can correct the custom configuration and then start the process.