mgragent_process_stop ocp_mgragent process stop|V4.3.6| docs|Distributed Database

Last Updated: 2025-09-08 08:15:43

Alert description

Monitors whether the ocp_mgragent process has stopped.

The OCP-Agent process monitors the start time of each OceanBase component process. A change in the start time indicates that the process has stopped or restarted. This alert only indicates that the process stopped at some point; it does not mean that the process has stayed down for a long time. If the process remains stopped for a long time, a separate status-related alert is triggered.

The relevant processes include:

observer: alert item is observer_process_stop
obproxy: alert item is obproxy_process_stop
obproxyd.sh: alert item is obproxyd_process_stop
ocp_agentd: alert item is agentd_process_stop
ocp_mgragent: alert item is mgragent_process_stop
ocp_monagent: alert item is monagent_process_stop

Alert principle

Parameter	Value
Monitoring metric	mgragent_boot_time_delta_seconds
Source of the metric	The start time of the process is the system boot time plus the offset between system boot and process start. The system boot time is the btime value in the output of the command cat /proc/stat. The offset is the value in the 22nd column of the output of the command cat /proc/pid/stat (where pid is the process ID), divided by 100 to convert clock ticks to seconds.
Sampling metric	process_boot_time_seconds
Monitoring expression	max(delta(process_boot_time_seconds{name="ocp_mgragent",@LABELS}[@INTERVAL])) by (@GBLABELS)
Sampling interval	5 seconds
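
As a rough illustration of how this start time is derived, the following shell sketch reads the two procfs files and applies the arithmetic described above. The function names (boot_time, start_ticks, process_start_time) are hypothetical and not part of OCP-Agent. The default of 100 clock ticks per second is an assumption taken from the "divided by 100" rule; getconf CLK_TCK reports the actual value on a given host.

```shell
# Sketch only: derive a process start time (Unix seconds) from procfs-style files.
# Assumption: 100 clock ticks per second unless a third argument says otherwise.

# boot_time FILE: print the btime value from a /proc/stat-style file.
boot_time() {
    awk '$1 == "btime" { print $2 }' "$1"
}

# start_ticks FILE: print column 22 (starttime) of a /proc/<pid>/stat-style file.
# The command name in column 2 may contain spaces, so everything up to the
# last ") " is dropped first; column 22 is then the 20th remaining field.
start_ticks() {
    stat_line=$(cat "$1")
    rest=${stat_line##*") "}
    printf '%s\n' "$rest" | awk '{ print $20 }'
}

# process_start_time STAT_FILE PID_STAT_FILE [TICKS_PER_SECOND]
# Prints btime + starttime / ticks-per-second (integer division).
process_start_time() {
    bt=$(boot_time "$1")
    ticks=$(start_ticks "$2")
    hz=${3:-100}
    echo $(( bt + ticks / hz ))
}
```

For example, process_start_time /proc/stat /proc/<pid>/stat prints the start time of the process with that ID as a Unix timestamp. The file paths are passed as arguments so that the functions can also be tried against saved copies of these files.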

Rule information

Monitoring expression	Description of the monitoring metric	Default threshold	Detection cycle	Elimination cycle
mgragent_boot_time_delta_seconds > 0	If the monitoring metric is greater than 0, it indicates that the process has stopped.	0 seconds	10 seconds	5 minutes
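
The rule above can be pictured with a simplified shell sketch: given the start-time samples collected in one window, the delta is the last sample minus the first, and any value above the threshold of 0 means the start time changed. The function names (boot_time_delta, process_stopped) are hypothetical, and the sketch ignores the extrapolation that the delta function of the monitoring backend may apply at the window boundaries.

```shell
# Sketch only: a simplified stand-in for the delta-based rule.

# boot_time_delta: read start-time samples (Unix seconds, one per line,
# oldest first) on stdin and print last minus first (0 if there is no input).
boot_time_delta() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR > 0) print last - first; else print 0 }'
}

# process_stopped: exit status 0 when the delta exceeds the threshold of 0,
# that is, when the start time changed within the window.
process_stopped() {
    [ "$(boot_time_delta)" -gt 0 ]
}
```

An unchanged start time yields a delta of 0 and no alert, whereas a restart moves the start time forward and yields a positive delta.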

Alert information

Alert triggering method	Alert level	Scope
Monitoring expression	Warning	Host

Alert template

Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx ocp_mgragent process stopped
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}.
- Example: Cluster: AdminMETA, Host: xxx.xxx.xxx.xxx, Alert: ocp_mgragent process stopped.
Alert recovery
- Template: Alert: ${alarm_name}, ocp_mgragent process start time change: ${recover_value}
- Example: Alert: ocp_mgragent process stopped, ocp_mgragent process start time change: 0 seconds

Impact on the system

ocp_mgragent is an OCP-Agent process that manages OBServer and OBProxy. If the ocp_mgragent process stops, the following issues may occur:

Ongoing maintenance operations are terminated. These operations may appear as maintenance tasks in OCP-Server; such tasks eventually fail and can be retried.
If the daemon process (ocp_agentd) does not restart the process, new maintenance tasks cannot be executed. In this case, the host_unavailable alert is triggered.

Possible causes

Process bugs causing execution errors and unexpected exits.
Monitoring configuration errors, such as syntax (YAML) errors in custom configurations in /home/admin/ocp_agent/conf/module_config.
Scheduled maintenance, such as GUI-based agent restarts.

Solution

If a process bug caused the exit, it is recorded in ocp_mgragent.error.log; search for the panic keyword to confirm. Then try to start the process. The daemon process attempts to restart it automatically; if that fails, start it manually:
```
/home/admin/ocp_agentctl service start ocp_mgragent
```
If the process fails to restart because of configuration errors, correct the custom configuration and then start the process again.
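
The log check described in the first step above can be sketched as follows. The helper names (panic_lines, has_panic) are hypothetical, and the location of ocp_mgragent.error.log is not stated here, so the path is passed as an argument and must be adjusted to the actual log directory.

```shell
# Sketch only: look for the panic keyword in an error log.

# panic_lines LOG: print every line containing "panic", prefixed with its
# line number (the match is case-sensitive).
panic_lines() {
    grep -n 'panic' "$1"
}

# has_panic LOG: exit status 0 if the log contains the panic keyword.
has_panic() {
    grep -q 'panic' "$1"
}
```

A possible use is has_panic followed by panic_lines on the same file to print the matching entries before starting the process again.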