mgragent_process_stop

2025-01-10 06:15:54  Updated

Description

This alert is triggered when the ocp_mgragent process stops.

The ocp_agent process monitors the uptime of the processes of components in OceanBase Database. When the uptime of a process is 0, the process is stopped. This alert indicates that the process has stopped at the moment rather than for a long time. In the latter case, other status alerts are triggered.

Relevant alerts:

  • observer process: observer_process_stop
  • obproxy process: obproxy_process_stop
  • obproxyd.sh process: obproxyd_process_stop
  • ocp_agentd process: agentd_process_stop
  • ocp_mgragent process: mgragent_process_stop
  • ocp_monagent process: monagent_process_stop

Principle

Parameter Value
Metric mgragent_uptime_delta_seconds
Source The difference between the current time and the process creation time displayed in the 14th column of the stat file in the /proc/pid directory.
Collected metric process_uptime_seconds
Metric expression 0 - min(delta(process_uptime_seconds{name="ocp_mgragent",@LABELS}[@INTERVAL])) by (@GBLABELS)
Collection cycle 5 seconds

Alert rule

Metric expression Metric description Default threshold Detection cycle Time before clearance
mgragent_uptime_delta_seconds > 0 The negative offset value of uptime is used. A metric value greater than 0 indicates that the process has stopped. 0 seconds 10 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Warning Host

Alert templates

  • Overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The ocp_mgragent process stopped.
  • Details

    • Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}.
    • Example: Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The ocp_mgragent process stopped. The process has been running for 34 minutes 8 seconds.

Impact on the system

OCP-Agent uses the ocp_mgragent process for the O&M of OBServer nodes and OBProxies. The stop of the ocp_mgragent process brings the following results:

  1. The ongoing O&M operations are terminated. The operations may be associated with an O&M task on the OceanBase Cloud Platform (OCP) server and the O&M task will fail. You can retry the task.

  2. If the ocp_mgragent process stops and is not started by the ocp_agentd process, new O&M tasks cannot be executed. In this case, a host_unavailable alert is reported.

Possible causes

  • A process bug causes an execution error and the process unexpectedly exits.
  • A monitoring configuration error, for example, a YAML syntax error of a custom configuration in the module_config file stored in the /home/admin/ocp_agent/conf/ directory.
  • An expected O&M action. In this case, you can restart the OCP-Agent in the OceanBase Cloud Platform (OCP) console.

Solutions

  1. If the process exits due to a bug, you can search the ocp_mgragent.error.log file for the panic keyword to check the bug. In this case, if the daemon process fails to restart the ocp_mgragent process, you can manually restart it.

    /home/admin/ocp_agentctl service start ocp_mgragent
    
  2. If the process fails to restart due to a configuration error, you can correct the custom configuration and then start the process.

Contact Us