agentd_process_stop

2024-07-30 09:31:29  Updated

Description

This alert is triggered when the ocp_agentd process stops.

The ocp_agent process monitors the uptime of the component processes of OceanBase Database. An uptime of 0 means that the process has stopped. This alert indicates that the process has just stopped, not that it has been down for a long time. A prolonged outage triggers other status alerts.

Relevant alerts:

  • observer process: observer_process_stop
  • obproxy process: obproxy_process_stop
  • obproxyd.sh process: obproxyd_process_stop
  • ocp_agentd process: agentd_process_stop
  • ocp_mgragent process: mgragent_process_stop
  • ocp_monagent process: monagent_process_stop

Principle

Parameter | Value
Metric | agentd_uptime_delta_seconds
Source | The difference between the current time and the process creation time, which is read from the 22nd field (starttime) of the stat file in the /proc/[pid] directory.
Collected metric | process_uptime_seconds
Metric expression | 0 - min(delta(process_uptime_seconds{name="ocp_agentd",@LABELS}[@INTERVAL])) by (@GBLABELS)
Collection cycle | 5 seconds
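
As a rough illustration of the source value, the sketch below derives a process uptime from one line of /proc/[pid]/stat. It assumes the creation time is the starttime field, which proc(5) documents as field 22 and expresses in clock ticks since boot. The function names and the sample line are hypothetical and are not taken from OCP-Agent.

```python
# Sample stat line, made up for illustration; field 22 (starttime) is 250000.
SAMPLE_STAT = (
    "1234 (ocp_agentd) S 1 1234 1234 0 -1 4194560 100 0 0 0 "
    "5 3 0 0 20 0 10 0 250000 1000000 200"
)


def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of a stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so everything up to the last ')' is skipped.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field 22 sits at index 19.
    return int(fields[19])


def process_uptime_seconds(stat_line: str,
                           system_uptime_seconds: float,
                           clock_ticks: int = 100) -> float:
    """Uptime = time since boot minus the process start offset since boot."""
    start_after_boot = parse_starttime_ticks(stat_line) / clock_ticks
    return system_uptime_seconds - start_after_boot
```

On a live Linux host, the inputs would come from /proc/[pid]/stat, the first value in /proc/uptime, and os.sysconf("SC_CLK_TCK"). The default of 100 ticks per second is only a common value, not a guarantee.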

Alert rule

Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance
agentd_uptime_delta_seconds > 0 | The metric is the negated change (delta) in process uptime. A value greater than 0 means that the uptime decreased, which indicates that the process has stopped. | 0 seconds | 10 seconds | 5 minutes
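
The rule can be sketched as follows. This is a simplified model, not the actual evaluation engine: delta is approximated as the last sample minus the first sample in the window (Prometheus-style delta also extrapolates), and the detection cycle and clearance time are not modeled. All function names are hypothetical.

```python
def uptime_delta(samples):
    """Simplified delta(): last sample minus first sample in the window."""
    return samples[-1] - samples[0]


def agentd_uptime_delta_seconds(series):
    """0 - min(delta(...)) over a group; `series` holds one sample list per process."""
    return 0 - min(uptime_delta(samples) for samples in series)


def is_alerting(metric_value, threshold=0):
    """The alert fires when the metric exceeds the default threshold of 0."""
    return metric_value > threshold


# A process that keeps running: uptime grows, so the metric is negative.
running = [[2038.0, 2043.0, 2048.0]]
# A process that stopped: uptime drops to 0, so the metric turns positive.
stopped = [[2043.0, 2048.0, 0.0]]
```

With these samples, the running series yields a metric of -10.0 and does not alert, while the stopped series yields 2043.0 and does.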

Alert information

Trigger method | Alert level | Scope
Based on the expression of the metric | Warning | Host

Alert templates

  • Overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The ocp_agentd process stopped.
  • Details

    • Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}.
    • Example: Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The ocp_agentd process stopped. The process has been running for 34 minutes 8 seconds.
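
To show how the placeholders in the details template map to the example text, here is a minimal sketch using Python's string.Template, which happens to share the ${name} syntax. OCP's own rendering logic is not documented here. The format_duration helper is hypothetical and covers only minutes and seconds.

```python
from string import Template

DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. "
    "The process has been running for ${value_shown}."
)


def format_duration(seconds: int) -> str:
    """Hypothetical formatter: whole minutes and remaining seconds only."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} minutes {secs} seconds"


def render_details(cluster: str, host: str, alarm_name: str, uptime_seconds: int) -> str:
    """Fill in the details template with the alert fields."""
    return DETAILS_TEMPLATE.substitute(
        ob_cluster_name=cluster,
        host=host,
        alarm_name=alarm_name,
        value_shown=format_duration(uptime_seconds),
    )


message = render_details(
    "AdminMETA", "xxx.xxx.xxx.xxx", "The ocp_agentd process stopped", 2048
)
```

Here message reproduces the details example above, because 2048 seconds is 34 minutes 8 seconds.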

Impact on the system

The ocp_agentd process is the daemon of the ocp_mgragent and ocp_monagent processes. If the ocp_agentd process is stopped, the ocp_mgragent or ocp_monagent process cannot be restarted after an unexpected exit. For more information about the scope of impact, see obagent_dead.

Possible causes

  • A process bug causes an execution error and the process unexpectedly exits.
  • An expected O&M action stopped the process. In this case, you can restart OCP-Agent in the OceanBase Cloud Platform (OCP) console.

Solutions

If the process exited because of a bug, search the ocp_agentd.error.log file for the keyword panic to locate the error. If the ocp_agentd process is not restarted automatically, restart it manually:

/home/admin/ocp_agentctl restart
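
The log search can also be scripted. The sketch below is one possible approach, not an OCP tool: it returns the numbered lines that contain the keyword. The function name is hypothetical, and the location of ocp_agentd.error.log is not given here, so the caller supplies the path.

```python
def find_keyword_lines(lines, keyword="panic"):
    """Return (line_number, text) pairs for lines that contain the keyword."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            matches.append((number, line.rstrip("\n")))
    return matches


def scan_log(path, keyword="panic"):
    """Scan a log file; undecodable bytes are replaced instead of raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_keyword_lines(handle, keyword)
```

For example, scan_log(path) with the path to ocp_agentd.error.log on the affected host returns every matching line with its line number, which helps locate the stack trace that follows a panic.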
