Alert description
This alert is triggered when the resident memory of the Agent exceeds the threshold.
OCP provides two types of Agents: the monitoring Agent (ocp_monagent) and the O&M Agent (ocp_mgragent). These are important tools for managing and monitoring OceanBase Database.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring metric | host_agent_res_memory indicates the resident memory of the Agent process, in GB. If the memory of the Agent process exceeds 2 GB (the default value), the process is restarted by the guardian process. |
| Source of the metric | The metric is collected from process self-monitoring. The source URLs are as follows: http://localhost:62888/metrics/stat and http://localhost:62889/metrics/stat |
| Metric to be collected | process_resident_memory_bytes |
| Monitoring expression | max(process_resident_memory_bytes{@LABELS}) by (@GBLABELS) / 1073741824 |
| Collection interval | 1 minute |
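The monitoring expression takes the largest process_resident_memory_bytes sample and divides it by 1073741824 (bytes per GiB). A minimal sketch of the same arithmetic follows. It assumes that the self-monitoring endpoint returns Prometheus-style text with the sample value as the last field on each line. The function name max_resident_gib is hypothetical and is not part of OCP.

```shell
# Hypothetical helper, not an OCP command.
# Reads Prometheus-style text from stdin and prints the largest
# process_resident_memory_bytes sample, converted to GiB.
# Assumption: the value is the last field (no trailing timestamp).
max_resident_gib() {
  awk '
    $1 ~ /^process_resident_memory_bytes([{]|$)/ {
      v = $NF + 0              # handles both 1610612736 and 1.610612736e+09
      if (v > max) max = v
    }
    END { printf "%.2f\n", max / 1073741824 }
  '
}
```

If the assumption holds, the helper could be fed from one of the source URLs, for example `curl -s http://localhost:62888/metrics/stat | max_resident_gib`, and the result compared with the alert threshold.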
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Expression based on monitoring metrics | Warning | Server |
Rule information
| Monitoring metric | Default threshold | Source of the monitoring metric | Inspection cycle | Elimination cycle |
|---|---|---|---|---|
| host_agent_res_memory | 1.5 | Process self-monitoring | 60 seconds | 5 minutes |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx:process=ocp_monagent Server Agent memory exceeds the limit
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: Agent process: ${process}, Resident memory ${value_shown} exceeds the limit of ${alarm_threshold} GiB.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Agent process: ocp_monagent, Resident memory 1.6 GiB exceeds the limit of 1.5 GiB.
Alert recovery
- Template: Alert: ${alarm_name}, Server Agent resident memory size: ${value_shown}
- Example: Alert: Server Agent memory exceeds the limit, Server Agent resident memory size: 1.0 GiB
Impact on the system
The Agent process is an important tool for the O&M and monitoring of OceanBase Database, so its stability is crucial. When the Agent process consumes excessive system resources, it can affect the operation of OceanBase Database.
For example, if the Agent process consumes excessive memory, it can lead to insufficient memory in OceanBase Database.
Possible causes
The expected memory usage for the Agent process should be within 1 GB. If excessive memory usage is observed, possible causes include:
- Resource leaks in the program, such as memory leaks, file handle leaks, or goroutine leaks.
- Unreasonable resource usage by the process.
- The monitoring Agent process executes SQL queries to collect data. If OceanBase Database unexpectedly stops, it can lead to resource accumulation and delayed memory recycling.
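To tell a goroutine leak or a file handle leak apart from other causes, it can help to watch two counters over time. The sketch below is illustrative only: goroutine_total and open_fd_count are hypothetical helper names. It assumes a Linux host, a goroutine dump saved from the Go pprof endpoint with debug=1 (whose first line has the form "goroutine profile: total N"), and enough privileges to read /proc for the Agent process.

```shell
# Hypothetical helpers, not OCP commands.

# Print the goroutine count from a pprof goroutine dump (debug=1).
# The first line of such a dump looks like: "goroutine profile: total 42".
goroutine_total() {
  awk '/^goroutine profile: total/ { print $4; exit }' "$1"
}

# Count the file descriptors that a process currently holds (Linux /proc).
open_fd_count() {
  ls "/proc/$1/fd" | wc -l
}
```

A count that keeps growing between samples while the workload stays the same points toward a leak rather than a temporary backlog.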
Procedure
When an alert is triggered, check the alert details to confirm the memory usage of the Agent.
- If the memory usage exceeds 10 GB, immediately restart the Agent process to prevent impact on the normal operation of the OceanBase Database components.
- If the memory usage is within an acceptable range (such as 2 GB or less), it will not affect the operation of OceanBase Database. Perform the following steps:
  - Save the environment context information and immediately restart the Agent.
  - Provide the environment context information to the O&M engineer. The information includes:
    - The memory usage of the current process and its parent and child processes (ocp_agentd is the parent process of the current process).
    - The memory performance analysis file of the current process.
```shell
PID=$(cat /home/admin/ocp_agent/run/ocp_monagent.pid)
SOCKET=/home/admin/ocp_agent/run/ocp_monagent.$PID.sock

# The host part of each URL is not used for the connection,
# because curl connects through the Unix socket.

# Coroutine (goroutine) performance data
curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/goroutine?debug=1" --output /tmp/goroutine.txt

# CPU performance sampling data (30-second sample)
curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/profile?seconds=30" --output pprof.profile.gz

# Memory sampling data
curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/heap" --output pprof.heap.gz
```
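The memory usage of the current process and of its parent and child processes can be captured in a similar way. The following is a minimal sketch, not an OCP tool: proc_rss and proc_tree_rss are hypothetical helper names, and the sketch assumes a Linux host where /proc/PID/status is readable. Values are the VmRSS figures in KiB.

```shell
# Hypothetical helpers, not OCP commands (Linux only).

# Print "PID NAME RSS_KiB" for one process, read from /proc.
proc_rss() {
  awk -v pid="$1" '
    /^Name:/  { name = $2 }
    /^VmRSS:/ { rss = $2 }
    END { printf "%s %s %s\n", pid, name, rss + 0 }
  ' "/proc/$1/status" 2>/dev/null
}

# Print resident memory for a process, its parent, and its direct children.
proc_tree_rss() (
  pid="$1"
  ppid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
  if [ "${ppid:-0}" -gt 0 ]; then
    echo "parent: $(proc_rss "$ppid")"
  fi
  echo "self: $(proc_rss "$pid")"
  # Scan all processes and keep those whose PPid is the target PID.
  for status in /proc/[0-9]*/status; do
    child=$(awk -v p="$pid" '
      /^Pid:/  { c = $2 }
      /^PPid:/ && $2 == p { print c }
    ' "$status" 2>/dev/null) || continue   # process exited during the scan
    if [ -n "$child" ]; then
      echo "child: $(proc_rss "$child")"
    fi
  done
)
```

For the monitoring Agent, the call would be `proc_tree_rss "$(cat /home/admin/ocp_agent/run/ocp_monagent.pid)"`, with the output saved together with the pprof files before the Agent is restarted.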