Description
This alert is triggered when the resident memory of an Agent process exceeds the threshold.
The Agent service consists of the monitoring Agent process ocp_monagent and the O&M Agent process ocp_mgragent. It is an important tool for managing and monitoring OceanBase databases in OceanBase Cloud Platform (OCP).
Principle
| Parameter | Value |
|---|---|
| Metric | host_agent_res_memory: the resident memory of an Agent process. Unit: GB. If the resident memory of the Agent process exceeds the threshold, which is 2 GB by default, the daemon process restarts the Agent process. |
| Source | Metrics are collected from the process-exporter provided by Prometheus. Source for the O&M Agent process: http://localhost:62888/metrics/stat. Source for the monitoring Agent process: http://localhost:62889/metrics/stat. |
| Collected metric | process_resident_memory_bytes |
| Metric expression | max(process_resident_memory_bytes{@LABELS}) by (@GBLABELS) / 1073741824 |
| Collection cycle | 1 minute |
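The metric expression above takes the largest process_resident_memory_bytes sample and divides it by 1073741824 to convert bytes to GB. The following is a minimal sketch of the same arithmetic in shell, assuming the endpoint returns the Prometheus text exposition format without trailing timestamps. The function name max_rss_gb is hypothetical and is not part of OCP.

```shell
# Read Prometheus text-format metrics from stdin and print the largest
# process_resident_memory_bytes sample, converted from bytes to GB.
# Assumption: each sample line ends with the value (no trailing timestamp).
max_rss_gb() {
  awk '
    /^process_resident_memory_bytes[{ ]/ {
      v = $NF + 0                 # last field is the sample value
      if (v > max) max = v
      found = 1
    }
    END {
      if (!found) exit 1          # no matching sample in the input
      printf "%.2f\n", max / 1073741824
    }
  '
}
```

For example, `curl -s http://localhost:62889/metrics/stat | max_rss_gb` would print the resident memory of the monitoring Agent process in GB, provided that the endpoint exposes this metric in the assumed format.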
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | Server |
Alert rule
| Metric | Default threshold (GB) | Source | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| host_agent_res_memory | 1.5 | process-exporter | 60 seconds | 5 minutes |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details:
Server: ${svr_ip}, Agent process: ${process}. The resident memory is ${value} GB, exceeding the threshold of ${alarm_threshold} GB.
Overview example: svr_ip=192.168.1.1:process=ocp_monagent Oversized memory of Agent on the server
Details example:
Server: 192.168.0.1, Agent process: ocp_monagent. The resident memory is 1.6 GB, exceeding the threshold of 1.5 GB.
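To make the relationship between the alert rule and the details template concrete, the following sketch compares a measured value against the threshold and fills in the template fields by hand. It only illustrates the substitution and is not how OCP renders alerts. The function name render_details and its argument order are assumptions.

```shell
# Print the details message when the value exceeds the threshold.
# Arguments: server IP, process name, resident memory (GB), threshold (GB).
render_details() {
  # awk performs the floating-point comparison; exit status 0 means value > threshold
  if awk -v v="$3" -v t="$4" 'BEGIN { exit !(v > t) }'; then
    printf 'Server: %s, Agent process: %s. The resident memory is %s GB, exceeding the threshold of %s GB.\n' \
      "$1" "$2" "$3" "$4"
  fi
}
```

Calling `render_details 192.168.0.1 ocp_monagent 1.6 1.5` reproduces the details example above, and a value at or below the threshold prints nothing.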
Impact on the system
Agent processes are important tools for OceanBase database O&M and monitoring in OCP, so it is crucial to keep them stable. If Agent processes use excessive system resources, the OceanBase databases on the same server are affected.
For example, if Agent processes use excessive memory, the memory available for OceanBase databases becomes insufficient.
Possible causes
An Agent process is expected to use no more than 1 GB of memory. If it uses more, possible causes include the following:
* The Agent process leaks resources, such as memory, file handles, or goroutines.
* The Agent process uses resources improperly.
* The monitoring Agent process executes SQL statements to collect data. If the OceanBase database fails unexpectedly, the collected data piles up and the memory of the process cannot be reclaimed in time.
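One way to check for a goroutine leak is to compare goroutine counts across two dumps taken some minutes apart. The sketch below assumes that the Agent serves Go's standard pprof goroutine profile in text form (debug=1), whose first line has the form `goroutine profile: total N`. The function name goroutine_total is hypothetical.

```shell
# Read a Go pprof goroutine dump (debug=1 text format) from stdin and
# print the total goroutine count taken from its first line.
goroutine_total() {
  awk '
    NR == 1 && $1 == "goroutine" && $2 == "profile:" && $3 == "total" {
      print $4
      ok = 1
    }
    END { exit !ok }              # non-zero status if the header is missing
  '
}
```

For example, `goroutine_total < /tmp/goroutine.txt` prints the count for one dump. A count that keeps growing between dumps while the workload is steady would be consistent with a goroutine leak, although it does not prove one.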
Suggested solutions
When the alert is triggered, view the alert details to check the memory used by the Agent process.
* If more than 10 GB of memory is used, restart the Agent process immediately to avoid affecting the OceanBase database.
* If the Agent process uses an acceptable amount of memory (for example, within 2 GB) that does not affect the OceanBase database, perform the following operations:
  * Save the environment context, and then restart the Agent process immediately.
  * Send the following context information to the O&M engineer:
    * The memory used by the Agent process and its parent process ocp_agentd.
    * The memory performance analysis file for the Agent process.
    PID=$(cat /home/admin/ocp_agent/run/ocp_monagent.pid)
    SOCKET=/home/admin/ocp_agent/run/ocp_monagent.$PID.sock
    # Goroutine performance data
    curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/goroutine?debug=1" --output /tmp/goroutine.txt
    # CPU performance sampling data (samples for 30 seconds)
    curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/profile?seconds=30" --output pprof.profile.gz
    # Memory sampling data
    curl --unix-socket "$SOCKET" "http://localhost/debug/pprof/heap" --output pprof.heap.gz
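The memory used by the Agent process and its parent process can be recorded before the restart. The following is a minimal sketch that reads VmRSS and PPid from /proc on Linux. It assumes that the direct parent of the Agent process is ocp_agentd, as stated above. The helper names proc_field and report_agent_memory are hypothetical.

```shell
# Print one field (for example VmRSS or PPid) from /proc/<pid>/status.
# Arguments: PID, field name without the trailing colon.
proc_field() {
  awk -v key="$2:" '$1 == key { print $2 }' "/proc/$1/status"
}

# Print the resident memory (in kB) of a process and of its parent.
# Argument: PID of the Agent process.
report_agent_memory() {
  agent_pid=$1
  agent_ppid=$(proc_field "$agent_pid" PPid)
  echo "agent pid=$agent_pid rss_kb=$(proc_field "$agent_pid" VmRSS)"
  echo "parent pid=$agent_ppid rss_kb=$(proc_field "$agent_ppid" VmRSS)"
}
```

For example, `report_agent_memory "$(cat /home/admin/ocp_agent/run/ocp_monagent.pid)" > /tmp/agent_memory.txt` saves both values to a file that can be sent along with the profiling data. The output path is only an illustration.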