Description
Monitors whether the OBProxy daemon process has stopped.
OCP-Agent tracks the start time of each OceanBase component process. A change in the start time indicates that the process has stopped or restarted. This alert only indicates that the process stopped at some point; it does not indicate that the process has remained stopped. If the process stays stopped for a long time, other status alerts are triggered.
The relevant processes include:
- observer: alert item is observer_process_stop
- obproxy: alert item is obproxy_process_stop
- obproxyd.sh: alert item is obproxyd_process_stop
- ocp_agentd: alert item is agentd_process_stop
- ocp_mgragent: alert item is mgragent_process_stop
- ocp_monagent: alert item is monagent_process_stop
Principle
| Parameter | Value |
|---|---|
| Monitoring metric | obproxyd_boot_time_delta_seconds |
| Metric source | The start time of a process is the system boot time plus the offset after boot at which the process started. The system boot time is the btime value in the output of the command cat /proc/stat. The start offset is the value in the 22nd column of the output of the command cat /proc/pid/stat (where pid is the process ID), divided by 100, the number of clock ticks per second on typical Linux systems. |
| Metric collection | process_boot_time_seconds |
| Monitoring expression | max(delta(process_boot_time_seconds{name="obproxyd.sh",@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Metric collection interval | 5 seconds |
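The metric source described in the table can be sketched in Python as follows. This is a minimal illustration, not OCP-Agent's actual implementation: the function names are hypothetical, and the default of 100 clock ticks per second is taken from the table above (on a live system the value can be read with os.sysconf).

```python
import os


def parse_btime(proc_stat_text: str) -> int:
    """Return the system boot time (epoch seconds) from the content of /proc/stat."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("btime line not found")


def parse_starttime_ticks(pid_stat_text: str) -> int:
    """Return column 22 (start offset after boot, in clock ticks) of /proc/<pid>/stat."""
    # Column 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' and count the remaining columns from 3.
    rest = pid_stat_text.rsplit(")", 1)[1].split()
    return int(rest[22 - 3])


def process_start_epoch(proc_stat_text: str, pid_stat_text: str,
                        ticks_per_second: int = 100) -> float:
    """Process start time = system boot time + start offset in seconds."""
    offset_seconds = parse_starttime_ticks(pid_stat_text) / ticks_per_second
    return parse_btime(proc_stat_text) + offset_seconds


def live_process_start_epoch(pid: int) -> float:
    """Hypothetical helper: read the two files on a Linux host for a given PID."""
    with open("/proc/stat") as f:
        proc_stat_text = f.read()
    with open(f"/proc/{pid}/stat") as f:
        pid_stat_text = f.read()
    # Use the real tick rate instead of assuming 100.
    return process_start_epoch(proc_stat_text, pid_stat_text,
                               os.sysconf("SC_CLK_TCK"))
```

On a Linux host, calling live_process_start_epoch with the PID of obproxyd.sh returns the value that should stay constant for as long as the daemon keeps running.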
Rule information
| Monitoring expression | Description of the monitoring metric | Default threshold | Detection interval | Elimination interval |
|---|---|---|---|---|
| obproxyd_boot_time_delta_seconds > 0 | A value greater than 0 means that the process start time has changed, which indicates that the process has stopped or restarted. | 0 seconds | 10 seconds | 5 minutes |
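The rule above can be illustrated with a simplified Python sketch. It assumes that each series holds the process start times sampled within one detection window, and it approximates the delta as the last sample minus the first; the real PromQL delta() function also extrapolates to the window boundaries. The function names are hypothetical.

```python
def boot_time_delta(samples: list[float]) -> float:
    """Change in process start time across a window of chronologically ordered samples."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


def should_alert(series_by_label: dict[str, list[float]],
                 threshold: float = 0.0) -> bool:
    """Mimic max(delta(...)) > threshold over all series in one group."""
    deltas = [boot_time_delta(s) for s in series_by_label.values()]
    return max(deltas, default=0.0) > threshold


# Start time unchanged across the window: no alert.
steady = {"host-a": [1700001234.0, 1700001234.0, 1700001234.0]}
# The daemon was restarted 60 seconds later, so the start time jumped: alert.
restarted = {"host-a": [1700001234.0, 1700001234.0, 1700001294.0]}

print(should_alert(steady))     # False
print(should_alert(restarted))  # True
```

Because the delta returns to 0 once all samples in the window share the new start time, the alert reflects a past stop or restart rather than an ongoing outage.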
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Monitoring expression | Warning | OBProxy |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:obproxy_cluster=ODPT2-2000005:host=xxx.xxx.xxx.xxx OBProxy daemon process stop
Alert details
- Template: OBProxy cluster: ${obproxy_cluster}, host: ${host}, alert: ${alarm_name}.
- Example: OBProxy cluster: ODPT2, host: xxx.xxx.xxx.xxx, alert: OBProxy daemon process stop.
Alert recovery
- Template: Alert: ${alarm_name}, OBProxy daemon process start time change: ${recover_value}
- Example: Alert: OBProxy daemon process stop, OBProxy daemon process start time change: 0 seconds
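To show how the placeholders in the templates above expand, the following sketch renders the alert details template with Python's string.Template, which happens to use the same ${name} placeholder syntax. How OCP renders templates internally is not described here, so this is only an illustration; the variable values are copied from the example above.

```python
from string import Template

# Alert details template, copied from the section above.
DETAILS_TEMPLATE = Template(
    "OBProxy cluster: ${obproxy_cluster}, host: ${host}, alert: ${alarm_name}."
)


def render_details(obproxy_cluster: str, host: str, alarm_name: str) -> str:
    """Fill the alert details template; raises KeyError if a variable is missing."""
    return DETAILS_TEMPLATE.substitute(
        obproxy_cluster=obproxy_cluster,
        host=host,
        alarm_name=alarm_name,
    )


print(render_details("ODPT2", "xxx.xxx.xxx.xxx", "OBProxy daemon process stop"))
```

The printed line matches the alert details example shown above.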
Impact on the system
If the OBProxy process stops, the OBProxy daemon (obproxyd.sh) restarts it. If the daemon itself is not running, the OBProxy process is not restarted after an unexpected exit, which prolongs the business impact.
It is recommended to keep the OBProxy process running and serving requests to minimize business impact time.
Possible causes
No common causes are known.
Resolution
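Manually start the OBProxy daemon process with the following command, adjusting the installation path, cluster name, and port to match your deployment. Restarting OBProxy itself can affect business traffic, so it is recommended to avoid restarting OBProxy in OCP.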
cd /opt/taobao/install/obproxy-3.4.0 && bin/obproxyd.sh -c start -r /home/admin/logs/obproxy -n ODPT2 -p 2883