Description
Monitors whether the OBProxy process has stopped.
OCP-Agent monitors the start time of each OceanBase component process. A change in the start time indicates that the process has stopped or restarted. This alert only indicates that the process stopped at some point, not that it has remained down for a long time. If a process remains down for a long time, other status alerts are triggered.
The relevant processes are as follows:
- observer: observer_process_stop
- obproxy: obproxy_process_stop
- obproxyd.sh: obproxyd_process_stop
- ocp_agentd: agentd_process_stop
- ocp_mgragent: mgragent_process_stop
- ocp_monagent: monagent_process_stop
Principle
| Parameter | Value |
|---|---|
| Monitoring metric | obproxy_boot_time_delta_seconds |
| Metric source | The start time of a process is the system boot time plus the time from system boot until the process started. The system boot time is the btime value in the output of the cat /proc/stat command. The time from system boot until the process started is the value of the 22nd field (in clock ticks) in the output of the cat /proc/[pid]/stat command, divided by 100. |
| Metric collection | process_boot_time_seconds |
| Monitoring expression | max(delta(process_boot_time_seconds{name="obproxy",@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Metric collection cycle | 5 seconds |
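The metric source described in the table can be sketched in Python as follows. This is an illustrative reading of the description, not the OCP-Agent implementation: the function names are hypothetical, and the default of 100 ticks per second mirrors the "divided by 100" in the table. On a live system the tick rate can be read with `os.sysconf("SC_CLK_TCK")`.

```python
import os


def parse_btime(proc_stat_text):
    """Return the system boot time (Unix seconds) from the text of /proc/stat."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("btime line not found")


def parse_starttime_ticks(pid_stat_text):
    """Return field 22 (start time in clock ticks) from the text of /proc/[pid]/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last ')'. That part starts
    # at field 3, which puts field 22 at index 22 - 3.
    rest = pid_stat_text[pid_stat_text.rindex(")") + 1:].split()
    return int(rest[22 - 3])


def process_start_time(proc_stat_text, pid_stat_text, ticks_per_second=100):
    """Process start time in Unix seconds: btime + starttime / ticks_per_second."""
    ticks = parse_starttime_ticks(pid_stat_text)
    return parse_btime(proc_stat_text) + ticks / ticks_per_second


def read_process_start_time(pid):
    """Read both files for a live process (Linux only)."""
    with open("/proc/stat") as f:
        proc_stat = f.read()
    with open(f"/proc/{pid}/stat") as f:
        pid_stat = f.read()
    return process_start_time(proc_stat, pid_stat, os.sysconf("SC_CLK_TCK"))
```

Keeping the parsing separate from the file reads makes it possible to check the arithmetic against captured text without access to a running OBProxy host.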
Rules
| Monitoring expression | Description | Default threshold | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| obproxy_boot_time_delta_seconds > 0 | If the monitoring metric exceeds 0, the process has stopped. | 0 seconds | 30 seconds | 5 minutes |
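To illustrate why a value above 0 means the process stopped, the rule can be approximated in Python. This is a simplified sketch under stated assumptions: the function names are hypothetical, the delta is taken as last sample minus first sample (the extrapolation that Prometheus applies in `delta()` is ignored), and the six samples per window are illustrative because the value of @INTERVAL is not given here.

```python
def boot_time_delta(samples):
    """Difference between the last and first start-time sample in a window (seconds)."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


def process_stopped(samples_by_series, threshold=0.0):
    """Approximate `max(delta(...)) > threshold` over all series in one group."""
    deltas = [boot_time_delta(s) for s in samples_by_series.values()]
    return bool(deltas) and max(deltas) > threshold


# A process that keeps running reports the same start time on every collection.
steady = {"host-a": [1700001234.56] * 6}

# After a restart, the later samples carry a newer start time.
restarted = {"host-a": [1700001234.56] * 4 + [1700001300.00] * 2}

steady_alert = process_stopped(steady)        # False: delta is 0
restart_alert = process_stopped(restarted)    # True: delta is about 65.44 seconds
```

Because the start time is constant while a process is alive, any positive delta within the window can only come from a new process instance.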
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Monitoring expression | Down | OBProxy |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:obproxy_cluster=ODPT2-2000005:host=xxx.xxx.xxx.xxx OBProxy process stopped
Alert details
- Template: OBProxy cluster: ${obproxy_cluster}, Host: ${host}, Alert: ${alarm_name}.
- Example: OBProxy cluster: ODPT2, Host: xxx.xxx.xxx.xxx, Alert: OBProxy process stopped.
Alert recovery
- Template: Alert: ${alarm_name}, OBProxy process start time change: ${recover_value}
- Example: Alert: OBProxy process stopped, OBProxy process start time change: 0 seconds
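The `${...}` placeholders in these templates can be tried out with Python's standard `string.Template`, which happens to use the same syntax. This only illustrates how the variables map to the example text; how OCP renders templates internally is not described here. The details template is written so that its output matches the example above, and the field values are the ones shown in the examples.

```python
from string import Template

DETAILS_TEMPLATE = Template(
    "OBProxy cluster: ${obproxy_cluster}, Host: ${host}, Alert: ${alarm_name}."
)
RECOVERY_TEMPLATE = Template(
    "Alert: ${alarm_name}, OBProxy process start time change: ${recover_value}"
)


def render(template, **fields):
    """Fill every placeholder; a missing field raises KeyError."""
    return template.substitute(**fields)


details = render(
    DETAILS_TEMPLATE,
    obproxy_cluster="ODPT2",
    host="xxx.xxx.xxx.xxx",
    alarm_name="OBProxy process stopped",
)
recovery = render(
    RECOVERY_TEMPLATE,
    alarm_name="OBProxy process stopped",
    recover_value="0 seconds",
)
```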
Impact on the system
After the OBProxy process stops, the guardian process (obproxyd.sh) restarts it. However, the stop still affects the business: client connections are dropped and throughput suffers, for example as a drop in QPS.
Possible causes
- The memory usage of the OBProxy process exceeds the limit.
- Other unexpected situations.
Procedure
After the OBProxy process stops, the guardian process normally restarts it. If the guardian process has also stopped unexpectedly, start the guardian process manually:
cd /opt/taobao/install/obproxy-3.4.0 && bin/obproxyd.sh -c start -r /home/admin/logs/obproxy -n ODPT2 -p 2883
If the guardian process exists but the OBProxy process does not, starting the OBProxy process manually is also likely to fail. In this case, troubleshoot as follows:
- Check whether the configurl-server service (which can be provided by OCP or another service) is working properly. If the OBProxy startup mode is configurl, make sure that configurl-server is working properly.
- Retrieve the error logs of the OBProxy process and the coredump file, and contact OceanBase Technical Support.
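The decision flow in this procedure can be sketched as a small Python helper. This is a rough illustration, not an OceanBase tool: the function names are hypothetical, and it assumes that the guardian appears in the process list with `obproxyd.sh` in its command line and that the OBProxy binary is named `obproxy`.

```python
import os


def running_commands(proc_root="/proc"):
    """Return the command line of every process visible under proc_root (Linux)."""
    commands = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        commands.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return commands


def next_step(commands):
    """Map the observed processes to the next action in the procedure."""
    guardian = any("obproxyd.sh" in c for c in commands)
    # Match the executable name itself so that the guardian script,
    # whose first token is the shell, is not counted as OBProxy.
    obproxy = any(os.path.basename(c.split()[0]) == "obproxy" for c in commands if c)
    if obproxy:
        return "obproxy is running"
    if guardian:
        return "check configurl-server, then collect error logs and coredump"
    return "start the guardian process"
```

On an affected host, `next_step(running_commands())` would give a first hint about which branch of the procedure applies.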