Alert description
This alert monitors whether the obproxy process exists on the OBProxy host. If the process does not exist, an alert is triggered.
Alert principle
The following table describes the key parameters in the monitoring logic of this alert.
| Parameter | Description |
|---|---|
| Monitoring metric | obproxy_process_exists. This metric indicates whether the obproxy process exists: a value of 1 means that the process exists, and a value of 0 means that it does not. An alert is triggered when the value is 0. |
| Metric source | ps -ef\|grep -w obproxy\|grep -v grep\|wc -l. This metric has a special source: OCP-Agent runs this Linux command on the host to check whether the obproxy process exists. |
| Metric collection | obproxy_process_exists |
| Monitoring expression | obproxy_process_exists{app="ODP",@LABELS} |
| Collection interval | 1 second |
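The collection logic in the table can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not OCP-Agent's actual implementation: the function name process_exists_metric is hypothetical, and it reads ps -ef output on standard input so that the filtering can be tried on sample lines.

```shell
# Hypothetical helper for illustration; not part of OCP-Agent.
# Reads `ps -ef`-style lines on stdin and prints 1 if at least one line
# contains the given process name as a whole word, otherwise 0.
process_exists_metric() {
  local name="$1"
  local count
  # Same filters as the documented command: whole-word match, drop the
  # grep process itself, then count the remaining lines.
  count=$(grep -w -- "$name" | grep -v grep | wc -l)
  if [ "$count" -gt 0 ]; then
    echo 1
  else
    echo 0
  fi
}
```

On an OBProxy host, `ps -ef | process_exists_metric obproxy` would print the value that the alert rule compares against the threshold 0.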
Rule information
| Monitoring metric | Default threshold | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| obproxy_process_exists | 0 | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the monitoring metric expression | Downtime | Server |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=xxx.xxx.xxx.xxx OBProxy process does not exist
Alert Details
- Template: OBProxy cluster: ${obproxy_cluster}, host: ${host}, alert: ${alarm_name}.
- Example: OBProxy cluster: obproxy_cluster_id=3, host: xxx.xxx.xxx.xxx, alert: OBProxy process does not exist.
Clear Alert
- Template: Alert: ${alarm_name}, OBProxy process status: ${recover_value}
- Example: Alert: OBProxy process does not exist, OBProxy process status: 1
Here, ${alarm_target} indicates the object that generates the alert. The format is obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx. obproxy_cluster_id indicates the ID of the OBProxy cluster that generates the alert, obproxy_cluster indicates the name of the OBProxy cluster that generates the alert, and svr_ip indicates the IP address of the OBProxy server that generates the alert.
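When an alert is handled in a script, the ${alarm_target} format described above can be split into its fields. The following is a minimal sketch, assuming the value uses exactly the key=value pairs separated by colons shown above and that svr_ip is an IPv4 address (an IPv6 address would contain colons and break this split). The function name alarm_target_field and the sample IP address are hypothetical.

```shell
# Hypothetical helper for illustration only.
# Prints the value of one field from an alarm_target string such as
# obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=10.0.0.1
alarm_target_field() {
  local target="$1"
  local key="$2"
  # Put each key=value pair on its own line, then split each pair on "=".
  printf '%s\n' "$target" | tr ':' '\n' | while IFS='=' read -r name value; do
    if [ "$name" = "$key" ]; then
      printf '%s\n' "$value"
    fi
  done
}
```

For example, `alarm_target_field "$target" svr_ip` prints the IP address of the OBProxy server that generated the alert, which can then be used for the SSH check in the Solution section.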
Impact on the system
OceanBase Database uses OBProxy as its proxy layer to access OBServer nodes. If the OBProxy process does not exist, applications cannot connect to the database.
Possible causes
This issue commonly occurs in the following situations:
- Network communication failures.
- The OBProxy process unexpectedly stops.
- The OBProxy process is alive but unresponsive, and does not report heartbeats. The failure to report heartbeats is often due to insufficient memory or disk space, network failures, or high OBProxy load.
Solution
1. Check whether the OBProxy server is faulty.
   Check whether the OBProxy server can be started.
   - If yes, proceed to step 2.
   - If no, the OBProxy server is faulty and the process does not exist. We recommend that you replace the OBProxy server. To do this, add a new OBProxy server to the OBProxy cluster and then delete the faulty OBProxy server.
2. Use the SSH command to log in to the OBProxy server and check whether the login is successful.
   - If yes, proceed to step 3.
   - If no, the OBProxy server is busy. We recommend that you restart the OBProxy server:
     1. Choose OCP > OBProxy. In the cluster list, find the cluster to which the faulty OBProxy server belongs, and click the cluster name.
     2. In the OBProxy list, find the faulty OBProxy server and click Restart in the Actions column.
     If the OBProxy server still cannot be connected after the restart, proceed to step 3.
3. Check whether the OBProxy server is overloaded or the network connection is unavailable.
   Run the following commands to check the process status and resource usage.

       # Check whether the process is running. If not, restart the OBProxy server.
       ps aux | grep obproxy
       # Check whether the CPU, memory, and other resources are overloaded. If so, the OBProxy server may not be working properly.
       # Check the CPU and memory usage of the obproxy process.
       top -n 1 -p $(pgrep obproxy)
       # Check the remaining space on the disk (data disk and log disk).
       df | grep /data
       # Check the number of network connections on port 2883 of the OBProxy server. If the number is 0, a network fault may exist.
       netstat -anp | grep 2883 | wc -l

   If these checks do not resolve the issue, proceed to step 4.
4. If the issue persists, collect the OBProxy server logs and contact technical support.
   - Collect the OBProxy server logs. Generally, the OBProxy server generates ERROR logs, which are displayed as alerts on the OCP console. Check whether an OBProxy server log alert is displayed on the OCP console.
   - Collect the OS logs. Search for the error keyword in the /var/log/messages log file and observe the system response.
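The OS log search described above can be sketched as a small shell function. This is a minimal sketch, not a prescribed procedure: the function name count_error_lines is hypothetical, and matching the keyword case-insensitively is an assumption, since the text only says to search for the error keyword.

```shell
# Hypothetical helper for illustration only.
# Prints how many lines of a log file contain "error" (case-insensitive).
count_error_lines() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # grep -c prints 0 but exits with status 1 when nothing matches,
  # so "|| true" keeps a zero count from looking like a failure.
  grep -ci 'error' "$log_file" || true
}
```

On the OBProxy server, `count_error_lines /var/log/messages` gives a quick indication of whether the OS log is worth collecting. Reading that file usually requires root privileges. To view the most recent matching lines themselves, `grep -i error /var/log/messages | tail -n 20` can be used.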