Description
This alert is triggered when the obproxy process of the OBProxy server does not exist.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | obproxy_process_exists. Note: The value of the metric indicates the availability of the obproxy process. Valid values: 1 and 0. The value 1 indicates that the process exists. The value 0 indicates that the process does not exist and triggers this alert. |
| Source | `ps -ef \| grep -w obproxy \| grep -v grep \| wc -l` Note: The metric source of this alert is special. OCP-Agent runs the preceding Linux command to check whether the obproxy process exists. |
| Collected metric | obproxy_process_exists |
| Metric expression | obproxy_process_exists{app="ODP",@LABELS} |
| Collection cycle | 1 second |
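
The check described in the Source row can be approximated with a small shell function. This is only an illustrative sketch, not the actual OCP-Agent implementation: the function names `count_to_metric` and `obproxy_process_exists` are hypothetical, and the mapping from the process count to 1 or 0 is inferred from the metric description in the table.

```shell
#!/usr/bin/env bash

# Hypothetical helper: maps a process count to the metric value.
# 1 means the process exists, 0 means it does not (the alert condition).
count_to_metric() {
  local count="$1"
  if [ "$count" -gt 0 ]; then
    echo 1
  else
    echo 0
  fi
}

# Hypothetical wrapper: runs the pipeline from the Source row and
# converts the resulting line count to 0 or 1.
obproxy_process_exists() {
  local count
  count=$(ps -ef | grep -w obproxy | grep -v grep | wc -l | tr -d '[:space:]')
  count_to_metric "$count"
}
```

Calling `obproxy_process_exists` on the OBProxy server prints the value that this sketch would report for the current collection cycle.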
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| obproxy_process_exists | 0 | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Stopped | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}
Overview example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1. The obproxy process does not exist.
Details example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1. The obproxy process does not exist.
${alarm_target} indicates the object that generated the alert, in the obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx format. obproxy_cluster_id indicates the ID of the OBProxy cluster that generated the alert. obproxy_cluster indicates the name of the OBProxy cluster that generated the alert. svr_ip indicates the IP address of the OBProxy server that generated the alert.
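
When alert text is processed by a script, the ${alarm_target} string can be split into its fields. The following is a minimal sketch; the function name `alarm_target_field` is hypothetical, and it assumes that field values contain no colon (for example, IPv4 addresses).

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the value of one key from an alarm target
# string of the form obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx.
# Assumes values contain no ':' character.
alarm_target_field() {
  local target="$1" key="$2" pair
  local -a pairs
  # Split the target on ':' into key=value pairs.
  IFS=':' read -r -a pairs <<< "$target"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$key" ]; then
      echo "${pair#*=}"
      return 0
    fi
  done
  # The key was not found.
  return 1
}
```

For example, `alarm_target_field "obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1" svr_ip` extracts the IP address of the OBProxy server that generated the alert.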
Impact on the system
OBProxy is the proxy that provides access to the OBServer cluster. When the obproxy process does not exist, applications cannot connect to the database through it.
Possible causes
This problem is commonly found in the following scenarios:
A network communication error occurs.
The obproxy process unexpectedly stops.
The obproxy process is alive but is unresponsive, without reporting its heartbeats.
The heartbeat reporting failure is often caused by insufficient memory or disk space, network failure, or overloaded OBProxy.
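
To check the disk space cause quickly, the output of `df -P` can be filtered for mount points that are nearly full. This is a sketch under stated assumptions: the function name `full_mounts` is hypothetical, the 90 percent default is an arbitrary example rather than an OCP threshold, and mount points are assumed to contain no spaces.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# points whose usage is at or above the given percentage (default: 90).
# In `df -P` output, column 5 is the capacity (for example "95%") and
# column 6 is the mount point.
full_mounts() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit) print $6
    }'
}
```

On the OBProxy server, `df -P | full_mounts 90` lists the mount points that are at least 90 percent full. An empty result suggests that disk space is not the cause.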
Suggested solutions
1. Check whether the OBProxy server is functioning.
   Try to start the OBProxy server.
   - If the server can be started, go to Step 2.
   - If the OBProxy server cannot be started, the issue is caused by a failure of the OBProxy server. We recommend that you replace this OBProxy.
     To replace a faulty OBProxy, first add a new OBProxy to the OBProxy cluster and then delete the faulty OBProxy.
2. Run the ssh command to log on to the OBProxy server.
   - If you can log on, the cause is still unknown. Go to Step 3.
   - If you cannot log on and the system prompts that the server is busy, we recommend that you restart the OBProxy.
     1. Choose OCP > OBProxy. On the page that appears, find the cluster that contains the faulty OBProxy server in the OBProxy clusters list, and then click the name of the cluster.
     2. Find the faulty OBProxy server in the OBProxy servers list, and then click Restart in the Actions column.

     If you still cannot connect to the server after the restart, go to Step 3.
3. Check whether the OBProxy is overloaded or the network is disconnected.
   Run the following commands to check the process status and resource usage.

   ```shell
   # Check whether the obproxy process is alive. If not, restart the obproxy process.
   ps aux | grep obproxy

   # If the CPU utilization and memory usage are too high, the obproxy process may not be able to provide services.
   # Check the CPU utilization and memory usage of the obproxy process.
   top -n 1 -p $(pgrep obproxy)

   # Check the available space of the data disk and log disk.
   df | grep /data

   # Check the network connections of the OBProxy server. If the number of network connections is 0, a network failure may have occurred.
   netstat -anp | grep 2883 | wc -l
   ```

   If no issue is found in this step, go to the next step.
4. Collect the log information and contact OCP technical support for help.
   - Collect OBProxy logs.
     Errors recorded in the error logs of OBProxy usually trigger alerts in the OCP console. You can go to the Alert Events page of the OCP console to check for OBProxy log alerts.
   - Collect OS logs.
     Search the `/var/log/messages` log file for the keyword `error` and check the information that the system returns.
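
The OS log search in the last step can be sketched as a small helper. The function name `recent_os_errors` and the default limit of 50 lines are hypothetical. The case-insensitive match is also an assumption, because the step only names the keyword error.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the most recent lines that contain "error"
# (case-insensitive) from a log file, /var/log/messages by default.
# Arguments: [log_file] [max_lines]
recent_os_errors() {
  local log_file="${1:-/var/log/messages}" max_lines="${2:-50}"
  grep -i 'error' "$log_file" | tail -n "$max_lines"
}
```

Reading `/var/log/messages` usually requires root privileges, so on the OBProxy server the function would typically be run from a root shell. The output can be saved to a file and attached when you contact OCP technical support.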