Description
This alert is triggered when OCP-Agent detects that the obproxyd.sh process, the daemon of OBProxy, does not exist. The value 1 indicates that the obproxyd.sh process exists. The value 0 indicates that the obproxyd.sh process does not exist, in which case this alert is triggered.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | obproxyd_process_exists
Note |
| Source | ps -ef|grep -w obproxyd.sh|grep -v grep|wc -l
Note |
| Collected metric | obproxyd_process_exists |
| Metric expression | obproxyd_process_exists{app="ODP",@LABELS} |
| Collection cycle | 1 second |
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| obproxyd_process_exists | 0 (unconnectable) | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Stopped | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: OBProxy Cluster: ${obproxy_cluster}, Host: ${host}, Alert: ${alarm_name}
Overview example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=xxx.xxx.xxx.xxx. The obproxyd.sh process does not exist.
Details example: OBProxy Cluster: obproxy_02, Host: xxx.xxx.xxx.xxx, Alert: The obproxyd.sh process does not exist.
${alarm_target} indicates the object that generated the alert, in the obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx format. obproxy_cluster_id indicates the ID of the OBProxy cluster that generated the alert. obproxy_cluster indicates the name of the OBProxy cluster that generated the alert. svr_ip indicates the IP address of the OBProxy server that generated the alert.
Impact on the system
The obproxyd.sh process monitors the status of the obproxy process. When OBProxy quits unexpectedly, the obproxyd.sh process will try to restart the obproxy process. If both the obproxyd.sh process and the obproxy process are abnormal, the OBProxy may not be restarted after it is stopped.
Possible causes
This problem is commonly found in the following scenarios:
A network communication error occurs.
The obproxyd.sh process unexpectedly stops.
The obproxyd.sh process is alive but it does not report its heartbeat.
Suggested solutions
Check whether the OBProxy server is functioning.
Try to start the OBProxy server.
If the server can be started, go to Step 2.
If the OBProxy server cannot be started, the issue is caused by a failure of the OBProxy server. We recommend that you replace this OBProxy.
To replace a faulty OBProxy, you need to first add a new OBProxy into the OBProxy cluster and then delete the faulty OBProxy.
Run the ssh command to log on to the OBProxy server.
If you can log on, the problem may have been caused by an unknown issue. Go to Step 3.
If you cannot log on and the system prompts that the server is busy, we recommend that you restart the OBProxy.
Choose OCP > OBProxy . On the page that appears, find the cluster where the faulty OBProxy server is located in the OBProxy clusters list, and then click the name of the cluster.
Find the faulty OBProxy server in the OBProxy servers list, and then click Restart in the Actions column.
If you still cannot connect to it after the restart, go to Step 3.
Check whether the OBProxy is overloaded or the network communication fails, making it unable to report its heartbeats.
Run the following commands to check the process status and resource usage.
# Check whether the obproxy process is alive. If not, restart the obproxy process. ps aux | grep obproxyd # If the CPU utilization and memory usage are too high, the obproxy process may not be able to provide services. # Check the CPU utilization and memory usage of the OBServer. top -n 1 -p $(pgrep obproxyd) # Check the available space of the data disk and log disk. df | grep /data # Check the network connections of the OBServer. If the number of network connections is 0, a network failure may have occurred. netstat -anp | grep 2883 | wc -lIf no issue is found in this step, go to the next step.
Collect the log information and contact OCP technical support for help.
Collect OBProxy logs.
Errors recorded in error logs of OBProxy usually trigger alerts in the OCP console. You can go to the Alert Events page of the OCP console to check for OBProxy log alerts.
Collect OS logs. Search for logs in the
/var/log/messageslog file based on the keyworderrorand check the returned information of the system.