obproxyd_process_dead|V4.3.1| docs|Distributed Database

obproxyd_process_dead

Last Updated：2024-08-23 10:14:19 Updated

Description

This alert is triggered when OCP-Agent detects that the obproxyd.sh process, the daemon of OBProxy, does not exist. The value 1 indicates that the obproxyd.sh process exists. The value 0 indicates that the obproxyd.sh process does not exist, in which case this alert is triggered.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	obproxyd_process_exists Note The value of the metric indicates whether the obproxyd.sh process exists. The value 1 indicates that the obproxyd.sh process exists. The value 0 indicates that the obproxyd.sh process does not exist, in which case, this alert is triggered.
Source	`ps -ef\|grep -w obproxyd.sh\|grep -v grep\|wc -l` Note The metric source of this alert is special. OCP-Agent runs the preceding Linux command to verify whether the daemon of OBProxy exists.
Collected metric	obproxyd_process_exists
Metric expression	obproxyd_process_exists{app="ODP",@LABELS}
Collection cycle	1 second

Alert rule

Metric	Default threshold	Duration	Detection cycle	Time before clearance
obproxyd_process_exists	0 (unconnectable)	0 seconds	10 seconds	5 minutes

Alert information

Trigger method	Alert level	Scope
Metric expression	Stopped	Server

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: OBProxy Cluster: ${obproxy_cluster}, Host: ${host}, Alert: ${alarm_name}
Overview example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=xxx.xxx.xxx.xxx. The obproxyd.sh process does not exist.
Details example: OBProxy Cluster: obproxy_02, Host: xxx.xxx.xxx.xxx, Alert: The obproxyd.sh process does not exist.

${alarm_target} indicates the object that generated the alert, in the obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx format. obproxy_cluster_id indicates the ID of the OBProxy cluster that generated the alert. obproxy_cluster indicates the name of the OBProxy cluster that generated the alert. svr_ip indicates the IP address of the OBProxy server that generated the alert.

Impact on the system

The obproxyd.sh process monitors the status of the obproxy process. When OBProxy quits unexpectedly, the obproxyd.sh process will try to restart the obproxy process. If both the obproxyd.sh process and the obproxy process are abnormal, the OBProxy may not be restarted after it is stopped.

Possible causes

This problem is commonly found in the following scenarios:

A network communication error occurs.
The obproxyd.sh process unexpectedly stops.
The obproxyd.sh process is alive but it does not report its heartbeat.

Suggested solutions

Check whether the OBProxy server is functioning.

Try to start the OBProxy server.
- If the server can be started, go to Step 2.
- If the OBProxy server cannot be started, the issue is caused by a failure of the OBProxy server. We recommend that you replace this OBProxy.
  
  To replace a faulty OBProxy, you need to first add a new OBProxy into the OBProxy cluster and then delete the faulty OBProxy.
Run the ssh command to log on to the OBProxy server.
- If you can log on, the problem may have been caused by an unknown issue. Go to Step 3.
- If you cannot log on and the system prompts that the server is busy, we recommend that you restart the OBProxy.
  1. Choose OCP > OBProxy . On the page that appears, find the cluster where the faulty OBProxy server is located in the OBProxy clusters list, and then click the name of the cluster.
  2. Find the faulty OBProxy server in the OBProxy servers list, and then click Restart in the Actions column.
  3. If you still cannot connect to it after the restart, go to Step 3.

Check whether the OBProxy is overloaded or the network communication fails, making it unable to report its heartbeats.

Run the following commands to check the process status and resource usage.

# Check whether the obproxy process is alive. If not, restart the obproxy process. 
ps aux | grep obproxyd

# If the CPU utilization and memory usage are too high, the obproxy process may not be able to provide services. 
# Check the CPU utilization and memory usage of the OBServer. 
top -n 1 -p $(pgrep obproxyd)

# Check the available space of the data disk and log disk. 
df | grep /data


# Check the network connections of the OBServer. If the number of network connections is 0, a network failure may have occurred. 
netstat -anp | grep 2883 | wc -l

If no issue is found in this step, go to the next step.

Collect the log information and contact OCP technical support for help.
1. Collect OBProxy logs.
  
  Errors recorded in error logs of OBProxy usually trigger alerts in the OCP console. You can go to the Alert Events page of the OCP console to check for OBProxy log alerts.
2. Collect OS logs. Search for logs in the /var/log/messages log file based on the keyword error and check the returned information of the system.