obproxy_process_dead |V3.1.2|OceanBase Cloud Platform| docs|Distributed Database

obproxy_process_dead

Last Updated: 2023-08-15 11:21:17

Description

This alert is triggered when the obproxy process does not exist on an OBProxy server.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	obproxy_process_exists. Note: This metric indicates whether the obproxy process is available. Valid values are 1 and 0: 1 indicates that the process exists, and 0 indicates that the process does not exist, which triggers this alert.
Source	`ps -ef | grep -w obproxy | grep -v grep | wc -l` Note: This alert has a special metric source. OCP-Agent runs the preceding Linux command on the OBProxy server to check whether the obproxy process exists.
Collected metric	obproxy_process_exists
Metric expression	obproxy_process_exists{app="ODP",@LABELS}
Collection cycle	1 second
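The Source command returns a process count, whereas the metric takes only the values 1 and 0. The following is a minimal sketch of that conversion, assuming that any count greater than 0 is reported as 1; the function names are hypothetical, and the actual logic inside OCP-Agent is not documented here. Note also that `grep -w obproxy` matches any command line that contains obproxy as a whole word (for example, `tail -f obproxy.log`), so the count can include processes other than obproxy itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how a 0/1 value such as obproxy_process_exists
# could be derived from the documented Source command.

# Count command lines that contain the word "obproxy", excluding the grep
# processes themselves (the same pipeline as the Source row).
count_obproxy_processes() {
  ps -ef | grep -w obproxy | grep -v grep | wc -l
}

# Assumed mapping: a count greater than 0 is reported as 1 (process exists),
# and a count of 0 is reported as 0 (process missing, alert condition).
to_metric_value() {
  if [ "$1" -gt 0 ]; then
    echo 1
  else
    echo 0
  fi
}
```

For example, `to_metric_value "$(count_obproxy_processes)"` prints 0 when no matching process is running on the server.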

Alert rule

Metric	Default threshold	Duration	Detection cycle	Time before clearance
obproxy_process_exists	0	0 seconds	10 seconds	5 minutes
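One way to read the alert rule is sketched below as a simplified model. It assumes that one sample is evaluated per 10-second detection cycle, that a sample equal to the threshold 0 fires the alert immediately (duration 0 seconds), and that the alert clears only after the metric has stayed at 1 for the 5-minute clearance time. These semantics are an interpretation of the table, not a description of OCP internals, and `evaluate_samples` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of the alert rule. Assumptions:
# - one obproxy_process_exists sample is evaluated per detection cycle (10 s);
# - a sample equal to the threshold (0) fires the alert at once (duration 0 s);
# - the alert clears after the metric has stayed at 1 for 300 s (5 minutes).
DETECTION_CYCLE_SECONDS=10
CLEARANCE_SECONDS=300

# Print the alert state ("ok" or "firing") after each sample passed as an argument.
evaluate_samples() {
  local state="ok" healthy_seconds=0 sample
  for sample in "$@"; do
    if [ "$sample" -eq 0 ]; then
      # Threshold reached: fire (or keep firing) and restart the clearance timer.
      state="firing"
      healthy_seconds=0
    elif [ "$state" = "firing" ]; then
      # The process is back; count healthy time toward clearance.
      healthy_seconds=$((healthy_seconds + DETECTION_CYCLE_SECONDS))
      if [ "$healthy_seconds" -ge "$CLEARANCE_SECONDS" ]; then
        state="ok"
        healthy_seconds=0
      fi
    fi
    echo "$state"
  done
}
```

For example, `evaluate_samples 1 0 1` prints ok, firing, firing: under these assumptions, a single healthy sample after the process disappears does not clear the alert, while 30 consecutive healthy samples (300 seconds) do.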

Alert information

Trigger method	Alert level	Scope
Metric expression	Stopped	Server

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}
Overview example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1. The obproxy process does not exist.
Details example: obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1. The obproxy process does not exist.

${alarm_target} identifies the object that generated the alert, in the obproxy_cluster_id=xx:obproxy_cluster=xx:svr_ip=xx format, where:
- obproxy_cluster_id is the ID of the OBProxy cluster that generated the alert.
- obproxy_cluster is the name of the OBProxy cluster that generated the alert.
- svr_ip is the IP address of the OBProxy server that generated the alert.
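If you handle these alerts in scripts, for example to look up the svr_ip of the affected server, you can split ${alarm_target} on colons and equals signs. The following is a minimal sketch; `get_target_field` is a hypothetical helper, and it assumes that field values do not themselves contain colons (an IPv6 svr_ip would break it).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from an ${alarm_target} string
# such as obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1.
# Assumes that values contain no ":" (IPv4 addresses only).
get_target_field() {
  local target="$1" key="$2" pair
  local -a pairs
  # Split the target into key=value pairs on ":" without glob expansion.
  IFS=':' read -r -a pairs <<< "$target"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$key" ]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  # Key not found.
  return 1
}
```

For example, `get_target_field "obproxy_cluster_id=3:obproxy_cluster=obproxy_02:svr_ip=192.168.1.1" svr_ip` prints 192.168.1.1.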

Impact on the system

OBProxy is the proxy through which applications access the OBServer cluster. When the obproxy process does not exist, applications that connect through this OBProxy cannot access the database.
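To confirm the impact from the application side, you can check whether the OBProxy port accepts TCP connections. The sketch below uses bash's /dev/tcp redirection, which assumes a bash build with network redirections enabled (usually the case on Linux distributions), and the coreutils `timeout` command. `port_accepts_connections` is a hypothetical helper, and 2883 is the default OBProxy port, so substitute the port your applications actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if HOST:PORT accepts a TCP
# connection within TIMEOUT seconds (default 3), and fail otherwise.
port_accepts_connections() {
  local host="$1" port="$2" secs="${3:-3}"
  # The child bash opens /dev/tcp/HOST/PORT on file descriptor 3; host and
  # port are passed as positional arguments to avoid quoting problems.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `port_accepts_connections 192.168.1.1 2883 || echo "cannot connect to OBProxy"` prints the message when nothing is listening on that port or the connection times out.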

Possible causes

This alert commonly occurs in the following scenarios:

- A network communication error occurs.
- The obproxy process unexpectedly stops.
- The obproxy process is alive but unresponsive and does not report heartbeats.

Heartbeat reporting failures are often caused by insufficient memory or disk space, network failures, or an overloaded OBProxy.

Suggested solutions

Check whether the OBProxy server is functioning.

Try to start the OBProxy server.
- If the server can be started, go to Step 2.
- If the OBProxy server cannot be started, the OBProxy server itself has failed. We recommend that you replace this OBProxy.

  To replace a faulty OBProxy, first add a new OBProxy to the OBProxy cluster, and then delete the faulty one.
Run the ssh command to log on to the OBProxy server.
- If you can log on, the cause is still unknown. Go to Step 3.
- If you cannot log on and the system reports that the server is busy, we recommend that you restart the OBProxy:
  1. Choose OCP > OBProxy. On the page that appears, find the cluster to which the faulty OBProxy server belongs in the OBProxy clusters list, and then click the cluster name.
  2. In the OBProxy servers list, find the faulty OBProxy server and click Restart in the Actions column.
  3. If you still cannot connect to the server after the restart, go to Step 3.

Check whether the OBProxy is overloaded or the network is disconnected.

Run the following commands on the OBProxy server to check the process status and resource usage.

```shell
# Check whether the obproxy process is alive. If it is not, restart the obproxy process.
ps aux | grep -w obproxy | grep -v grep

# If the CPU utilization and memory usage are too high, the obproxy process may be unable to provide services.
# Check the CPU utilization and memory usage of the obproxy process (batch mode, one iteration).
top -b -n 1 -p "$(pgrep -d ',' -x obproxy)"

# Check the available space of the data disk and log disk.
# /data is the mount point used in this example; adjust it to match your server.
df -h | grep /data

# Count the TCP connections on port 2883 (the default OBProxy port).
# If the number of connections is 0, a network failure may have occurred.
netstat -ant | grep -c ':2883 '
```
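The commands above print raw figures that you still need to judge by eye. If you want to script the disk and connection checks, the following sketch filters the same kind of output. The helper names and the 90% example threshold are hypothetical choices, not OCP defaults, and the parsing assumes mount points without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read command output from stdin, so they can be
# used with live commands or tested with canned text.

# Print "MOUNT USE%" for each `df -P` line whose usage is at or above
# THRESHOLD percent. The header line is skipped because its 5th field
# ("Capacity") is not a percentage.
flag_full_disks() {
  local threshold="$1"
  awk -v t="$threshold" '$5 ~ /^[0-9]+%$/ {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 >= t) print $6, $5
  }'
}

# Count the lines of `netstat -ant` output in which any address ends in
# :PORT (the local or the remote side of a TCP socket).
count_port_connections() {
  local port="$1"
  awk -v re=":${port}\$" '{
    for (i = 1; i <= NF; i++) if ($i ~ re) { n++; break }
  } END { print n + 0 }'
}
```

For example, `df -P | grep /data | flag_full_disks 90` lists the data and log mount points that are at least 90% full, and `netstat -ant | count_port_connections 2883` prints the number of TCP sockets on the default OBProxy port. The `-P` option keeps each file system on one line, which makes the field positions predictable.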

If no issue is found in this step, go to the next step.

Collect the log information and contact OCP technical support for help.
1. Collect OBProxy logs.

  Errors recorded in the OBProxy error logs usually trigger alerts in the OCP console. You can check for OBProxy log alerts on the Alert Events page of the OCP console.
2. Collect OS logs. Search the /var/log/messages file for the keyword error and review the matching entries.
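The log collection steps can be scripted along the following lines. This is a sketch only: `recent_errors` and `archive_log_dir` are hypothetical helpers, the limit of 100 lines is arbitrary, and the OBProxy log directory is assumed to be the log directory under the OBProxy installation directory, so confirm the actual path on your server. Reading /var/log/messages normally requires root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for collecting logs before contacting OCP technical support.

# Print the last LIMIT lines (default 100) of FILE that contain the keyword
# "error", ignoring case.
recent_errors() {
  local file="$1" limit="${2:-100}"
  grep -i 'error' "$file" | tail -n "$limit"
}

# Pack directory SRC_DIR into the gzip-compressed tar archive OUT, storing
# paths relative to the parent of SRC_DIR.
archive_log_dir() {
  local src_dir="$1" out="$2"
  tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

For example, run `recent_errors /var/log/messages` as root to review recent OS errors, and `archive_log_dir /path/to/obproxy/log obproxy_logs.tar.gz` to package the OBProxy logs for support, where /path/to/obproxy/log is a placeholder for your actual log directory.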