Description
This alert is triggered when OceanBase Cloud Platform (OCP) detects an unconnectable OBServer in a managed OceanBase cluster.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_connectable |
| Source | SQL: select 1; |
| Collected metric | ob_connectable |
| Metric expression | min(oceanbase_connectivity{@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Note
The metric source of this alert is special. OCP tries to connect to a local OBServer by using the database connection pool and executes the SQL statement select 1; to detect whether the OBServer is connectable. Then, it returns the result to the collected metric.
The value of the metric ob_connectable indicates whether the OBServer is connectable. If the value of the metric is 0, this OBServer is unconnectable and this alert is triggered.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_connectable | 0 | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Stopped | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}
Overview example: ob_cluster-1:svr_ip=xxx.xxx.xxx.xxx. The OBServer cannot be connected.
Details example: Cluster:ob_cluster-1, Host:xxx.xxx.xxx.xxx, Alert: The OBServer cannot be connected.
${alarm_target} follows the ob_cluster=xxxxxxx:svr_ip=xxxxxx format, where ob_cluster indicates the name of the cluster that generated the alert and svr_ip indicates the IP address of the OBServer of the cluster that generated the alert.
Impact on the system
OBServers on some hosts are unavailable, reducing the availability of some data replicas.
Possible causes
The observer process exits unexpectedly or the OBServer is overloaded and cannot respond to requests.
The server where the OceanBase Database service is deployed is faulty, causing the OBServer to be unavailable.
Suggested solutions
Check whether the server where the OceanBase Database service is deployed is functioning.
Check whether the server where the OceanBase Database service is deployed can be started.
If the server can be started, go to Step 2.
If not, the server where the OceanBase Database service is deployed is faulty, causing the OBServer to be unavailable. We recommend that you replace the OBServer.
Run the SSH command to try to log on to the server where the OceanBase Database service is deployed.
If you can log on, the problem may have been caused by an unknown issue. Go to Step 3.
If you cannot log on, the server where the OceanBase Database service is deployed is not responding. We recommend that you run the following commands to restart the OBServer.
# Log on to the server where the OceanBase Database service is deployed as the administrator. # Kill the observer process. pgrep observer | kill # If the observer process does not exit, force kill it. pgrep observer | kill -9 # Restart the OBServer. cd /home/admin/oceanbase && bin/observerYou can also initiate a restart task in the
OBServers list on theOverview page of the cluster.If you still cannot connect to the OBServer after the restart, go to Step 3.
Check whether the OBServer is overloaded or the network connection is disconnected.
Run the following commands to check the process status and resource usage.
# Check whether the observer process is alive. If not, restart the observer process. ps aux | grep observer # If the CPU utilization and memory usage are too high, the OBServer may be malfunctioning. # Check the CPU utilization and memory usage of the OBServer. top -n 1 -p $(pgrep observer) # Check the available space of the data disk and log disk. df | grep /data # Check the network connections of the OBServer. If the number of network connections is 0, a network failure may have occurred. netstat -anp | grep 2881 | wc -lIf no issue is found in this step, go to the next step.
Collect the log information and contact OCP Technical Support for help.
Check the logs of the OBServer.
Errors recorded in error logs of an OBServer usually trigger alerts on OCP. You can go to the Alert Events page of OCP to check whether an OceanBase log alert (ob_log_alarm) has been reported. OBServer logs are saved in
/home/admin/oceanbase/log. Prioritize.wffiles that record WARN- and ERROR-level logs.Check OS logs. Search for required content in the
/var/log/messageslog files based on the keyworderrorand check the returned information.