Alert description
Monitors whether any OBServer node in an OceanBase cluster managed by the current OCP cannot be connected to.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_connectable |
| Metric source | SQL: select 1; |
| Metric collection | ob_connectable |
| Monitoring expression | min(oceanbase_connectivity{@LABELS}) by (@GBLABELS) |
| Metric collection interval | 1 second |
Note
The metric source of this alert is special. It connects to the local OBServer node through the database connection pool and executes the SQL statement select 1; to check whether the OBServer node is connectable, and then passes the result to the metric collection.
The value of the monitoring metric ob_connectable indicates whether the OBServer node is connectable: 1 means the node is connectable, and 0 means it is not. When the metric value is 0, the alert is triggered.
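To illustrate the 1/0 semantics described above, the following minimal sketch runs a command that executes select 1; and maps its exit status to a metric-style value. It is not how OCP collects the metric (OCP uses its database connection pool); the function name ob_connectable_probe is hypothetical, and the example call assumes a MySQL-compatible command-line client is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mimics the meaning of ob_connectable:
# run a command that executes "select 1;" against an OBServer node and
# print 1 if it exits successfully, 0 otherwise. OCP itself collects this
# metric through its database connection pool, not through this function.
ob_connectable_probe() {
  if "$@" >/dev/null 2>&1; then
    echo 1   # query succeeded: the node is connectable
  else
    echo 0   # query failed: the node is not connectable (alert condition)
  fi
}

# Example call (placeholders in angle brackets; assumes a MySQL client):
# ob_connectable_probe mysql --connect-timeout=5 -h <svr_ip> -P 2881 \
#   -u <user> -p<password> -e 'select 1;'
```

Passing the whole command as arguments keeps the probe client-agnostic, so the same helper works with any client that exits with a non-zero status on connection failure.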
Rule information
| Monitoring metric | Default threshold | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| ob_connectable | 0 | 0 seconds | 10 seconds | 5 minutes |
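The rule above can be approximated offline as follows. This is a rough sketch, not OCP's rule engine: it assumes samples arrive as hypothetical "<group_label> <value>" lines, computes the minimum per group in the spirit of min(...) by (@GBLABELS), and treats the default threshold 0 as "fire when the minimum is at or below 0". The duration and the 5-minute elimination interval are not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read "<group_label> <value>" samples on stdin,
# compute min(value) per group, and print every group whose minimum is
# at or below the threshold (default 0), sorted by label.
firing_groups() {
  local threshold="${1:-0}"
  awk -v t="$threshold" '
    NF >= 2 {
      v = $2 + 0
      # keep the smallest value seen for this group label
      if (!($1 in m) || v < m[$1]) m[$1] = v
    }
    END {
      for (g in m) if (m[g] <= t) print g
    }' | sort
}
```

For example, with the sample lines 192.0.2.1 1, 192.0.2.2 1 and 192.0.2.2 0, firing_groups 0 prints 192.0.2.2, because that group's minimum is 0. Taking the minimum means a single failed probe in the window is enough to flag the server.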
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the monitoring metric | Downtime | Server |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx OceanBase server cannot be connected
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: OceanBase server cannot be connected.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer connection status: ${recover_value}
- Example: Alert: OceanBase server cannot be connected, OBServer connection status: 1
In these templates, ${alarm_target} uses the ob_cluster=xxxxxxx:svr_ip=xxxxxx format, where ob_cluster is the name of the cluster that generated the alert and svr_ip is the IP address of the OBServer node that generated the alert.
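When alert notifications are processed by scripts, the ${alarm_target} value can be split into its two fields. The sketch below is a hypothetical helper (parse_alarm_target is not part of OCP) and assumes that neither the cluster name nor the IP address contains a colon, so it handles IPv4 addresses only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an ${alarm_target} value of the form
# ob_cluster=<cluster>:svr_ip=<ip> into its two fields.
# Assumes no field contains ":" (IPv4 addresses only).
parse_alarm_target() {
  local target="$1" cluster="" ip="" part
  local -a parts
  # Split on ":" without relying on glob-sensitive word splitting.
  IFS=':' read -r -a parts <<< "$target"
  for part in "${parts[@]}"; do
    case "$part" in
      ob_cluster=*) cluster="${part#ob_cluster=}" ;;
      svr_ip=*)     ip="${part#svr_ip=}" ;;
    esac
  done
  printf 'cluster=%s ip=%s\n' "$cluster" "$ip"
}
```

For example, parse_alarm_target 'ob_cluster=obcluster-1:svr_ip=192.0.2.10' prints cluster=obcluster-1 ip=192.0.2.10. Matching on the key prefix rather than on position keeps the helper working if the field order changes.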
Impact on the system
The OBServer nodes on the affected hosts are unavailable, which reduces the availability of the data replicas stored on those nodes.
Possible causes
The observer process exits unexpectedly, or the OBServer node is overloaded and cannot respond to requests.
A failure in the OBServer node's host makes the OBServer node inaccessible.
Procedure
Step 1. Check whether the OBServer node is faulty.
Check whether the OBServer node can be started.
- Yes: proceed to step 2.
- No: the OBServer node is faulty. We recommend that you replace the OBServer node.
Step 2. Log in to the OBServer node by using SSH and check whether the login is successful.
- Yes: proceed to step 3.
- No: the OBServer node is busy. We recommend that you restart the OBServer node by using the following commands:

```shell
# Log in to the host where the OBServer node is located as the admin user.
# Try to stop the observer process gracefully (SIGTERM).
# kill does not read PIDs from stdin, so pass them as arguments.
kill $(pgrep -x observer)
# If the process does not exit, forcibly kill it (SIGKILL).
kill -9 $(pgrep -x observer)
# Restart the OBServer node.
cd /home/admin/oceanbase && bin/observer
```

You can also initiate a restart task in OCP on the OBServers page of the cluster's Overview page.

If the OBServer node still cannot be connected after the restart, proceed to step 3.
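To check whether the node accepts connections again after a restart, a small polling loop can help. The sketch below is a hypothetical helper (wait_for_port is not an OCP or OceanBase tool); it assumes bash with /dev/tcp support and coreutils timeout, and uses 2881 only as a default port. A successful TCP connection shows that the port is listening, not that select 1; succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port until a TCP connection succeeds or
# roughly <timeout_s> seconds have passed. Uses bash's /dev/tcp redirection
# and coreutils timeout. A successful TCP connect only shows that the port
# is listening; it does not prove that "select 1;" succeeds.
wait_for_port() {
  local host="$1" port="${2:-2881}" timeout_s="${3:-60}"
  local deadline=$(( SECONDS + timeout_s ))
  while true; do
    # Cap each attempt at 2 seconds so a filtered port cannot hang the loop.
    if timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "port $port on $host is accepting connections"
      return 0
    fi
    (( SECONDS >= deadline )) && break
    sleep 1
  done
  echo "port $port on $host did not open within ${timeout_s}s" >&2
  return 1
}
```

For example, wait_for_port xxx.xxx.xxx.xxx 2881 120 waits up to about two minutes. The loop tries once before checking the deadline, so even a very short timeout performs at least one connection attempt.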
Step 3. Check whether the OBServer node is overloaded or the network is disconnected.
Run the following commands to check the process status and resource usage.

```shell
# Check whether the process is alive. If it is not, restart it.
# The [o] pattern keeps grep from matching its own process.
ps aux | grep '[o]bserver'
# If CPU, memory, or other resources are overused, the OBServer node may not work properly.
# Check the CPU and memory usage of the OBServer node (one batch-mode snapshot).
top -b -n 1 -p "$(pgrep -d, -x observer)"
# Check the remaining space of the data disk and log disk.
df -h | grep /data
# Count the network connections on port 2881 (the default OBServer SQL port).
# If the count is 0, a network failure may have occurred.
netstat -anp | grep 2881 | wc -l
```

If no errors are found in the preceding checks, proceed to step 4.
If the OBServer node still cannot be connected, collect the logs and contact Technical Support.
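The disk check in this step can be made more explicit by flagging file systems above a usage threshold. The sketch below is hypothetical: high_usage_mounts is not an OceanBase tool, it assumes POSIX df -P output (usage in column 5, mount point in column 6, no spaces in mount points), and the 90% default is an illustrative value rather than an OCP or OceanBase setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read POSIX-format df output (df -P) on stdin and
# print "<mount point> <usage>" for every file system whose usage is at or
# above the threshold percentage (default 90, an illustrative value).
high_usage_mounts() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {                     # skip the df header line
      use = $5
      sub(/%$/, "", use)         # strip the trailing percent sign
      if (use + 0 >= t + 0) print $6 " " $5
    }'
}
```

A typical call is df -P | high_usage_mounts 90. Reading df output from stdin rather than calling df inside the function keeps the parsing logic testable with sample text.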
Step 4. Check the OBServer node logs.
Generally, ERROR logs generated by the OBServer node trigger an alert in OCP. First, check whether an OB log alert has been generated on the alert events page of OCP.
The OBServer node logs are stored in the /home/admin/oceanbase/log directory. Start with the files that have the .wf suffix, which record WARN- and ERROR-level logs.
Check the OS logs: search for the error keyword in the /var/log/messages file and review the related system events.
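A quick way to start the .wf review is to count ERROR lines per file and show the latest ones. The sketch below is a hypothetical helper (summarize_wf_errors is not an OceanBase tool); it assumes the log level appears as a space-delimited ERROR token in each line and only scans files whose names end in .wf, so rotated files with extra suffixes are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each *.wf file in a log directory (default
# /home/admin/oceanbase/log), print the number of lines containing " ERROR "
# followed by the most recent <tail_n> such lines (default 5).
summarize_wf_errors() {
  local dir="${1:-/home/admin/oceanbase/log}" tail_n="${2:-5}" f count
  for f in "$dir"/*.wf; do
    [ -e "$f" ] || continue                   # no match: the glob stays literal
    count=$(grep -c ' ERROR ' "$f" || true)   # grep -c prints 0 and exits 1 on no match
    echo "$(basename "$f"): $count ERROR line(s)"
    grep ' ERROR ' "$f" | tail -n "$tail_n"   # most recent ERROR lines
  done
}
```

For the OS logs, a comparable manual check is grep -i error /var/log/messages | tail -n 20, run with sufficient privileges to read the file.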