ob_cannot_connected

2024-03-22 09:34:34  Updated

Description

This alert is triggered when OceanBase Cloud Platform (OCP) detects an unconnectable OBServer in a managed OceanBase cluster.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter Value
Metric ob_connectable
Source SQL: select 1;
Collected metric ob_connectable
Metric expression min(oceanbase_connectivity{@LABELS}) by (@GBLABELS)
Collection cycle 1 second

Note

The metric source of this alert is special. OCP tries to connect to a local OBServer by using the database connection pool and executes the SQL statement select 1; to detect whether the OBServer is connectable. Then, it returns the result to the collected metric.

The value of the metric ob_connectable indicates whether the OBServer is connectable. If the value of the metric is 0, this OBServer is unconnectable and this alert is triggered.

Alert rule

Metric Default threshold Duration Detection cycle Time before clearance
ob_connectable 0 0 seconds 10 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Metric expression Stopped Server

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}

  • Overview example: ob_cluster-1:svr_ip=xxx.xxx.xxx.xxx. The OBServer cannot be connected.

  • Details example: Cluster:ob_cluster-1, Host:xxx.xxx.xxx.xxx, Alert: The OBServer cannot be connected.

${alarm_target} follows the ob_cluster=xxxxxxx:svr_ip=xxxxxx format, where ob_cluster indicates the name of the cluster that generated the alert and svr_ip indicates the IP address of the OBServer of the cluster that generated the alert.

Impact on the system

OBServers on some hosts are unavailable, reducing the availability of some data replicas.

Possible causes

  • The observer process exits unexpectedly or the OBServer is overloaded and cannot respond to requests.

  • The server where the OceanBase Database service is deployed is faulty, causing the OBServer to be unavailable.

Suggested solutions

  1. Check whether the server where the OceanBase Database service is deployed is functioning.

    Check whether the server where the OceanBase Database service is deployed can be started.

    • If the server can be started, go to Step 2.

    • If not, the server where the OceanBase Database service is deployed is faulty, causing the OBServer to be unavailable. We recommend that you replace the OBServer.

  2. Run the SSH command to try to log on to the server where the OceanBase Database service is deployed.

    • If you can log on, the problem may have been caused by an unknown issue. Go to Step 3.

    • If you cannot log on, the server where the OceanBase Database service is deployed is not responding. We recommend that you run the following commands to restart the OBServer.

      # Log on to the server where the OceanBase Database service is deployed as the administrator.
      
      # Kill the observer process.
      pgrep observer | kill
      
      # If the observer process does not exit, force kill it.
      pgrep observer | kill -9
      
      # Restart the OBServer.
      cd /home/admin/oceanbase && bin/observer
      

      You can also initiate a restart task in the OBServers list on the Overview page of the cluster.

      If you still cannot connect to the OBServer after the restart, go to Step 3.

  3. Check whether the OBServer is overloaded or the network connection is disconnected.

    Run the following commands to check the process status and resource usage.

    # Check whether the observer process is alive. If not, restart the observer process. 
    ps aux | grep observer
    
    # If the CPU utilization and memory usage are too high, the OBServer may be malfunctioning. 
    # Check the CPU utilization and memory usage of the OBServer. 
    top -n 1 -p $(pgrep observer)
    
    # Check the available space of the data disk and log disk. 
    df | grep /data
    
    
    # Check the network connections of the OBServer. If the number of network connections is 0, a network failure may have occurred. 
    netstat -anp | grep 2881 | wc -l
    

    If no issue is found in this step, go to the next step.

  4. Collect the log information and contact OCP Technical Support for help.

    1. Check the logs of the OBServer.

      Errors recorded in error logs of an OBServer usually trigger alerts on OCP. You can go to the Alert Events page of OCP to check whether an OceanBase log alert (ob_log_alarm) has been reported. OBServer logs are saved in /home/admin/oceanbase/log. Prioritize .wf files that record WARN- and ERROR-level logs.

    2. Check OS logs. Search for required content in the /var/log/messages log files based on the keyword error and check the returned information.

Contact Us