os_observer_not_exist

2023-08-22 02:46:05  Updated

Description

This alert is triggered when the observer process on an OBServer does not exist.

Principle

Parameter Value
Metric observer_process_exists
Source This metric is a basic host monitoring metric. To check whether any observer process exists, you can run ps -ef|grep -w observer|grep -v grep|wc -l to return the number of observer processes in the system.
Collected metric process_exists
Metric expression min(process_exists{name="observer",@LABELS}) by (@GBLABELS)
Collection cycle 1 second

Alert rule

Metric expression Metric description Default threshold Detection cycle Elimination cycle
observer_process_exists == 0
  • 0: The process does not exist.
  • 1: The process exists.
  • 0 10 seconds 5 minutes

    Alert information

    Trigger method Alert level Scope
    Based on the expression of the metric Critical Server

    Alert templates

    • Overview
      • Template: ${alarm_target} ${alarm_name}
      • Example: ob_cluster=obcluster-1631964370:svr_ip=xxx.xxx.xxx.xxx os_observer_not_exist
    • Details
      • Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: os_observer_not_exist.
      • Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_observer_not_exist.

    Impact on the system

    • For a single-replica system, system services will be unavailable if the observer process does not exist.
    • For a multi-replica OceanBase cluster, the availability of the cluster may decrease if the observer process on an OBServer does not exist. For example, the number of zones changes from 3 to 2, or the number of Internet Data Centers (IDCs) in the same region changes from 3 to 2.

    Possible causes

    The faulty OBServer is restarted accidentally. For example, the observer process is killed when the system resources are insufficient.

    Suggested solutions

    1. Check whether the basic monitoring metrics of the host, such as the memory usage, CPU utilization, load, and disk usage, are as expected.

    2. Check whether a large number of ERROR logs exist in the run logs of the OBServer.

      tail -10000 /home/admin/oceanbase/log/observer.log.wf | grep ERROR | wc -l
      

      If yes, contact Technical Support.

    3. Check the OS logs. Search for the keyword "error" in the /var/log/messages log file and check the returned information.

    Contact Us