ob_host_mem_percent_over_threshold Host memory usage exceeds the threshold

2025-09-08 08:15:43  Updated

Alert Description

This alert is triggered when the overall physical memory usage of the server hosting the OBServer node exceeds the threshold.

Alerting Principle

The following table describes the key parameters of the monitoring logic behind this alert.

Parameter | Value
Monitoring metric | ob_host_mem_percent
Data source | Collected by the node_exporter process.
Collected metrics | node_memory_MemFree_bytes, node_memory_Cached_bytes, node_memory_Buffers_bytes, node_memory_MemTotal_bytes
Monitoring expression | (1 - (avg(node_memory_MemFree_bytes{@LABELS}) by (@GBLABELS) + avg(node_memory_Cached_bytes{@LABELS}) by (@GBLABELS) + avg(node_memory_Buffers_bytes{@LABELS}) by (@GBLABELS)) / avg(node_memory_MemTotal_bytes{@LABELS}) by (@GBLABELS)) * 100
Collection interval | 1 second

Note

This alert uses a special data source: OCP-Agent collects the data through node_exporter.

The value of the monitoring metric ob_host_mem_percent indicates the memory usage, as a percentage, of the host where the OBServer node runs. An alert is triggered when the usage exceeds the threshold (90% by default).
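To make the monitoring expression concrete, the following is a minimal shell sketch of the same arithmetic for a single host, where each avg(...) by (...) term reduces to the plain metric value. It assumes that the node_memory_* metrics mirror the MemFree, Cached, Buffers, and MemTotal fields of /proc/meminfo. The function names mem_percent and mem_percent_from_meminfo are hypothetical and are not part of OCP or node_exporter.

```shell
# (1 - (MemFree + Cached + Buffers) / MemTotal) * 100, printed with two decimals.
# Arguments: $1=MemFree $2=Cached $3=Buffers $4=MemTotal (all in the same unit).
mem_percent() {
  awk -v free="$1" -v cached="$2" -v buffers="$3" -v total="$4" \
    'BEGIN { printf "%.2f\n", (1 - (free + cached + buffers) / total) * 100 }'
}

# Same calculation, reading the four fields from a /proc/meminfo-style file.
# Assumption: the file uses the standard "Name:  value kB" layout.
mem_percent_from_meminfo() {
  awk '
    $1 == "MemFree:"  { free = $2 }
    $1 == "Cached:"   { cached = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "MemTotal:" { total = $2 }
    END { printf "%.2f\n", (1 - (free + cached + buffers) / total) * 100 }
  ' "${1:-/proc/meminfo}"
}
```

On a Linux host, running mem_percent_from_meminfo without arguments should give a value close to the one shown for ob_host_mem_percent, which can help when cross-checking an alert by hand.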

Rule Information

Monitoring Metric | Default Threshold (Unit: %) | Duration | Alert Interval | Clear Interval
ob_host_mem_percent | 95: Severe; 97: Downtime | 0 seconds | 60 seconds | 5 minutes
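As an illustration of how the two thresholds in the rule map to alert levels, here is a small shell sketch. The function name alert_level and the "OK" label are hypothetical, and treating "exceeds" as a strict greater-than comparison is an assumption. The defaults of 95 and 97 are taken from the table above and can be overridden.

```shell
# Classify a memory usage percentage against the rule thresholds.
# Arguments: $1=usage (%), $2=Severe threshold (default 95), $3=Downtime threshold (default 97).
alert_level() {
  awk -v v="$1" -v severe="${2:-95}" -v down="${3:-97}" 'BEGIN {
    if (v > down)        print "Downtime"
    else if (v > severe) print "Severe"
    else                 print "OK"   # hypothetical label for "no alert"
  }'
}
```

For example, alert_level 96 prints Severe and alert_level 98 prints Downtime.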

Alert Information

Alert Trigger Method | Alert Level | Scope
Expression Based on Monitoring Metric | Severe | Server

Alert Template

  • Alert Overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx Server Memory Usage Exceeded
  • Alert Details

    • Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. Memory usage ${value_shown} exceeds ${alarm_threshold} %.
    • Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: Server Memory Usage Exceeded. Memory usage 91 % exceeds 90 %.
  • Alert Recovery

    • Template: Alert: ${alarm_name}, Server Memory Usage: ${value_shown}
    • Example: Alert: Server Memory Usage Exceeded, Server Memory Usage: 85 %
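The templates above are filled in by replacing each ${name} placeholder with its value. The following shell sketch shows that substitution. The helper render_template is hypothetical and is not how OCP renders templates internally. It also assumes that values do not contain the characters |, &, or \, which sed would interpret specially.

```shell
# Replace ${name} placeholders in a template with the given values.
# Arguments: $1=template text, then any number of name=value pairs.
render_template() {
  out=$1
  shift
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first "="
    value=${pair#*=}    # text after the first "="
    # \$ keeps the dollar sign literal for sed; {name} is matched as-is.
    out=$(printf '%s' "$out" | sed "s|\\\${$name}|$value|g")
  done
  printf '%s\n' "$out"
}
```

Single-quote the template when calling the helper so that the shell does not expand the placeholders itself, for example: render_template 'Alert: ${alarm_name}, Server Memory Usage: ${value_shown}' 'alarm_name=Server Memory Usage Exceeded' 'value_shown=85 %'.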

Impact on the System

When the server memory is insufficient, the OBServer node cannot work normally.

Possible Causes

  • Processes other than observer are running memory-intensive tasks.

  • The OBServer node is running abnormally. For example, the memory usage of tenant 500 exceeds the threshold.

Solution

  1. Check whether the OBServer node is in the normal state.

    View the status of the OBServer node in the OBServers section on the Overview page of the cluster, or connect to the sys tenant of the OceanBase cluster and execute the following SQL statement:

    select status from __all_server where svr_ip='your server ip';
    
    • If the status is inactive, restart the observer process.

    • If the status is "Running" on the Overview page, or active in the query result, the issue is not related to the state of the OBServer node. Proceed to the next step.

  2. Check whether the memory usage exceeds the threshold due to regular business operations.

    On the details page of the OBServer node that triggered the alert, go to Monitoring > Host Resources and view the memory usage chart.

    • If the memory usage spiked suddenly at the time of the alert, the increase was not caused by regular business operations. Proceed to the next step for troubleshooting.

    • If the memory usage increased gradually up to the time of the alert, the increase was caused by regular business operations.

      Replace the OBServer node with one that has more memory. For more information, see Replace an OBServer node.

  3. Run the following command to identify the process that is consuming a large amount of memory:

    # Find the top 5 memory-consuming processes, sorted by memory usage
    ps -o %mem,pid,cmd  -ax | sort -rn | head -5
    
    • If it is not the observer process that is consuming excessive memory, analyze the cause of high memory usage in the corresponding program.

      If the issue does not affect business operations, you can stop the process that is consuming excessive memory.

    • If it is the observer process that is consuming excessive memory, it may be due to an abnormal state of the OBServer node.

      In this case, the alert ob_tenant500_mem_hold_over_threshold OB 500 tenant memory usage exceeds threshold is often triggered as well. Resolve that alert first. Then wait 5 minutes and check whether this alert is cleared automatically.
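The decision in step 3 can be sketched as a small shell helper that reads the output of the ps command and reports whether the largest memory consumer is the observer process. The function name classify_top_consumer is hypothetical, and the sketch assumes that the observer binary is named observer.

```shell
# Reads "%mem pid cmd" lines (as printed by `ps -o %mem,pid,cmd -ax`) on stdin
# and prints "observer" or "other: <command>" for the top memory consumer.
classify_top_consumer() {
  # Reverse numeric sort puts the largest %mem first; the header line sorts as 0.
  top_line=$(sort -rn | head -n 1)
  top_cmd=$(printf '%s\n' "$top_line" | awk '{print $3}')
  # Compare only the file name, so any installation path is accepted.
  case "${top_cmd##*/}" in
    observer) echo "observer" ;;
    *)        echo "other: $top_cmd" ;;
  esac
}
```

A possible invocation is ps -o %mem,pid,cmd -ax | classify_top_consumer. A result of observer points to the second branch of step 3, and any other result points to the first.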
