Alert description
This alert monitors whether the CPU binding configuration of the OBServer or OBProxy process on the host is consistent with the binding configuration recorded in the OCP MetaDB. An alert is triggered when the two configurations are inconsistent.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring Metrics | host_process_cpu_bind_core_alarm_status |
| Monitoring Expression | max(host_process_cpu_bind_core_alarm_status{@LABELS}) by (@GBLABELS) |
| Metric Collection | host_process_cpu_bind_core_alarm_status |
| Metric Source | OCP Proactive Detection |
| Collection Cycle | 60 Seconds |
OCP periodically checks the consistency of CPU binding configurations for the OBServer and OBProxy processes on hosts. The specific principle is as follows:
For each host, OCP retrieves all configured CPU binding information (CpuBindCoreConfigEntity) and the corresponding server status information (CpuBindCoreServerStateEntity) from the MetaDB.
For each CPU binding configuration, OCP determines the process's current CPU binding range, which OCP-Agent queries by running the taskset -pc <pid> command with the process PID.
OCP compares the actual core binding range with the expected core binding range recorded in the MetaDB.
If the actual range does not match the expected range, set the metric host_process_cpu_bind_core_alarm_status to 1 (alert status). Otherwise, set it to 0 (normal status).
When the metric value equals 1, the trigger condition is met and an alert is generated.
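The comparison described above can be sketched in shell. This is an illustration only, not OCP's implementation: the function names are hypothetical, and expanding both ranges into core sets before comparing is an assumption about how "consistent" could be decided (so that 0-3 and 0,1,2,3 count as equal). Stride syntax such as 0-10:2 is not handled.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical and not part of OCP or OCP-Agent.

# Expand a CPU list such as "0-3,8" into one core id per line, sorted and de-duplicated.
expand_cpu_list() {
  echo "$1" | tr ',' '\n' | while IFS='-' read -r start end; do
    [ -n "$start" ] || continue
    end="${end:-$start}"          # a single core like "8" is the range 8-8
    i="$start"
    while [ "$i" -le "$end" ]; do
      echo "$i"
      i=$((i + 1))
    done
  done | sort -n -u
}

# Extract the CPU list from one line of `taskset -pc <pid>` output, for example
# "pid 1234's current affinity list: 0-3" yields "0-3".
parse_affinity_line() {
  echo "${1##*: }"
}

# Print 0 when both lists describe the same set of cores, 1 otherwise,
# mirroring the values of host_process_cpu_bind_core_alarm_status.
cpu_bind_alarm_status() {
  if [ "$(expand_cpu_list "$1")" = "$(expand_cpu_list "$2")" ]; then
    echo 0
  else
    echo 1
  fi
}
```

On a host, the actual range could be obtained with actual=$(parse_affinity_line "$(taskset -pc <pid>)") and then passed to cpu_bind_alarm_status together with the expected range shown in the OCP console.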
Rule information
| Monitoring Metrics | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| host_process_cpu_bind_core_alarm_status | 1 | 120 Seconds | 60 Seconds | 5 Minutes |
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expressions | Warning | Host |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=-1:svr_ip=xxx.xxx.xxx.xxx OceanBase CPU binding configuration inconsistent
Alert details
- Template: Cluster ID: ${ob_cluster_id}, Host IP: ${host_ip}. The current CPU binding configuration is inconsistent. Please check the host's CPU binding configuration and modify it as soon as possible.
- Example: Cluster ID: -1, Host IP: xxx.xxx.xxx.xxx. The current CPU binding configuration is inconsistent. Please check the host's CPU binding configuration and modify it as soon as possible.
Alert recovery
- Template: Alert: ${alarm_name}, CPU Binding Configuration Inconsistent: ${value_shown}
- Example: Alert: CPU Binding Configuration Inconsistent, CPU Binding Configuration Inconsistent: 0
Impact on the system
Inconsistent CPU binding configurations mean that the CPU cores a process actually runs on do not match the planned cores, which may lead to the following impacts:
Resource contention: Multiple processes may share the same CPU core, leading to competition for CPU resources and affecting performance stability.
Performance degradation: The process is not bound to the expected CPU core, which may lead to NUMA cross-node memory access and increased latency.
Reduced isolation: The primary purpose of CPU pinning is to achieve resource isolation. Inconsistent configurations will render the isolation strategy ineffective.
Possible causes
The process on the host was manually restarted, and the bind-core configuration did not take effect or was not correctly re-bound after the restart.
The taskset configuration has been manually modified.
Solution
Confirm the alert information: Locate the specific host and process based on the cluster ID (ob_cluster_id) and host IP (svr_ip) in the alert.
Check the current CPU binding configuration:
# View the CPU binding information of the observer process. The query method for the OBProxy process is similar.
ps -ef | grep observer
taskset -pc <observer_pid>
Compare the expected configuration: In the OCP console, view the CPU binding configuration of the process corresponding to this host, and compare it with the actual binding range found to identify the specific discrepancies.
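For hosts that run several processes, the lookup in this step can be wrapped in a small helper. The following is a minimal sketch: show_affinity is a hypothetical name, and it assumes the process names are exactly observer and obproxy and that pgrep (procps) and taskset (util-linux) are installed.

```shell
#!/bin/sh
# Sketch only: show_affinity is a hypothetical helper, not an OCP tool.

# Print "<pid> <cpu list>" for every running process whose name matches exactly.
show_affinity() {
  name="$1"
  for pid in $(pgrep -x "$name" 2>/dev/null); do
    # taskset prints: pid <pid>'s current affinity list: <cpu list>
    line="$(taskset -pc "$pid" 2>/dev/null)" || continue
    echo "$pid ${line##*: }"
  done
}

show_affinity observer
show_affinity obproxy
```

Each output line can then be compared with the binding range shown for that process in the OCP console. Without the -a option, taskset reports only the task whose ID equals the PID, so threads with a different affinity would not appear here.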
Fix the CPU binding configuration:
Method 1 (recommended): Redistribute the CPU binding configuration through the OCP console. OCP will automatically complete the binding operation via OCP-Agent.
Method 2: Manually use the taskset command on the host to rebind CPU cores:
taskset -pc <cpu_range> <PID>
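A manual rebind can be followed immediately by a read-back of the result. The sketch below is an assumption-laden illustration, not the procedure OCP-Agent uses: rebind_and_verify is a hypothetical helper, the input check accepts only simple lists such as 0-3 or 0,2,4-7, and the -a (--all-tasks) option of util-linux taskset is added so that every existing thread of the process is rebound rather than only the task with the given PID.

```shell
#!/bin/sh
# Sketch only: rebind_and_verify is a hypothetical helper, not an OCP tool.
# Returns 2 for a malformed CPU list and 1 if taskset fails.
rebind_and_verify() {
  cpu_range="$1"
  pid="$2"
  # Accept lists such as "0-3" or "0,2,4-7" only.
  echo "$cpu_range" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$' || return 2
  # -a applies the new affinity to all threads of the process.
  taskset -a -pc "$cpu_range" "$pid" >/dev/null || return 1
  # Read the affinity back and print only the CPU list.
  after="$(taskset -pc "$pid")" || return 1
  echo "${after##*: }"
}
```

For example, rebind_and_verify 0-3 <PID> prints the affinity list reported after the change. Changing the affinity of a process owned by another user typically requires root privileges.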
Verify the fix result: After the fix, wait for the next detection cycle (60 seconds) and check whether the alert clears. Once the configurations are consistent, the alert is cleared automatically within the elimination cycle (5 minutes).
