os_tsar_cpu_sys_abnormal

2024-07-30 09:31:29  Updated

Description

This alert is triggered when the CPU system usage exceeds 40%.

Principle

Parameter Value
Metric cpu_system
Source This metric is a basic host monitoring metric collected by node_exporter.
The metric is equivalent to Cpu-sy (percentage of CPU time spent running kernal-mode code) in the top command output.
Collected metric node_cpu_seconds_total
Metric expression 100 * sum(rate(node_cpu_seconds_total{mode=“system”, @LABELS}[@INTERVAL]))
Collection cycle 1 second

Alert rule

Metric expression Metric description Default threshold (unit: percentage) Detection cycle Elimination cycle
cpu_system > 40 The percentage of CPU time spent running kernel-mode code. 40% 10 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Critical Server

Alert templates

  • Overview
    • Template: ${alarm_target} ${alarm_name}
    • Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_cpu_sys_abnormal
  • Details
    • Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: ${alarm_name}, percentage of system CPU time ${value_shown} exceeds ${alarm_threshold}%.
    • Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_tsar_cpu_sys_abnormal, percentage of system CPU time 51.01% exceeds 40%.

Impact on the system

The percentage of system CPU time is the percentage of CPU time spent running kernel-mode code. For example, the time can be spent switching thread context, allocating memory, performing I/O operations, and creating a subprocess. If the percentage of system CPU time is high, the CPU time left to applications is limited, which may affect the operation of applications.

Possible causes

A high percentage of system CPU time may be caused by a large number of system calls, which are usually initiated by user-mode processes. The possible causes are as follows:

  • Context switches frequently occur between threads.
  • A large number of concurrent I/O operations are initiated.
  • A large number of processes are created. For example, applications execute commands to create a large number of subprocesses when running.
  • Other kernel-mode activities are conducted.

Suggested solutions

Perform the following steps:

  1. Check whether the number of application processes, the number of threads, and the number of subprocesses are as expected.
  2. Run the iostat and iotop commands to check whether system I/O usage and process I/O usage are normal.

Contact Us