Description
This alert is triggered when the CPU system usage exceeds 40%.
Principle
| Parameter | Value |
|---|---|
| Metric | cpu_system |
| Source | This metric is a basic host monitoring metric collected by node_exporter. The metric is equivalent to Cpu-sy (percentage of CPU time spent running kernal-mode code) in the top command output. |
| Collected metric | node_cpu_seconds_total |
| Metric expression | 100 * sum(rate(node_cpu_seconds_total{mode="system", @LABELS}[@INTERVAL])) |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold (unit: percentage) | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| cpu_system > 40 | The percentage of CPU time spent running kernel-mode code. | 40% | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_cpu_sys_abnormal
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: ${alarm_name}, percentage of system CPU time ${value_shown} exceeds ${alarm_threshold}%.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_tsar_cpu_sys_abnormal, percentage of system CPU time 51.01% exceeds 40%.
Impact on the system
The percentage of system CPU time is the percentage of CPU time spent running kernel-mode code. For example, the time can be spent switching thread context, allocating memory, performing I/O operations, and creating a subprocess. If the percentage of system CPU time is high, the CPU time left to applications is limited, which may affect the operation of applications.
Possible causes
A high percentage of system CPU time may be caused by a large number of system calls, which are usually initiated by user-mode processes. The possible causes are as follows:
- Context switches frequently occur between threads.
- A large number of concurrent I/O operations are initiated.
- A large number of processes are created. For example, applications execute commands to create a large number of subprocesses when running.
- Other kernel-mode activities are conducted.
Suggested solutions
Perform the following steps:
- Check whether the number of application processes, the number of threads, and the number of subprocesses are as expected.
- Run the
iostatandiotopcommands to check whether system I/O usage and process I/O usage are normal.