Description
This alert is triggered when the number of OBServer file descriptors (FDs) exceeds 65% of the maximum system capacity.
Principle
| Parameter | Value |
|---|---|
| Metric | observer_fd_usage, observer_fd_count |
| Source | observer_fd_count: ls /proc/<observer pid>/fd | wc -lYou can run the ulimit -n command to view the maximum FD capacity of the system. |
| Collected metric | observer_fd_count, observer_ulimit_max_fd_count |
| Metric expression | |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold (unit: percentage) | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| observer_fd_usage > 65 and observer_fd_count > 0 | 65% | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1631964370:svr_ip=xxx.xxx.xxx.xxx os_observer_fd_usage
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: ${alarm_name}, FD count ${observer_fd_count_value}, FD usage ${observer_fd_usage_value_zh_cn} exceeds ${observer_fd_usage_alarm_threshold}%.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_observer_fd_usage, FD count is 45000, FD usage 68.666% exceeds 65%.
Impact on the system
There is an upper limit on the number of FDs in the OS. When this upper limit is exceeded, the system returns an error message "too many open files", which will affect the operation of the OS.
Possible causes
Process resources are not released in a timely manner.
- Socket communication connections are not closed correctly.
- A large number of database connection requests are initiated.
- Too many FD files are created because a large number of I/O requests are initiated in the database.
Suggested solutions
Perform the following steps:
Run the following command to query the number of FDs:
- Query the number of FDs used by the observer process:
lsof -p <observer pid> | wc -l - Query the maximum number of FDs allowed in the system:
ulimit -n
- Query the number of FDs used by the observer process:
If the number of used FDs exceeds 65% of the maximum number, contact O&M engineers.