Description
This alert is triggered when the server SSD usage in percentage exceeds the specified threshold.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ssd_wear_indicator The server SSD usage in percentage. A higher value indicates a higher SSD usage. |
| Source | The metric is obtained by executing the built-in script of StarAgent that detects the SSD usage. |
| Collected metric | ssd_wear_indicator |
| Metric expression | max(ssd_wear_indicator{@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Alert rule
| Metric | Default threshold (unit: %) | Duration | Alert cycle | Time before clearance |
|---|---|---|---|---|
| ssd_wear_indicator | 95 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The SSD usage reaches ${value_shown}, which exceeds ${alarm_threshold} %.
Overview example: svr_ip=192.168.0.1. The SSD usage of the server exceeds the specified threshold.
Details example: Cluster: obcluster-1. Host: 192.168.0.1. Alert: The SSD usage of the server exceeds the specified threshold. The SSD usage reaches 97 %, which exceeds 95 %.
Impact on the system
After the SSD usage reaches 100%, the SSD becomes unavailable. Therefore, it is necessary to monitor the SSD usage. Full SSD usage may have the following impacts:
Data write to the SSD fails. The data disk, log disk, and installation disk of OceanBase Database can be SSDs, and data is written to the SSDs during minor compactions, log recording, and major compactions.
Data read from the SSD fails. If data cannot be written to the SSD, the process can be stuck, which affects the data read.
Possible causes
The SSD is full.
Solutions
Check whether the SSD is full or damaged.
Contact OceanBase Technical Support to check the remaining write/erase cycles of the SSD by using specialized tools. If the SSD is almost worn out, replace the SSD in time.