Description
This alert is triggered when the OBServer disk is read-only.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_host_disk_readonly_flag |
| Source | node_exporter |
| Collected metric | node_filesystem_readonly |
| Metric expression | max(node_filesystem_readonly{@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
OCP-Agent runs timed tasks to monitor whether the disk can be read in each collection cycle. If the disk is read-only, it returns 1 to the collected metric node_filesystem_readonly. Otherwise, it returns 0. In the former case, it also attaches the flag is_ob_disk="1" to the disk where the OBServer installation directory, data directory, or log directory is located.
The value of the metric ob_host_disk_readonly_flag indicates whether the disk is read-only.
The alert is triggered when the value is 1, indicating that the disk is read-only.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_host_disk_readonly_flag | 1 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The OBServer disk is read-only.
Details example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The OBServer disk is read-only.
${alarm_target} follows the ob_cluster=xxxxxxx:svr_ip=xxxxxx format.
ob_cluster indicates the name of the cluster that generated the alert.
svr_ip indicates the IP address of the OBServer of the cluster that generated the alert.
Impact on the system
If the log disk and data disk are read-only, the OBServer cannot properly run.
If the log file disk is read-only, the application cannot normally print logs.
Possible causes
The disk does not have sufficient space.
A permission exception occurred in the operating system.
A physical medium error occurred. For example, the disk malfunctions.
Suggested solutions
Check the disk space.
Insufficient disk space also triggers the following alerts. You can refer to the corresponding topics, take actions as needed, and check whether the ob_host_disk_readonly alert is cleared.
Check for other issues.
Many other issues can cause a disk to become read-only. We recommend that you contact your Linux O&M engineer for troubleshooting.
If that is not an option, run the following command to view the logs that record error messages about a disk being read-only in the /var/log/messages directory and Google search the error message for solutions.
# List all the read-only disks grep "[[:space:]]ro[[:space:],]" /proc/mounts # /dev/loop0 in the following message indicates "read-only". tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0 /dev/loop0 /home/admin iso9660 ro,relatime,nojoliet,check=s,map=n,blocksize=2048 0 0 # Query the reason why the disk is read-only. You can replace the '/dev/loop0' with 'readonly'. grep '/dev/loop0' /var/log/messages -Rn