Description
This alert is triggered when no clock synchronization service (Chrony or NTP) is available on a server managed by OceanBase Cloud Platform (OCP).
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | host_ntp_service_exist |
| Source | pgrep chronyd (checks whether the Chrony service is available); pgrep ntpd (checks whether the NTP service is available) |
| Collected metric | ntp_service_exist |
| Metric expression | max(ntp_service_exist{app="HOST",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Note
This alert uses a special metric source: OCP-Agent runs the commands in the preceding table to check for a clock synchronization service on the local server. Chrony is the default service, so OCP-Agent runs the pgrep chronyd command first.
The host_ntp_service_exist metric indicates whether a clock synchronization service is available. A value of 1 means that a service is available. A value of 0 means that no service is available, which triggers this alert.
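The check described in this note can be sketched as a small shell function. This is a minimal illustration of the documented behavior, not the actual OCP-Agent implementation. The function name ntp_service_exist is borrowed from the collected metric name for readability and is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented check; OCP-Agent's real code may differ.
# Prints 1 if a chronyd or ntpd process is found, otherwise prints 0.
ntp_service_exist() {
    # Chrony is the default service, so it is checked first.
    if pgrep chronyd >/dev/null 2>&1; then
        echo 1
    elif pgrep ntpd >/dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
}

# Print the current value for the local server.
ntp_service_exist
```

An output of 0 corresponds to the condition that triggers this alert.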
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| host_ntp_service_exist | 0 | 0 | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. No clock synchronization service (NTP or Chrony) is available on the server.
Overview example: svr_ip=192.168.0.1. No clock synchronization service is available on the server.
Details example: svr_ip=192.168.0.1. No clock synchronization service is available on the server. No clock synchronization service (NTP or Chrony) is available on the server.
${alarm_target} indicates the object that generated the alert, in the svr_ip=xx.xx.xx.xx format.
Impact on the system
Without a clock synchronization service, the server clocks may drift out of sync after the cluster has been running for some time. This may cause the OceanBase cluster to become unavailable.
Possible causes
No clock synchronization service (Chrony or NTP) is installed on the server.
The clock synchronization process (chronyd or ntpd) on the server exits unexpectedly.
Suggested solutions
1. Run the following commands to check whether a clock synchronization service (Chrony or NTP) is installed on the server.

       rpm -qa | grep chrony   # Check whether the Chrony service is installed.
       rpm -qa | grep ntp      # Check whether the NTP service is installed.

   - If version information is returned, the service is installed. Proceed to Step 2.
   - If nothing is returned, the service is not installed. If neither Chrony nor NTP is installed, install a clock synchronization service first. For detailed installation and configuration instructions, see the publicly available guides for Chrony and NTP. In brief:

     a. Run one of the following commands to install a clock synchronization service (Chrony or NTP).

            yum install -y chrony   # Install the Chrony service.
            yum install -y ntp      # Install the NTP service.

     b. Run the matching command to start the clock synchronization service.

            systemctl start chronyd   # Start the Chrony service.
            systemctl start ntpd      # Start the NTP service.

     c. Manually clear the alert and check whether it recurs. If the alert recurs, proceed to Step 3.

2. Run the following commands to check whether the clock synchronization process (chronyd or ntpd) has exited unexpectedly.

       systemctl status chronyd   # Check the status of the Chrony service.
       systemctl status ntpd      # Check the status of the NTP service.

   - If the Active field in the output is active (running), proceed to Step 3.
   - If the Active field in the output is inactive (dead), the clock synchronization service is not running. Run the matching command to restart the service.

         systemctl restart chronyd   # Restart the Chrony service.
         systemctl restart ntpd      # Restart the NTP service.

     After the service restarts, manually clear the alert and check whether it recurs. If the alert recurs, proceed to Step 3.

3. In other cases, contact Technical Support to locate the issue.
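The checks in Steps 1 and 2 can be combined into one triage script. The following is a minimal sketch, assuming an RPM-based server that uses systemd. The function name diagnose_service is hypothetical. The script uses rpm -q and systemctl is-active instead of the commands shown in the steps because their exit codes are easier to test in a script.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper that follows Steps 1 and 2 for one service.
# $1: package name (chrony or ntp), $2: systemd unit name (chronyd or ntpd).
diagnose_service() {
    if ! rpm -q "$1" >/dev/null 2>&1; then
        # Step 1: the package is not installed.
        echo "$1: not installed"
    elif systemctl is-active --quiet "$2" 2>/dev/null; then
        # Step 2: the unit is active, so the process has not exited.
        echo "$1: running"
    else
        # Step 2: the package is installed but the unit is not active.
        echo "$1: installed but not running"
    fi
}

diagnose_service chrony chronyd
diagnose_service ntp ntpd
```

If both lines report "not installed", install a service as described in Step 1. If a service is reported as "installed but not running", restart it as described in Step 2.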