Description
This alert is triggered when the clock offset between a server managed by OceanBase Cloud Platform (OCP) and its clock source exceeds the threshold, which is 100 ms by default.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | host_ntp_offset_milliseconds. Note: This metric indicates the time difference between the server and the clock source. The alert is triggered when the time difference is larger than the threshold or the clock synchronization service is unavailable. The default threshold is 100 ms. |
| Source | unknow. Commands, in order of priority: (1) `chronyc sources \| grep -e '^\^\*' \| sed 's/.*\[\(.*\)\].*/\1/g'`; (2) `ntpq -np 127.0.0.1 \| grep -v '127.127.1.0' \| /bin/grep -e '^*'`; (3) `ntpdc -np 127.0.0.1 \| grep -v '127.127.1.0' \| grep -e '^*'`. Note: The metric source of this alert is special. By default, OCP-Agent uses chronyd to obtain the clock offset data. If chronyd does not exist on the host, ntpd is used instead, and the ntpq command takes priority over the ntpdc command. The clock offset data is not collected if the host has been up for less than 10 minutes. |
| Collected metric | ntp_offset |
| Metric expression | max(ntp_offset{app="HOST",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
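The source-selection order described in the note can be sketched in shell. This is only an illustration of the documented priority (chrony first, then ntpq, then ntpdc) and of the 10-minute uptime rule. It is not the OCP-Agent implementation. The function names `pick_clock_tool` and `should_collect` are hypothetical, and checking for the client binaries on `PATH` is an assumption about how "chronyd does not exist" might be detected.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not OCP-Agent code.

# Print the first available client in the documented priority order:
# chronyc, then ntpq, then ntpdc. Prints "none" and returns 1 if no
# client is found on PATH (assumed detection method).
pick_clock_tool() {
  local tool
  for tool in chronyc ntpq ntpdc; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  printf 'none\n'
  return 1
}

# Return 0 if the host has been up for at least 10 minutes (600 s).
# $1 is the uptime in seconds; a fractional part is truncated.
should_collect() {
  local uptime_s="${1%%.*}"
  [ "${uptime_s:-0}" -ge 600 ]
}
```

On Linux, the uptime could be taken from the first field of `/proc/uptime`, for example `should_collect "$(cut -d' ' -f1 /proc/uptime)" && pick_clock_tool`.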
Alert rule
| Metric | Default threshold (unit: ms) | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| host_ntp_offset_milliseconds | 100 | 0 | 60 seconds | 5 minutes |
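As a rough illustration of the rule above, the threshold comparison can be written as a small shell function. The name `offset_exceeds_threshold` is hypothetical, and comparing the absolute value of the offset is an assumption, since the source does not say whether the metric is signed. The sketch covers only the comparison, not the detection cycle or the clearance delay.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the threshold check.
# $1: clock offset in ms (may be negative or fractional)
# $2: threshold in ms (defaults to 100, the documented default)
# Returns 0 when |offset| is strictly greater than the threshold.
offset_exceeds_threshold() {
  local offset_ms="$1" threshold_ms="${2:-100}"
  awk -v o="$offset_ms" -v t="$threshold_ms" \
    'BEGIN { if (o < 0) o = -o; exit !(o > t) }'
}
```

For example, `offset_exceeds_threshold 150 && echo "alert"` prints `alert`, while an offset of exactly 100 ms does not exceed the default threshold.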
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Stopped | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The clock offset between the server and the clock source is too large. Therefore, the clock offset (${value} ms) exceeds the threshold (${alarm_threshold} ms).
Overview example: svr_ip=192.168.0.1. The clock offset between the server and the clock source is too large.
Details example: svr_ip=192.168.0.1. The clock offset between the server and the clock source is too large. The clock offset (0.041 ms) exceeds the threshold (0.001 ms).
${alarm_target} indicates the object that generated the alert, in the svr_ip=xx.xx.xx.xx format.
Impact on the system
A distributed OceanBase deployment requires the clock offset to stay within a specified range. If the clock offset falls outside that range, OBServers cannot function properly.
Possible causes
Possible causes of this alert:
The clock source (NTP server) has encountered an error.
A network error has occurred between the server managed by OCP and the clock source.
The clock synchronization service has encountered an error.
Suggested solutions
1. Check the clock of the OCP-Server.
   Check whether the alert is reported multiple times in the OCP console.
   - If no, go to Step 2.
   - Otherwise, the alerts may be caused by an exception on the OCP-Server. We recommend that you bring the OCP-Server back to normal and then adjust its clock by following the instructions in Step 4. After that, check whether the alert recurs 5 minutes later.
2. Check the network connection between the OCP-Server and the remote host managed by the OCP-Server.
   - If the connection is normal, go to Step 3.
   - Otherwise, restore the network connection and adjust the clock of the OCP-Server by following the instructions in Step 4. After that, check whether the alert recurs 5 minutes later.
3. Check whether the clock synchronization service is running properly.
   If the clock synchronization service on the server managed by OCP fails, the host_ntp_service_not_exist alert is also triggered. In this case, perform the following operations:
   1. Resolve the host_ntp_service_not_exist alert. For more information, see host_ntp_service_not_exist.
   2. Check whether the alert recurs 5 minutes later.
      - If yes, proceed to Step 4.
      - If the alert is cleared, the issue is fixed.
   If both the clock source and the clock service of the host are normal, the clock offset may be caused by a power failure of the host or other issues.
4. Manually trigger clock synchronization on the server managed by OCP to ensure that the clock offset is within the expected range.
   Run one of the following commands based on the clock synchronization service installed on the OCP-Server.

       # When the NTP service is in use, run the following command for synchronization (192.168.0.1 indicates the IP address of the clock source that must be specified):
       ntpdate 192.168.0.1
       # When the Chrony service is in use, make sure that the clock source has been configured in the /etc/chrony.conf file and run the following command for synchronization:
       chronyc -a makestep
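After a manual synchronization, it can help to confirm the remaining offset before waiting for the alert to clear. The following sketch assumes the Chrony service is in use and reads the `System time` line of `chronyc tracking` output from standard input, converting the value from seconds to milliseconds. The function name `chrony_offset_ms` is hypothetical, and the function reports the magnitude only, ignoring whether the clock is fast or slow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the current offset in ms from
# `chronyc tracking` output supplied on stdin. Expects a line such as:
#   System time     : 0.000041000 seconds slow of NTP time
# Prints the magnitude with three decimal places; prints nothing if
# no "System time" line is present.
chrony_offset_ms() {
  awk -F': *' '/^System time/ {
    split($2, parts, " ")            # parts[1] is the value in seconds
    printf "%.3f\n", parts[1] * 1000 # convert seconds to milliseconds
  }'
}
```

A possible use is `chronyc tracking | chrony_offset_ms`, with the result compared against the alert threshold.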