host_ntp_offset_too_large

2024-07-30 09:31:29  Updated

Description

This alert is triggered when the clock offset between the OCP server and the clock source is greater than 100 ms.

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Stopped Server

Alert rule

Metric Default threshold Source Duration Detection interval Time before clearance
host_ntp_offset_milliseconds 100 N/A 0 60s 5 min

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The clock offset between the server and the clock source is too large. The clock offset (${value} ms) exceeds the threshold (${alarm_threshold} ms).

  • Sample overview: svr_ip=xxx.xxx.xxx.xxx. The clock offset between the server and the clock source is too large.

  • Sample details: Cluster: obCluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: The clock offset between the server and the clock source is too large. The clock offset (0.041 ms) exceeds the threshold (0.001 ms).

${alarm_target} indicates the object that generates the alert, in the svr_ip=xx.xx.xx.xx format.

Impact on the system

The distributed deployment of OceanBase depends on the control of clock offset within the specified range. If the clock offset is out of the range, OBServers cannot function properly.

Possible causes

Possible causes of this alert:

  • The clock source (NTP server) has encountered an error.

  • A network error has occurred between the OCP server and the clock source.

  • The clock synchronization service has encountered an error.

Solution

  1. Check whether the clock source is running properly.

    Check whether the clock source server is running properly.

    • If yes, proceed to Step 2.

    • If no, troubleshoot and fix the problem of the clock source server. After it runs properly, manually clear the alert and check whether the alert recurs.

  2. Check whether the network between the OCP server and the clock source is connected properly.

    • If yes, proceed to Step 3.

    • If no, reconnect the network, manually clear the alert, and check whether it recurs.

  3. Check whether the clock synchronization service is running properly.

    If the clock synchronization service on the OCP server fails, the host_ntp_service_not_exist alert will also be triggered. In this case, perform the following operations:

    1. Remove the host_ntp_service_not_exist alert. For more information, see host_ntp_service_not_exist.

    2. Manually clear the host_ntp_offset_too_large alert and check whether it recurs.

      • If yes, proceed to Step 4.

      • If no, the issue is fixed.

  4. Manually trigger clock synchronization on the OCP server to ensure that the clock offset is within the expected range.

    Run one of the following commands based on the clock synchronization service installed on the OCP server.

    # When the NTP service is in use, run the following command for synchronization (xxx.xxx.xxx.xxx indicates the IP address of the clock source that must be specified): 
    ntpdate xxx.xxx.xxx.xxx
    
    
    # When the Chrony service is in use, make sure that the clock source has been configured in the /etc/chrony.conf file and run the following command for synchronization: 
    chronyc -a makestep
    

Contact Us