Description
This alert is triggered when the latency for synchronizing the clogs of an OceanBase Database tenant to a read-only replica exceeds the threshold. The default threshold is 10 seconds.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | max_ob_clog_sync_delay_seconds{replica_type="16"} Note: The latency for synchronizing the clogs of a tenant to a read-only replica. If the value exceeds the threshold, which is 10 seconds by default, an alert is triggered. In most cases, the latency is within 200 milliseconds. |
| Source | select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ __all_tenant.tenant_id, __all_tenant.tenant_name, clog_stat.replica_type, clog_stat.max_clog_sync_delay_seconds from (select table_id>>40 tenant_id, replica_type, max(next_replay_ts_delta) / 1000000 as max_clog_sync_delay_seconds from __all_virtual_clog_stat where svr_ip = ? and svr_port = ? group by tenant_id, replica_type having max_clog_sync_delay_seconds<18446744073709) clog_stat left join __all_tenant on clog_stat.tenant_id=__all_tenant.tenant_id; Note: The value of clog_stat.max_clog_sync_delay_seconds is assigned to the collected metric, and the other values are used as labels. The question marks (?) in the preceding SQL statement are variables. Specify them when you execute the SQL statement. |
| Collected metric | ob_clog_max_sync_delay_seconds |
| Metric expression | max(ob_clog_max_sync_delay_seconds{@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 minute |
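
The aggregation performed by the source query can be sketched in Python as follows. This is only an illustration of the SQL logic, not the collector's actual code: the function name `max_clog_sync_delay` and the tuple layout of the input rows are assumptions made for the example. The sketch derives the tenant ID from the upper bits of `table_id`, keeps the largest `next_replay_ts_delta` per tenant and replica type, converts it from microseconds to seconds, and drops values that fail the `HAVING` filter.

```python
# Upper bound taken from the HAVING clause of the source query.
# 2**64 microseconds is about 18446744073709.55 seconds, so this filter
# discards deltas at or near the maximum unsigned 64-bit value.
MAX_VALID_DELAY_SECONDS = 18446744073709


def max_clog_sync_delay(rows):
    """Mimic the inner SELECT of the source query.

    rows: iterable of (table_id, replica_type, next_replay_ts_delta_us)
          tuples (a hypothetical layout chosen for this example).
    Returns {(tenant_id, replica_type): delay_in_seconds}.
    """
    max_delta_us = {}
    for table_id, replica_type, delta_us in rows:
        # table_id >> 40 yields the tenant ID, as in the SQL statement.
        key = (table_id >> 40, replica_type)
        if key not in max_delta_us or delta_us > max_delta_us[key]:
            max_delta_us[key] = delta_us

    result = {}
    for key, delta_us in max_delta_us.items():
        delay_seconds = delta_us / 1_000_000  # microseconds to seconds
        if delay_seconds < MAX_VALID_DELAY_SECONDS:
            result[key] = delay_seconds
    return result
```

For example, two partitions of tenant 1001 on a read-only replica (type 16) with deltas of 150,000 and 12,500,000 microseconds produce a single value of 12.5 seconds for that tenant.
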
Alert rule
| Metric | Default threshold (unit: s) | Duration | Alert cycle | Elimination cycle |
|---|---|---|---|---|
| max_ob_clog_sync_delay_seconds | 10 | 0 seconds | 60 seconds | 5 minutes |
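
The following Python sketch shows how the metric expression and the default threshold fit together. It is a simplified illustration, not the alerting engine itself: the function name `evaluate_alerts` and the choice of `ob_cluster` and `tenant_name` as grouping labels are assumptions, because the actual labels are substituted for `@GBLABELS` at run time. The duration, alert cycle, and elimination cycle are not modeled.

```python
def evaluate_alerts(samples, group_by=("ob_cluster", "tenant_name"), threshold=10.0):
    """Group samples by the given labels, take the maximum value per group,
    and return the groups whose maximum exceeds the threshold.

    samples: list of (labels_dict, value_in_seconds) pairs.
    group_by: label names to group by (assumed for this example).
    threshold: alert threshold in seconds (10 by default, as in the table).
    """
    group_max = {}
    for labels, value in samples:
        key = tuple(labels.get(name) for name in group_by)
        if key not in group_max or value > group_max[key]:
            group_max[key] = value
    # An alert fires only when the value exceeds the threshold.
    return {key: value for key, value in group_max.items() if value > threshold}
```

With samples of 0.2 and 20 seconds for `tenant1` on two hosts, the group maximum is 20 seconds, which exceeds the default threshold of 10 seconds.
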
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Tenant |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: cluster: ${ob_cluster}, tenant: ${tenant_name}, host: ${svr_ip} (zone: ${obzone}), alert: The latency for synchronizing the clogs of an OceanBase Database tenant to a read-only replica is ${value} seconds, exceeding the threshold of ${alarm_threshold} seconds.
Overview example: ob_cluster=obcluster1:tenant_name=tenant1 Excessive latency for synchronizing the clogs of an OceanBase Database tenant to a read-only replica {ob_cluster=obcluster1:tenant_name=tenant1}
Details example: cluster: obcluster1, tenant: tenant1, host: 192.168.1.1 (zone: zone1), alert: The latency for synchronizing the clogs of an OceanBase Database tenant to a read-only replica is 20 seconds, exceeding the threshold of 10 seconds.
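
To see how the placeholders in the details template map to the example text, the substitution can be reproduced with Python's `string.Template`, which happens to use the same `${name}` syntax. This is only a sketch: how OCP renders templates internally is not described here, and the names `DETAILS_TEMPLATE` and `render_details` are made up for the example.

```python
from string import Template

# The details template from this topic, with its ${...} placeholders.
DETAILS_TEMPLATE = Template(
    "cluster: ${ob_cluster}, tenant: ${tenant_name}, host: ${svr_ip} "
    "(zone: ${obzone}), alert: The latency for synchronizing the clogs of an "
    "OceanBase Database tenant to a read-only replica is ${value} seconds, "
    "exceeding the threshold of ${alarm_threshold} seconds."
)


def render_details(fields):
    """Fill in the template; raises KeyError if a placeholder has no value."""
    return DETAILS_TEMPLATE.substitute(fields)


details = render_details({
    "ob_cluster": "obcluster1",
    "tenant_name": "tenant1",
    "svr_ip": "192.168.1.1",
    "obzone": "zone1",
    "value": 20,
    "alarm_threshold": 10,
})
```
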
Impact on the system
OceanBase supports multi-site high availability deployment. The network latency between replicas may range from tens to hundreds of milliseconds. High replica latency will:
Have an impact on selecting the primary zone of the OceanBase cluster.
Reduce the throughput of the OceanBase cluster.
Possible causes
A network fault occurred on the OBServer host.
The OBServer host is overloaded. This increases the response time of the OBServer.
Suggested solutions
Locate the faulty cluster, tenant, and host.
Check whether the alert event contains any alert related to the host. If such an alert exists, troubleshoot the alert based on a related topic. Then, check whether the alert is cleared. For example, the following alerts may be reported:
Check the host network and O&M metrics such as load to determine the cause of the issue, and then troubleshoot it.
For more information about host network troubleshooting, see Network troubleshooting.
View O&M metrics such as load.
Log on to the OceanBase Cloud Platform (OCP) console.
In the left-side navigation pane, click Hosts.
Find the host where an error is reported, and click the host IP address to go to the host details page.
Click the Monitoring tab.
You can check whether the host is overloaded based on the monitoring trend charts of Linux system load and CPU utilization.
If the host is steadily overloaded, you can add an OBServer for the OceanBase cluster to balance the load of the current OBServer.
If the host is abruptly overloaded, you can log on to the host and run the top command to find the process that occupies excessive CPU resources.
If the process is an OBServer process:
Go to the Performance Monitoring page of the cluster, analyze the trend chart, and find the tenant with a large number of business requests. Then, throttle frequently executed SQL statements, which can be obtained from SQL Diagnosis > TOPSQL, for the tenant. For more information about throttling, see Apply throttling to an OceanBase cluster.
If the process is not an OBServer process:
Check whether other processes are required, and clear those that are not required.
Wait for five minutes and check whether the alert is automatically cleared.
Contact technical support for troubleshooting if the alert is not eliminated after all the preceding measures are taken.
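
The check for an abruptly overloaded host in the preceding steps can also be scripted. The following Python sketch is one possible approach, not part of OCP: it parses the text output of `ps -eo pid,comm,pcpu --sort=-pcpu` (a procps alternative to reading `top` interactively) and reports whether the busiest process is an OBServer process. The function name `top_cpu_process` is hypothetical, and the sketch assumes that the OBServer process is named `observer`.

```python
def top_cpu_process(ps_output):
    """Return (pid, command, cpu_percent, is_observer) for the busiest process.

    ps_output: text produced by `ps -eo pid,comm,pcpu --sort=-pcpu`,
               including the header line. Returns None if no process is listed.
    """
    lines = [line for line in ps_output.strip().splitlines() if line.strip()]
    busiest = None
    for line in lines[1:]:  # skip the header line
        pid_text, rest = line.split(None, 1)
        # The CPU percentage is the last column; the command may contain spaces.
        command, cpu_text = rest.rsplit(None, 1)
        cpu_percent = float(cpu_text)
        if busiest is None or cpu_percent > busiest[2]:
            busiest = (int(pid_text), command.strip(), cpu_percent)
    if busiest is None:
        return None
    # Assumption: the OBServer process name is "observer".
    return busiest + (busiest[1] == "observer",)


sample = """
  PID COMMAND         %CPU
 4321 observer        385.0
 1187 java             12.5
    1 systemd           0.1
"""
busiest = top_cpu_process(sample)
```

If the last element of the result is `True`, continue with the OBServer branch of the steps above. Otherwise, check whether the reported process is required.
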