Alert description
Note
This alert takes effect only for OceanBase clusters of version V4.0.0.0 or later.
This alert identifies abnormal log stream (LS) leader states within a tenant: when the number of leaders for a log stream is not 1 (that is, it has no leader or more than one leader), the log stream is considered abnormal and an alert is triggered.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring Metrics | ob_tenant_log_stream_no_leader: Indicates whether there are leaderless log streams in the tenant. An alert is triggered when this value is 1. |
| Monitoring Expression | ob_tenant_log_stream_no_leader{@LABELS} |
| Metric Collection | ob_tenant_log_stream_no_leader |
| Metric Source | OCP scheduled task inspection (not collected by the Agent). Collection SQL, two statements: (1) SELECT tenant_id FROM __all_virtual_log_stat WHERE ls_id = 1 AND role = 'LEADER'; (2) SELECT a.tenant_id, a.ls_id, COUNT(*) as leader_count FROM CDB_OB_LS a INNER JOIN CDB_OB_LS_LOCATIONS b ON a.tenant_id = b.tenant_id AND a.ls_id = b.ls_id AND b.role = 'LEADER' WHERE a.status NOT IN ('creating', 'create_abort') GROUP BY a.tenant_id, a.ls_id |
| Detection Cycle | 60 Seconds |
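The rule that a log stream is abnormal when its leader count is not 1 can be illustrated with a short Python sketch. It is not OCP's implementation: the names `LeaderCount`, `find_abnormal_log_streams`, and `no_leader_metric`, and the idea of passing in a list of expected log streams, are hypothetical. The sketch also assumes that a log stream missing from the leader-count query result has zero leaders, since an inner join on role = 'LEADER' returns no row for it.

```python
from collections import namedtuple

# Hypothetical row shape mirroring the columns of the leader-count query.
LeaderCount = namedtuple("LeaderCount", ["tenant_id", "ls_id", "leader_count"])


def find_abnormal_log_streams(expected, rows):
    """Return {(tenant_id, ls_id): leader_count} for log streams whose count is not 1.

    expected: iterable of (tenant_id, ls_id) pairs that should each have one leader.
    rows: iterable of LeaderCount; pairs absent from rows are assumed to have 0 leaders.
    """
    counts = {(r.tenant_id, r.ls_id): r.leader_count for r in rows}
    abnormal = {}
    for key in expected:
        leader_count = counts.get(key, 0)
        if leader_count != 1:
            abnormal[key] = leader_count
    return abnormal


def no_leader_metric(leader_count):
    """Assumed mapping to the metric value: 1 when abnormal, 0 when exactly one leader."""
    return 0 if leader_count == 1 else 1


if __name__ == "__main__":
    rows = [LeaderCount(1002, 1, 1), LeaderCount(1002, 1001, 2)]
    expected = [(1002, 1), (1002, 1001), (1002, 1002)]
    for key, count in find_abnormal_log_streams(expected, rows).items():
        print(key, "leaders:", count, "metric:", no_leader_metric(count))
```

In this example, log stream 1001 is flagged because it has two leaders, and log stream 1002 is flagged because no leader row was returned for it.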
Rule information
| Monitoring Metrics | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| ob_tenant_log_stream_no_leader | 1 | 180 Seconds | 60 Seconds | 5 Minutes |
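One way to read the interaction between the threshold, the 180-second duration, and the 60-second detection cycle is sketched below. This is an interpretation, not a description of OCP's rule engine: it assumes the alert fires once the metric has stayed at or above the threshold continuously for the full duration, and that a single sample below the threshold resets the timer. The function name `first_fire_time` is hypothetical, and the elimination cycle is not modeled.

```python
def first_fire_time(samples, threshold=1, duration_s=180):
    """Return the timestamp at which the alert would first fire, or None.

    samples: list of (timestamp_seconds, metric_value), ordered by time,
             for example one sample per 60-second detection cycle.
    """
    breach_start = None
    for ts, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = ts  # start of a continuous breach
            if ts - breach_start >= duration_s:
                return ts
        else:
            breach_start = None  # breach interrupted, restart the timer
    return None


if __name__ == "__main__":
    continuous = [(0, 1), (60, 1), (120, 1), (180, 1)]
    interrupted = [(0, 1), (60, 1), (120, 0), (180, 1), (240, 1)]
    print(first_fire_time(continuous))   # 180
    print(first_fire_time(interrupted))  # None
```

Under these assumptions, a leaderless state that clears within one or two detection cycles does not raise the alert.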
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expression | Critical | Tenant |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=1:ob_cluster=xxx:tenant_name=tenant_a:ls_id=1001 OceanBase tenant log stream leader absent
Alert details
- Template: Cluster: ${ob_cluster_name}, Tenant: ${tenant_name}, Log Stream: ${ls_id}, Alert: ${alarm_name}.
- Example: Cluster: ob_cluster_x, Tenant: tenant_a, Log Stream: 1001, Alert: OceanBase tenant log stream leader absent.
Alert recovery
- Template: Alert: ${alarm_name}, Tenant log stream leader absent: ${value_shown}
- Example: Alert: OceanBase tenant log stream leader absent, Tenant log stream leader absent: 0
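The placeholder substitution used by these templates can be tried out locally. The sketch below uses Python's standard `string.Template`, which happens to share the `${name}` syntax. It only illustrates how the placeholders map to values. OCP's own rendering mechanism is not described here, and the helper name `render_details` is hypothetical.

```python
from string import Template

# The alert details template, copied from the section above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Tenant: ${tenant_name}, "
    "Log Stream: ${ls_id}, Alert: ${alarm_name}."
)


def render_details(values):
    """Fill in the details template; raises KeyError if a placeholder has no value."""
    return DETAILS_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render_details({
        "ob_cluster_name": "ob_cluster_x",
        "tenant_name": "tenant_a",
        "ls_id": 1001,
        "alarm_name": "OceanBase tenant log stream leader absent",
    }))
```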
Impact on the system
Impact on the OceanBase tenant: A log stream without a valid leader has degraded read/write availability and fault recovery capability.
Impact on OCP: OCP will continuously generate alerts at the log stream level, prompting operations personnel to restore the leader election status as soon as possible.
Impact on business: Request failures, increased latency, or partial business unavailability may occur.
Possible causes
The majority of replicas are unavailable, preventing the completion of leader election.
Replicas are disconnected because of node downtime, network partitioning, or data center failure.
Leader election fails because of abnormal replica roles or inconsistent log stream metadata.
Solution
First, check the status of the tenant and node where the abnormal log stream is located, and restore the failed node/network connectivity.
Check the replica distribution and confirm that a majority of replicas is available, so that the log stream meets the conditions for leader election.
After the ob_tenant_log_stream_no_leader metric returns to 0 for the same label dimension, confirm that the alert is cleared automatically.
If the issue persists, perform further repairs based on the log stream and replica diagnostic results (such as replica reconstruction, migration, or replacement of the faulty node).
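The majority condition checked in the steps above amounts to simple arithmetic: more than half of the replicas must be available. The helper below is a hypothetical illustration. The name `has_majority` is not an OCP or OceanBase API, and the replica counts are assumed to come from whatever replica view the operator inspects.

```python
def has_majority(total_replicas, available_replicas):
    """Return True when strictly more than half of the replicas are available."""
    if total_replicas <= 0:
        raise ValueError("total_replicas must be positive")
    if not 0 <= available_replicas <= total_replicas:
        raise ValueError("available_replicas must be between 0 and total_replicas")
    return available_replicas > total_replicas // 2


if __name__ == "__main__":
    # With 3 replicas, losing one still leaves a majority; losing two does not.
    print(has_majority(3, 2))  # True
    print(has_majority(3, 1))  # False
```

An even replica count gains no extra fault tolerance under this rule: with 4 replicas, 2 available replicas are not a majority.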
