Description
In OceanBase Database earlier than V4.0.0, the time consumed for an election without a leader is closely related to the number of partitions. Therefore, the number of partitions on a single OBServer node must be monitored. If a partition of a partitioned table in a tenant is leaderless, this partitioned table is unavailable until the election without a leader is completed on the partition.
This alert is triggered when the number of leaderless partitions in a tenant exceeds the threshold.
Principle
| Parameter | Value |
|---|---|
| Metric | ob_tenant_partition_leader_absent |
| Source | Query virtual tables. Query statement: SELECT /*+ MONITOR_AGENT QUERY_TIMEOUT(20000000) */ tenant.tenant_id, tenant.tenant_name, IFNULL(stat.cnt, 0) cnt FROM __all_tenant tenant LEFT JOIN (SELECT /*+QUERY_TIMEOUT(20000000)*/ tenant.tenant_id, count(distinct a.table_id,a.partition_id) cnt FROM v$ob_cluster cluster, __all_tenant tenant JOIN __all_virtual_meta_table a ON a.tenant_id=tenant.tenant_id LEFT JOIN __all_virtual_meta_table b ON a.table_id=b.table_id AND a.partition_id=b.partition_id WHERE (cluster.cluster_role='PRIMARY' and b.role=1 or cluster.cluster_role='STANDBY' and b.role=3) and b.tenant_id IS NULL GROUP BY tenant.tenant_id) stat ON stat.tenant_id=tenant.tenant_id. |
| Collected metric | partition_leader_absent_count |
| Metric expression | max(partition_leader_absent_count{}) by (ob_cluster_name,ob_cluster_id,tenant_name,ob_tenant_id) |
| Collection cycle | 60 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| tenant_partition_leader_absent_count > 100 | The number of leaderless partitions in the tenant | 0 | 20 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Tenant |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=TEST-1:tenant_name=t1. Partitions in the OceanBase Database tenant lack replicas.
Details
- Template: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}.
- Example: Cluster: TEST. Tenant: t1. Alert: Partitions in the OceanBase Database tenant lack replicas..
Impact on the system
- If the __all_core_table table of the sys tenant is leaderless, you cannot restart or log on to the OBServer nodes.
- The location cache does not refresh after a read-only replica is switched as the leader replica. SQL requests and transactions return the error 4038 and time out.
Possible causes
- The memory usage of MemStore exceeds the threshold. If the memory usage of MemStore exceeds the threshold, clogs cannot be replayed, which blocks the clog reconfirmation. As a result, the
__all_core_tabletable remains leaderless. - The election fails due to a great clock offset. In OceanBase Database V4.x, a clock offset greater than 2 seconds causes an election failure.
- . The partition replicas in the tenant are not in the majority due to exceptions of OBServer nodes, which results in an election failure.
- The takeover of the clog leader times out. For more information, see Lack of a leader due to clog leader takeover timeout.
- Other reasons. For more information, see Information about leaderless partitions in the OBServer log.
Suggested solutions
- Increase the value of the memstore_limit_percentage parameter, which specifies the maximum memory usage of MemStore.
- Correct the clock offset. For more information, see Configure the clock source.
- Contact OceanBase Technical Support.