Description
This alert is triggered when the number of partitions lacking replicas in a tenant exceeds the threshold.
Principle
| Parameter | Value |
|---|---|
| Metric | tenant_partition_replica_absent_count |
| Source | Query virtual tables. Query statement: SELECT /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) QUERY_TIMEOUT(20000000) */ tenant.tenant_id, tenant.tenant_name, IFNULL(stat.cnt, 0) cnt FROM __all_tenant tenant LEFT JOIN (SELECT table_id>>40 AS tenant_id, COUNT(1) cnt FROM __all_virtual_election_info WHERE member_list NOT LIKE CONCAT(replica_num,'{%') AND SUBSTR(member_list, 1, 1) != '0' GROUP BY tenant_id) stat ON tenant.tenant_id=stat.tenant_id where stat.tenant_id not in (select tenant_id from __all_rootservice_job where job_type='ALTER_TENANT_LOCALITY' and job_status='INPROGRESS'). |
| Collected metric | partition_replica_absent_count |
| Metric expression | max(partition_replica_absent_count{}) by (ob_cluster_name,ob_cluster_id,tenant_name,ob_tenant_id) |
| Collection cycle | 60 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| tenant_partition_replica_absent_count > 100 | The number of partitions lacking replicas in the tenant | 100 | 20 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Tenant |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=TEST-1:tenant_name=t1. Partitions in the OceanBase Database tenant lack replicas.
Details
- Template: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}.
- Example: Cluster: TEST. Tenant: t1. Alert: Partitions in the OceanBase Database tenant lack replicas.
Impact on the system
If a partition lacks replicas, the value of last_merged_version cannot increase and the major compaction fails.
Possible causes
An OBServer node is permanently offline. Consequently, replicas on the node are unavailable.
Suggested solutions
If an OBServer node is permanently offline, the corresponding partition replicas are deleted from the member list. However, the source partition is not removed from the member list. After a minor compaction is completed for the partition, you can perform data migration operations and replicate the missing replicas.
Check whether the current cluster has a partition lacking replicas:
select * from __all_virtual_replica_task;If an entry about the OBServer node in the corresponding zone is returned, you need to initiate a load balancing task for the corresponding partitions. If no load balancing tasks are running and the target version of the major compaction is for example 25, you can query the meta tables for replicas that match the
data_version! = 25condition to reduce the troubleshooting scope. We recommend that you query the__all_meta_tableor__all_virtual_meta_tabletable first. If you do not find the replicas of earlier data versions, query the meta tables at the upper level.- In OceanBase Database earlier than V2.0, meta tables are
__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_meta_table. - In OceanBase Database V2.0 and later, meta tables are
__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_virtual_meta_table. - Query the
__all_virtual_partition_tabletable for replicas that match thedata_version != 25condition. - On the root server, run the following command:
grep "replica not merged to version" rootservice.l.
- In OceanBase Database earlier than V2.0, meta tables are