ob_tenant_partition_replica_absent

2025-03-26 07:47:21  Updated

Description

This alert is triggered when the number of partitions lacking replicas in a tenant exceeds the threshold.

Principle

Parameter Value
Metric tenant_partition_replica_absent_count
Source Query virtual tables. Query statement: SELECT /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) QUERY_TIMEOUT(20000000) */ tenant.tenant_id, tenant.tenant_name, IFNULL(stat.cnt, 0) cnt FROM __all_tenant tenant LEFT JOIN (SELECT table_id>>40 AS tenant_id, COUNT(1) cnt FROM __all_virtual_election_info WHERE member_list NOT LIKE CONCAT(replica_num,'{%') AND SUBSTR(member_list, 1, 1) != '0' GROUP BY tenant_id) stat ON tenant.tenant_id=stat.tenant_id where stat.tenant_id not in (select tenant_id from __all_rootservice_job where job_type='ALTER_TENANT_LOCALITY' and job_status='INPROGRESS').
Collected metric partition_replica_absent_count
Metric expression max(partition_replica_absent_count{}) by (ob_cluster_name,ob_cluster_id,tenant_name,ob_tenant_id)
Collection cycle 60 seconds

Alert rule

Metric expression Metric description Default threshold Detection cycle Time before clearance
tenant_partition_replica_absent_count > 100 The number of partitions lacking replicas in the tenant 100 20 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Metric expression Critical Tenant

Alert templates

  • Overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: alarm_template_id=0:ob_cluster=TEST-1:tenant_name=t1. Partitions in the OceanBase Database tenant lack replicas.
  • Details

    • Template: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}.
    • Example: Cluster: TEST. Tenant: t1. Alert: Partitions in the OceanBase Database tenant lack replicas.

Impact on the system

If a partition lacks replicas, the value of last_merged_version cannot increase and the major compaction fails.

Possible causes

An OBServer node is permanently offline. Consequently, replicas on the node are unavailable.

Suggested solutions

  1. If an OBServer node is permanently offline, the corresponding partition replicas are deleted from the member list. However, the source partition is not removed from the member list. After a minor compaction is completed for the partition, you can perform data migration operations and replicate the missing replicas.

  2. Check whether the current cluster has a partition lacking replicas:

    select * from __all_virtual_replica_task;
    

    If an entry about the OBServer node in the corresponding zone is returned, you need to initiate a load balancing task for the corresponding partitions. If no load balancing tasks are running and the target version of the major compaction is for example 25, you can query the meta tables for replicas that match the data_version! = 25 condition to reduce the troubleshooting scope. We recommend that you query the __all_meta_table or __all_virtual_meta_table table first. If you do not find the replicas of earlier data versions, query the meta tables at the upper level.

    1. In OceanBase Database earlier than V2.0, meta tables are __all_virtual_core_meta_table, __all_virtual_core_root_table, __all_root_table, and __all_meta_table.
    2. In OceanBase Database V2.0 and later, meta tables are __all_virtual_core_meta_table, __all_virtual_core_root_table, __all_root_table, and __all_virtual_meta_table.
    3. Query the __all_virtual_partition_table table for replicas that match the data_version != 25 condition.
    4. On the root server, run the following command: grep "replica not merged to version" rootservice.l.

Contact Us