Prerequisites
This alert applies only to OceanBase Database versions earlier than V4.0.
Description
This alert is triggered when the total number of partitions on an OBServer node exceeds the threshold. Partitions are counted across all tenants on the node, except built-in tenants such as sys.
Principle
The following table describes the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_host_partition_count: the number of partitions on the OBServer node. An alert is triggered when this number exceeds the threshold (30,000 by default). |
| Data source | SQL: select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) QUERY_TIMEOUT(100000000) */ tenant_id, 1 as role, case when cnt is null then 0 else cnt end as cnt from (select tenant_id, count(*) as cnt from __all_virtual_partition_info where svr_ip = ? and svr_port = ? group by tenant_id) |
| Metric to be collected | partition_count |
| Monitoring expression | sum(ob_partition_num{@LABELS}) by (@GBLABELS) |
| Collection interval | 1 second |
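The monitoring expression sums the per-tenant counts returned by the collection SQL into one value per node. The following is a minimal Python sketch of that aggregation, not the actual OCP implementation. The function names, the row layout, and the sample data are hypothetical and used only for illustration.

```python
from collections import defaultdict

DEFAULT_THRESHOLD = 30_000  # default alert threshold stated in this topic


def host_partition_counts(rows):
    """Sum per-tenant partition counts into one total per (svr_ip, svr_port).

    `rows` is an iterable of (svr_ip, svr_port, tenant_id, cnt) tuples,
    a hypothetical layout mirroring the per-tenant output of the collection SQL.
    """
    totals = defaultdict(int)
    for svr_ip, svr_port, _tenant_id, cnt in rows:
        totals[(svr_ip, svr_port)] += cnt or 0  # treat a NULL count as 0
    return dict(totals)


def hosts_over_threshold(rows, threshold=DEFAULT_THRESHOLD):
    """Return the hosts whose total partition count exceeds the threshold."""
    totals = host_partition_counts(rows)
    return {host: n for host, n in totals.items() if n > threshold}


# Hypothetical sample data: two tenants on each of two nodes.
sample = [
    ("10.0.0.1", 2882, 1001, 18_000),
    ("10.0.0.1", 2882, 1002, 12_001),
    ("10.0.0.2", 2882, 1001, 9_000),
    ("10.0.0.2", 2882, 1002, None),
]
print(hosts_over_threshold(sample))  # {('10.0.0.1', 2882): 30001}
```

Only the first node is reported, because its two tenants together hold 30,001 partitions, even though neither tenant exceeds the threshold on its own.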
Rules
| Monitoring metric | Default threshold | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| ob_host_partition_count | 30000 | 0 seconds | 30 seconds | 15 minutes |
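The following Python sketch shows one way to read the rule above. It rests on assumed semantics that this topic does not spell out: with a duration of 0 seconds, a single breach triggers the alert, and the alert clears once no breach has been observed for the elimination interval. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 30_000            # default threshold
DETECTION_INTERVAL_S = 30     # the rule is evaluated every 30 seconds
ELIMINATION_INTERVAL_S = 900  # 15 minutes


@dataclass
class AlertState:
    firing: bool = False
    last_breach_ts: Optional[float] = None


def evaluate(state, value, now):
    """Apply one detection cycle and return the event it produces, if any.

    Assumption: one breach fires the alert (duration is 0 seconds), and the
    alert recovers when no breach was seen for the elimination interval.
    """
    if value > THRESHOLD:
        state.last_breach_ts = now
        if not state.firing:
            state.firing = True
            return "triggered"
        return None
    if state.firing and now - state.last_breach_ts >= ELIMINATION_INTERVAL_S:
        state.firing = False
        return "recovered"
    return None


# Hypothetical series: one breach, then values below the threshold.
state = AlertState()
events = []
for step, value in enumerate([29_000, 30_001] + [28_888] * 30):
    now = step * DETECTION_INTERVAL_S
    event = evaluate(state, value, now)
    if event:
        events.append((now, event))
print(events)  # [(30, 'triggered'), (930, 'recovered')]
```

Under these assumptions the alert fires at the first detection cycle that sees 30,001 partitions and recovers 15 minutes after the last breach.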
Alert information
| Alert triggering method | Alert level | Scope |
|---|---|---|
| Based on the monitoring metric expression | Critical | Server |
Alert template
- Alert overview
  - Template: ${alarm_target} ${alarm_name}
  - Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx OceanBase server partition count exceeds the threshold
- Alert details
  - Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The number of partitions ${value} exceeds ${alarm_threshold}.
  - Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, OceanBase server partition count exceeds the threshold. The number of partitions 30001.0 exceeds 30000.0.
- Alert recovery
  - Template: Alert: ${alarm_name}, OBServer partition count: ${value}
  - Example: Alert: OceanBase server partition count exceeds the threshold, OBServer partition count: 28888.0
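The templates use `${name}` placeholders, which happen to match the syntax of Python's standard `string.Template`. The following sketch illustrates how the variables fill the templates. It is not how OCP renders alerts, and the variable values are hypothetical, modeled on the examples in this topic.

```python
from string import Template

OVERVIEW = Template("${alarm_target} ${alarm_name}")
DETAILS = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. "
    "The number of partitions ${value} exceeds ${alarm_threshold}."
)
RECOVERY = Template("Alert: ${alarm_name}, OBServer partition count: ${value}")

# Hypothetical variable values, modeled on the examples in this topic.
variables = {
    "alarm_target": "ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx",
    "alarm_name": "OceanBase server partition count exceeds the threshold",
    "ob_cluster_name": "obcluster-1",
    "host": "xxx.xxx.xxx.xxx",
    "value": 30001.0,
    "alarm_threshold": 30000.0,
}

overview = OVERVIEW.substitute(variables)
details = DETAILS.substitute(variables)
# Keyword arguments override the mapping, so the recovery message
# can carry the value observed after the count drops.
recovery = RECOVERY.substitute(variables, value=28888.0)

print(overview)
print(details)
print(recovery)
```

`substitute` raises `KeyError` when a placeholder has no value, which makes a missing variable easy to spot while experimenting with a customized template.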
Impact on the system
- The heartbeat RPCs between replicas consume network resources.
- Once the threshold is reached, operations such as creating tables and adding partitions are affected, as is internal partition balancing.
Possible causes
Common causes include:
- Too many tables have been created.
- Partitioned tables are used heavily, mostly tables partitioned by time, so the number of partitions keeps growing.
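To see how time-based partitioning drives the count up, a rough linear estimate can help. The following Python sketch assumes that the node gains a fixed number of partition replicas per day. The function name and the sample figures are hypothetical.

```python
def days_until_threshold(current, new_per_day, threshold=30_000):
    """Return the number of days until the count first exceeds the threshold.

    Assumes linear growth of `new_per_day` partition replicas per day on the
    node. Returns 0 if the threshold is already exceeded and None if the
    count never grows past it.
    """
    if current > threshold:
        return 0
    if new_per_day <= 0:
        return None
    # Smallest d such that current + d * new_per_day > threshold.
    return (threshold - current) // new_per_day + 1


# Hypothetical example: 24,000 partitions on the node today, and 40 tables
# that each add one daily partition with a replica on this node.
print(days_until_threshold(24_000, 40))  # 151
```

In this example the node reaches exactly 30,000 partitions after 150 days and exceeds the threshold on day 151, unless old partitions are dropped in the meantime.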
Procedure
You can choose an appropriate solution based on the actual situation:
Delete unnecessary tenants, databases, and tables, empty the recycle bin, and reduce the number of partition replicas. After these operations are complete, perform two major compactions.
First, find the tenants that have the most partition replicas.
```sql
-- Query the name and number of partition replicas of the top 10 tenants with the most partition replicas.
SELECT t2.tenant_name, t1.replica_count
FROM (
    SELECT tenant_id, COUNT(*) AS replica_count
    FROM __all_virtual_partition_info
    GROUP BY tenant_id
    ORDER BY replica_count DESC
    LIMIT 10
) t1
JOIN (
    SELECT tenant_id, tenant_name
    FROM __all_tenant
) t2 ON t1.tenant_id = t2.tenant_id
ORDER BY replica_count DESC;
```

You can run the following commands to delete data:
```sql
-- Drop a tenant.
-- Please note that you can delete only tenants that are no longer in use.
DROP TENANT IF EXISTS `your tenant name`;

-- Drop a database.
DROP DATABASE IF EXISTS `your database name`;

-- Drop a table.
DROP TABLE IF EXISTS `your table name`;

-- Permanently delete a specified database from the recycle bin.
PURGE DATABASE `object_name`;

-- Permanently delete a specified table from the recycle bin.
PURGE TABLE `object_name`;

-- Empty the recycle bin.
PURGE RECYCLEBIN;

-- Initiate a major compaction.
ALTER SYSTEM MAJOR FREEZE;
```
Alternatively, migrate units to another OBServer node. If no spare OBServer node is available, add one first. For more information about how to add an OBServer node, see the relevant topic in the User Guide.