OceanBase Cloud Platform V4.0.0 documentation

ob_host_partition_count_over_threshold

Last Updated: 2025-07-02 07:30:55

Description

This alert is triggered when the number of partitions on the OBServer exceeds the threshold. All partitions are counted, including the partitions of built-in tenants such as the sys tenant.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	ob_host_partition_count. Note: The value indicates the number of partitions on the OBServer. By default, the alert is triggered when the number of partitions exceeds 30,000.
Source	SQL: `select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) QUERY_TIMEOUT(100000000) */ tenant_id, 1 as role, case when cnt is null then 0 else cnt end as cnt from (select tenant_id, count(*) as cnt from __all_virtual_partition_info where svr_ip = ? and svr_port = ? group by tenant_id)`
Collected metric	partition_count
Metric expression	sum(ob_partition_num{@LABELS}) by (@GBLABELS)
Collection cycle	1 second
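
The metric expression sums the collected per-tenant samples and groups them by a set of labels. The following Python sketch illustrates that aggregation only. It assumes that the grouping labels resolve to the server address (`svr_ip`); the `sum_by` helper, the sample layout, and the sample values are hypothetical and are not part of OCP.

```python
from collections import defaultdict

def sum_by(samples, group_labels):
    """Sum sample values grouped by the given label names."""
    totals = defaultdict(int)
    for labels, value in samples:
        key = tuple(labels[name] for name in group_labels)
        totals[key] += value
    return dict(totals)

# Hypothetical per-tenant samples, shaped like the rows returned by the collection SQL.
samples = [
    ({"svr_ip": "192.168.1.1", "tenant_id": 1}, 1200),
    ({"svr_ip": "192.168.1.1", "tenant_id": 1001}, 28801),
    ({"svr_ip": "192.168.1.2", "tenant_id": 1001}, 500),
]

# Group by server so that each OBServer gets one total across all tenants.
host_totals = sum_by(samples, ["svr_ip"])
```

In this sample data, the total for 192.168.1.1 is 30001, which is the value that would be compared with the threshold.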

Alert rule

Metric	Default threshold	Duration	Detection cycle	Time before clearance
ob_host_partition_count	30000	0 seconds	30 seconds	15 minutes
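
The rule in the table above can be read as a small state machine. The following Python sketch is one possible interpretation, not the OCP implementation. It assumes that a duration of 0 seconds means the alert fires on the first check that exceeds the threshold, and that "time before clearance" means the alert clears once the value has stayed at or below the threshold for 15 minutes. The `evaluate` function and the sample checks are hypothetical.

```python
THRESHOLD = 30000            # default threshold from the alert rule table
CLEARANCE_SECONDS = 15 * 60  # "Time before clearance" from the alert rule table

def evaluate(checks, threshold=THRESHOLD, clearance=CLEARANCE_SECONDS):
    """Return the alert state (True = active) after each (timestamp, value) check.

    Assumption: an active alert clears once the value has stayed at or
    below the threshold for `clearance` seconds.
    """
    states = []
    active = False
    ok_since = None
    for ts, value in checks:
        if value > threshold:
            active = True       # duration is 0 seconds, so fire immediately
            ok_since = None
        elif active:
            if ok_since is None:
                ok_since = ts   # first healthy check after the alert fired
            if ts - ok_since >= clearance:
                active = False
                ok_since = None
        states.append(active)
    return states

# Hypothetical checks as (seconds since start, partition count).
checks = [(0, 29990), (30, 30001), (60, 29000), (930, 29000), (960, 29000)]
states = evaluate(checks)
```

With these sample checks, the alert fires at 30 seconds and clears at 960 seconds, which is 15 minutes after the first check at or below the threshold.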

Alert information

Trigger method	Alert level	Scope
Metric expression	Critical	Server

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: Cluster:${ob_cluster_name}, Host:${host}, Alert:${alarm_name}. The partition number is ${value}, exceeding the threshold of ${alarm_threshold}.
Overview example: ob_cluster=C1-1000:svr_ip=192.168.1.1. The partition number of the OBServer exceeds the threshold.
Details example: Cluster:ob_cluster=C1-1000, Host:192.168.1.1, Alert:The partition number of the OBServer exceeds the threshold. The partition number is 30001.0, exceeding the threshold of 30000.0.
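
To show how the placeholders in the details template map to the example message, the following Python sketch fills the template with the example values. Python's `string.Template` happens to use the same `${name}` syntax; the rendering engine that OCP actually uses is not specified here, so treat this as an illustration only.

```python
from string import Template

# Details template copied from the alert templates above.
DETAILS = Template(
    "Cluster:${ob_cluster_name}, Host:${host}, Alert:${alarm_name}. "
    "The partition number is ${value}, exceeding the threshold of ${alarm_threshold}."
)

# Values taken from the details example above.
message = DETAILS.substitute(
    ob_cluster_name="ob_cluster=C1-1000",
    host="192.168.1.1",
    alarm_name="The partition number of the OBServer exceeds the threshold",
    value=30001.0,
    alarm_threshold=30000.0,
)
print(message)
```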

Impact on the system

A large number of partition replicas increases the heartbeat Remote Procedure Call (RPC) traffic between replicas, which consumes network resources.
When the number of partitions exceeds the threshold, users cannot create tables or add partitions, and internal partition balancing is affected.

Possible causes

This problem is commonly found in the following scenarios:

You have created a large number of tables.
The application uses many partitioned tables. Tables that are partitioned by time keep adding partitions, so the number of partitions increases constantly.
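
For time-partitioned tables, a rough projection shows how quickly the threshold can be reached. The following Python sketch assumes that every new partition places one replica on this OBServer and that no old partitions are dropped. The function name and all input numbers are hypothetical.

```python
def days_until_threshold(current, threshold, tables, partitions_per_table_per_day):
    """Return the number of days until the partition count first exceeds the threshold."""
    if current > threshold:
        return 0
    daily_growth = tables * partitions_per_table_per_day
    if daily_growth <= 0:
        return None  # no growth, so the threshold is never exceeded
    # Smallest number of days such that current + days * daily_growth > threshold.
    return (threshold - current) // daily_growth + 1

# Hypothetical host: 24,000 partitions today, 200 tables that each add one partition per day.
days = days_until_threshold(
    current=24000, threshold=30000, tables=200, partitions_per_table_per_day=1
)
```

With these inputs, the count reaches exactly 30,000 after 30 days and exceeds the default threshold on day 31.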

Suggested solutions

You can select one of the following solutions based on the actual situation.

Delete unwanted tenants, databases, and tables, empty the recycle bin, and perform two rounds of major compaction to reduce the number of partition replicas.

Find the tenants with the most partition replicas.

-- Query the top 10 tenants with the most partition replicas.
SELECT t2.tenant_name, t1.replica_count
FROM 
 (SELECT tenant_id, COUNT(*) AS replica_count
  FROM __all_virtual_partition_info
  GROUP BY tenant_id
  ORDER BY replica_count DESC
  LIMIT 10) t1
JOIN
 (SELECT tenant_id, tenant_name
  FROM __all_tenant) t2
ON t1.tenant_id=t2.tenant_id
ORDER BY replica_count DESC;

Run the following commands to delete data:

-- Drop a tenant.
-- Make sure that the tenants to be dropped are unwanted. 
DROP TENANT IF EXISTS `your tenant name`;

-- Drop a database.
DROP DATABASE IF EXISTS `your database name`;

-- Drop a table.
DROP TABLE IF EXISTS `your table name`;

-- Purge the specified database from the recycle bin.
PURGE DATABASE `object_name`;

-- Purge the specified table from the recycle bin.
PURGE TABLE `object_name`;

-- Purge the entire recycle bin.
PURGE RECYCLEBIN;

-- Start major compaction.
ALTER SYSTEM MAJOR FREEZE;

Move units to another OBServer. If no other OBServer is available, add one to scale out the cluster. For more information about cluster scale-out, see Add an OBServer in User Guide.
