ob_server_sstable_percent_over_threshold

2023-08-15 11:20:59  Updated

Description

This alert is triggered when the proportion of data disk space used for storing data files on an OBServer exceeds the threshold. Note

By default, the data disk is mounted to the /data/1 directory.

When you create an OceanBase cluster, the data disk is initialized and the size of disk space for storing data files is specified.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter Value
Metric ob_server_disk_percent
Source SQL: unknow select /*+ READ_CONSISTENCY(WEAK) */ total_size, free_size from __all_virtual_disk_stat where svr_ip = @svr_ip and svr_port = rpc_port();
Collected metric total_size and free_size
Metric expression 100 * (sum(total_size{metric_group="all_virtual_disk_stat",@LABELS}) by (@GBLABELS) - sum(free_size{metric_group="all_virtual_disk_stat",@LABELS}) by (@GBLABELS)) / sum(total_size{metric_group="all_virtual_disk_stat",@LABELS}) by (@GBLABELS)
Collection cycle 60 seconds

The value of the metric ob_server_disk_percent indicates the data disk usage of the OBServer. When this value exceeds the threshold, this alert is triggered. The default threshold is 85%.

Alert rule

Metric Default threshold (unit: %) Duration Detection cycle Elimination cycle
ob_disk_percent{app="OB"} 85 0 seconds 60 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Critical Server

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: ${alarm_target} ${alarm_name} The data disk usage is ${value}%, exceeding the threshold of ${alarm_threshold}%.

  • Overview example: ob_cluster=C1-1000:svr_ip=192.168.1.1 Excessive data disk usage of the OBServer

  • Details example: ob_cluster=C1-1000:svr_ip=192.168.1.1 Excessive data disk usage of the OBServer The data disk usage of the OBServer reaches 86.0%, exceeding the threshold of 85.0%.

Impact on the system

If the data disk space is insufficient, new data cannot be written to the data disk.

Possible causes

The application data size increases continuously.

Suggested solutions

Perform the following steps to solve the problem:

  1. Delete unwanted tenants, databases, and tables.

  2. Empty the recycle bin, and perform two cluster compaction tasks to release resources.

  3. Migrate units to another OBServer. If no other OBServer is available, add OBServers to scale out the cluster.

Run the following SQL commands to view the table storage space of a tenant:

-- Specify the tenant ID.
set @tenant_id=1001

-- Applies to OceanBase 1.x
select  b.table_name,sum(a.required_size)/1024/1024/1024 required_GB,sum(row_count) as rows from __all_meta_table a, __all_table b where a.table_id = b.table_id and a.zone = 'zone_name' and a.tenant_id = @tenant_id group by a.table_id order by required_GB desc;

-- Applies to OceanBase 2.x and 3.x
select c.database_name, b.table_name,sum(a.required_size)/1024/1024/1024  required_GB,sum(row_count)  as  rows  from  __all_virtual_meta_table  a inner join  __all_virtual_table  b on a.table_id=b.table_id inner join __all_virtual_database c on b.database_id=c.database_id where b.table_type<>5 and a.zone  =  'zone_name'  and  a.tenant_id  = @tenant_id group  by  a.table_id  order  by required_GB  desc;

Contact Us