Alert description
This alert is triggered when the OceanBase tenant thread usage exceeds the specified threshold.
OceanBase tenant thread usage = (Number of tenant threads actually in use) / (Sum of tenant thread limits across all units in the tenant's primary zone).
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_tenant_thread_percent. Note: This metric indicates the thread usage of tenants in the cluster. An alert is triggered when this value exceeds the threshold (95% by default). |
| Metric source | SQL: select /*+read_consistency(weak)*/ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (140006, 140005) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id; The sysstat_value metric takes its value from the value column, and the other columns serve as labels. |
| Metric to be collected | sysstat_value |
| Monitoring expression | 100 * sum(sysstat_value{metric_group="sysstat",stat_id="140006",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="140005",@LABELS}) by (@GBLABELS) |
| Collection interval | 1 second |
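
To make the monitoring expression concrete, the following Python sketch reproduces its arithmetic outside the monitoring system: it sums the samples with stat_id 140006 and 140005 per group and divides the two sums. The function name `thread_percent`, the sample dictionary layout, and the choice of `tenant_name` as the grouping label are assumptions made for illustration; the document does not name the two statistics or state which labels `@GBLABELS` expands to.

```python
from collections import defaultdict

# Stat IDs copied from the monitoring expression. The document does not
# name them, so they are kept as opaque numerator/denominator identifiers.
NUMERATOR_STAT_ID = "140006"
DENOMINATOR_STAT_ID = "140005"


def thread_percent(samples, group_by=("tenant_name",)):
    """Return {group key: percent} for an iterable of sysstat samples.

    Each sample is a dict with a "stat_id", a numeric "value", and the
    label fields listed in group_by (hypothetical layout).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for sample in samples:
        key = tuple(sample[label] for label in group_by)
        stat_id = str(sample["stat_id"])
        if stat_id == NUMERATOR_STAT_ID:
            numerator[key] += sample["value"]
        elif stat_id == DENOMINATOR_STAT_ID:
            denominator[key] += sample["value"]
    # Groups without a positive denominator are skipped to avoid dividing by zero.
    return {
        key: 100.0 * numerator[key] / total
        for key, total in denominator.items()
        if total > 0
    }
```

Summing before dividing mirrors the expression, which aggregates each statistic across servers first and only then computes the ratio.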
Alert Information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the monitoring metric expression | Warning | Tenant |
Rule Information
| Monitoring metric | Default threshold (unit: %) | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| ob_tenant_thread_percent{app="OB"} | 95 | 0 seconds | 60 seconds | 5 minutes |
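
The rule parameters above can be read as a small state machine. The sketch below is one possible interpretation, not OCP's implementation: it assumes that the alert fires once the value has stayed above the threshold for the duration (0 seconds means it fires on the first breaching sample), and that the alert clears once no breaching sample has been seen for the elimination interval. The names `AlertRule` and `AlertState` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    # Defaults taken from the rule table above.
    threshold_percent: float = 95.0
    duration_s: int = 0
    detection_interval_s: int = 60
    elimination_interval_s: int = 300


class AlertState:
    """Tracks whether the alert is active (assumed semantics, see lead-in)."""

    def __init__(self, rule):
        self.rule = rule
        self.active = False
        self.breach_started_at = None
        self.last_breach_at = None

    def observe(self, now_s, value_percent):
        """Feed one detection result and return True while the alert is active."""
        rule = self.rule
        if value_percent > rule.threshold_percent:
            if self.breach_started_at is None:
                self.breach_started_at = now_s
            self.last_breach_at = now_s
            # Fire once the breach has lasted at least duration_s.
            if now_s - self.breach_started_at >= rule.duration_s:
                self.active = True
        else:
            self.breach_started_at = None
            # Assumption: clear after a full elimination interval without a breach.
            if self.active and now_s - self.last_breach_at >= rule.elimination_interval_s:
                self.active = False
        return self.active
```

In this reading, `observe` would be called once per detection interval (every 60 seconds by default) with the latest value of ob_tenant_thread_percent.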
Alert Template
Overview
- Template: ${alarm_target} ${alarm_name}
- Sample: ob_cluster=obcluster-1:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx OceanBase tenant thread usage exceeds the threshold
Alert Details
- Template: Cluster: ${ob_cluster_name}, Tenant: ${tenant_name}, Alert: ${alarm_name}. Tenant thread usage is ${value_shown}%, which exceeds the threshold of ${alarm_threshold} %.
- Sample: Cluster: obcluster-1, Tenant: m****, Alert: OceanBase tenant thread usage exceeds the threshold. Tenant thread usage is 96.0 %, which exceeds the threshold of 95.0 %.
Clear Alert
- Template: Alert: ${alarm_name}, OceanBase tenant thread usage: ${value_shown}
- Sample: Alert: OceanBase tenant thread usage exceeds the threshold, OceanBase tenant thread usage: 93 %
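
The `${...}` placeholders in these templates are filled in with the values of the firing alert. As a rough illustration only, the sketch below renders the Alert Details template with Python's `string.Template`, which happens to use the same `${name}` syntax; this is not how OCP renders alerts, and the function name `render_alert_details` is hypothetical. The template string is copied verbatim from the Alert Details entry above.

```python
from string import Template

# Template text copied verbatim from the Alert Details entry.
ALERT_DETAILS = Template(
    "Cluster: ${ob_cluster_name}, Tenant: ${tenant_name}, Alert: ${alarm_name}. "
    "Tenant thread usage is ${value_shown}%, which exceeds the threshold of "
    "${alarm_threshold} %."
)


def render_alert_details(variables):
    """Substitute every placeholder; raises KeyError if a variable is missing."""
    return ALERT_DETAILS.substitute(variables)


message = render_alert_details(
    {
        "ob_cluster_name": "obcluster-1",
        "tenant_name": "tenant-1",
        "alarm_name": "OceanBase tenant thread usage exceeds the threshold",
        "value_shown": "96.0",
        "alarm_threshold": "95.0",
    }
)
print(message)
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a raw placeholder in the message.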
Impact on the system
None.
Possible causes
- A large query was executed or hot data was accessed.
- The tenant's resource plan was smaller than the resources the workload actually required, and an unexpected hotspot scenario occurred.
Procedure
Check whether the load is normal.
Log in to the OCP console and go to the Performance Monitoring page of the tenant. On the Performance and SQL tab, view the Thread Usage of Tenant line chart and check whether the tenant thread usage at the time of the alert increased suddenly compared with the past 1 to 7 days.
If yes, the load is abnormal.
If no, the load is normal.
If the load is normal but too high for the current specification, increase the tenant specification by referring to Manage tenant specifications to allocate more tenant thread resources.
If the load is abnormal, it may be caused by a large query or hotspot traffic.
Handle it based on the following scenarios.
A large query is running.
On the TopSQL tab of the SQL Diagnostics page in the OCP console, check whether any SQL statement consumes a large amount of CPU resources.
If yes, optimize the SQL statement.
If no, a large query is not the cause.
Slow SQL statements cause the high load.
On the SlowSQL tab of the SQL Diagnostics page in the OCP console, review the diagnostic results and determine whether a slow SQL statement could be causing the high tenant thread load.
If yes, optimize the SQL statement.
Hot rows create hot data, which drives up the CPU usage of a tenant node. In this case, limit the traffic of the tenant's business by referring to Limit the traffic of OceanBase cluster.
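
The first check in this procedure, comparing the thread usage at the alert time with the past 1 to 7 days, is a visual judgment on the line chart. A minimal sketch of the same comparison is shown below, assuming you have exported the usage values observed at the same time of day on previous days. The function name `is_load_abnormal` and the 1.5x ratio used to decide what counts as a "sudden increase" are hypothetical; the document does not define a numeric cutoff.

```python
from statistics import mean


def is_load_abnormal(current_percent, baseline_percents, min_increase_ratio=1.5):
    """Return True if the current usage is well above the recent baseline.

    baseline_percents holds the usage observed at the same time of day on
    each of the past 1 to 7 days. min_increase_ratio is an assumed cutoff.
    """
    if not baseline_percents:
        raise ValueError("at least one baseline value is required")
    baseline = mean(baseline_percents)
    return current_percent > baseline * min_increase_ratio
```

A tenant that normally runs near the threshold would be classified as normal load under this check, which corresponds to the case where increasing the tenant specification is the suggested action.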