Alert description
The average tenant thread usage on OceanBase Database exceeds the threshold.
Alerting principle
The following table describes the key parameters used in the alert monitoring logic.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_cpu_percent |
| Metric source | select /*+read_consistency(weak)*/ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (140005, 140006) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id; The value field is assigned to the collected metric, and other fields serve as labels. |
| Collected metric | sysstat_value |
| Monitoring expression | 100 * sum(sysstat_value{metric_group="sysstat",stat_id="140006",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="140005",@LABELS}) by (@GBLABELS) |
| Collection interval | 1 second |
The monitoring metric ob_cpu_percent indicates the average tenant thread usage on the OceanBase server. An alert is triggered when this value exceeds the threshold (default 90%).
Note
- Statistical event ID 140005: the maximum number of tenant threads available on the OceanBase server.
- Statistical event ID 140006: the number of tenant threads used on the OceanBase server.
The monitoring expression in the table divides the sum of the values for stat_id=140006 by the sum of the values for stat_id=140005 and multiplies the result by 100. The resulting percentage is the monitoring metric value, which indicates the average tenant thread usage on the OceanBase server.
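The calculation above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the collector's actual implementation: the function name `thread_usage_percent`, the tuple layout of the samples, and the use of the server address as the grouping label (standing in for `@GBLABELS`) are all assumptions made for the example.

```python
from collections import defaultdict

STAT_MAX_THREADS = 140005   # maximum number of tenant threads available
STAT_USED_THREADS = 140006  # number of tenant threads used


def thread_usage_percent(samples):
    """Return {server: usage percent} from (server, tenant_id, stat_id, value) tuples.

    Hypothetical helper: grouping by server is an assumption standing in
    for the @GBLABELS placeholder in the monitoring expression.
    """
    used = defaultdict(float)
    available = defaultdict(float)
    for server, _tenant_id, stat_id, value in samples:
        if stat_id == STAT_USED_THREADS:
            used[server] += value
        elif stat_id == STAT_MAX_THREADS:
            available[server] += value
    # Skip servers with no available threads to avoid division by zero.
    return {
        server: 100 * used[server] / total
        for server, total in available.items()
        if total > 0
    }


# Made-up sample values for two tenants on one server.
samples = [
    ("10.0.0.1", 1, 140005, 16), ("10.0.0.1", 1, 140006, 15),
    ("10.0.0.1", 1002, 140005, 24), ("10.0.0.1", 1002, 140006, 22),
]
usage = thread_usage_percent(samples)
```

With these made-up values, the server has 37 of 40 threads in use, so the computed usage is 92.5, which is above the default threshold of 90.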
Rule information
| Monitoring metric | Default threshold (unit: %) | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ob_cpu_percent | 90 | 60 seconds | 60 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the monitoring metric | Critical | Server |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx OceanBase server tenant thread average usage exceeds the limit
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: OceanBase tenant thread average usage is ${value_shown}%, which exceeds ${alarm_threshold} %.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: OceanBase tenant thread average usage is 91.0 %, which exceeds 90.0 %.
Alert recovery
- Template: Alert: ${alarm_name}, OceanBase server tenant thread average usage: ${value_shown}
- Example: Alert: OceanBase server tenant thread average usage exceeds the limit, OceanBase server tenant thread average usage: 89 %
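The `${...}` placeholders in the templates above are filled in when the alert message is generated. A minimal sketch of that substitution, using Python's standard `string.Template` (which happens to share the `${name}` syntax), is shown below. The function `render_details` and the sample values are hypothetical, and the spacing and number formatting of the actual rendered messages may differ.

```python
from string import Template

# Alert details template copied from this section.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: OceanBase tenant thread "
    "average usage is ${value_shown}%, which exceeds ${alarm_threshold} %."
)


def render_details(cluster, host, value, threshold):
    """Fill in the alert details template (hypothetical helper for illustration)."""
    return DETAILS_TEMPLATE.substitute(
        ob_cluster_name=cluster,
        host=host,
        value_shown=f"{value:.1f}",       # one decimal place is an assumption
        alarm_threshold=f"{threshold:.1f}",
    )


message = render_details("obcluster-1", "10.0.0.1", 91.0, 90.0)
```

`Template.substitute` raises `KeyError` if a placeholder has no value, which makes a missing label visible instead of silently producing an incomplete message.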
Impact on the system
High average tenant thread usage decreases system throughput and increases request latency.
A brief spike in average tenant thread usage generally does not cause significant issues. If usage remains high, however, you must identify and resolve the cause.
Possible causes
This alert can be triggered while complex SQL queries are being executed.
Solution
First, check whether the alert tenant_cpu_percent_over_threshold (OceanBase tenant thread usage exceeds the threshold) has also been triggered.
If it has, follow the solution described for that alert.
If it has not, proceed to the next step.
It is possible that multiple tenants are experiencing increased load simultaneously, leading to load accumulation on the OBServer node and triggering the alert.
To reduce the thread usage caused by a traffic surge that affects multiple tenants, take one of the following actions:
- Perform emergency scaling for the cluster by adding an OBServer node to the OceanBase cluster. For more information, see Add an OBServer node.
- Limit the traffic of the OceanBase cluster. For more information, see Limit the traffic of an OceanBase cluster.