Common alert item management |V2.2.77|OceanBase Database| docs|Distributed Database

Common alert item management

Last Updated: 2023-08-18 09:26:34

You can use the alert item management feature on OceanBase Cloud Platform (OCP) to manage common alerts.

Background

The OCP console may differ between versions. This topic describes how to manage common alert items in OCP V2.4.4. For other OCP versions, see the OCP User Guide for the corresponding version.

Manage alert items

1. Log on to the OCP console.
2. In the left-side navigation pane, choose System Management > Alerts.
3. Click the Alert Item Configuration tab.
   On this tab, you can view and edit the built-in alert items.
4. Click Group Management, and then click Manage Alert Items. On the page that appears, you can add alert items to or delete alert items from each group.
5. Go back to the Alert Item Configuration tab and click Create Alert Item. The Create Alert Item page appears.
6. Set the rule information for the alert item.
   1. Specify Alert Item Scope.
   2. Specify Matched Object to set the objects to which the alert rule applies.
   3. Specify Trigger Condition.
      - Set Alert Rules. The available rules depend on the value you set for Alert Item Scope.
      - Set Duration. If Duration is 0, an alert is triggered as soon as the rule is met. Set a nonzero duration to avoid false alerts caused by transient metric glitches.
   4. Specify Detection Cycle and Elimination Cycle.
7. Set the basic information for the alert item.
   1. Specify Alert Item and Name.
   2. Specify Alert Level.
   3. Set the alert summary template as prompted. The alert summary template is referenced by the template field of the channel and corresponds to the alarm_summary variable.
   4. Set the alert details template as prompted. The alert details template is referenced by the template field of the channel and corresponds to the alarm_description variable.
8. Click OK.
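The effect of Duration can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the rule is evaluated once per detection cycle, that the rule is met when a metric exceeds a threshold, and that an alert fires only after the rule has been met continuously for at least Duration seconds. TriggerCondition and evaluate are illustrative names, not part of OCP, and OCP's actual evaluation logic may differ; Elimination Cycle is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TriggerCondition:
    """Hypothetical model of a trigger condition; not an OCP API."""
    threshold: float        # assumption: the rule is met when the metric exceeds this value
    duration_s: int         # Duration: how long the rule must keep being met
    detection_cycle_s: int  # assumption: interval between two rule evaluations


def evaluate(cond: TriggerCondition, samples: Sequence[float]) -> List[int]:
    """Return the times (in seconds) at which an alert would be triggered.

    samples[i] is the metric value observed at time i * detection_cycle_s.
    """
    fired: List[int] = []
    breach_start: Optional[int] = None  # time at which the current breach began
    alerting = False                    # an alert has already fired for this breach
    for i, value in enumerate(samples):
        now = i * cond.detection_cycle_s
        if value > cond.threshold:
            if breach_start is None:
                breach_start = now
            # Fire once per breach, after the breach has lasted at least duration_s.
            if not alerting and now - breach_start >= cond.duration_s:
                fired.append(now)
                alerting = True
        else:
            # The rule is no longer met: a new breach must last duration_s again.
            breach_start = None
            alerting = False
    return fired


if __name__ == "__main__":
    glitch = [10, 95, 10, 10, 10, 10]     # one-cycle spike
    sustained = [10, 95, 95, 95, 95, 10]  # spike that lasts four cycles
    immediate = TriggerCondition(threshold=90, duration_s=0, detection_cycle_s=60)
    delayed = TriggerCondition(threshold=90, duration_s=120, detection_cycle_s=60)
    print(evaluate(immediate, glitch))   # [60]: Duration 0 fires on the glitch
    print(evaluate(delayed, glitch))     # []: the glitch is ignored
    print(evaluate(delayed, sustained))  # [180]: fires after 120 s of breach
```

In this sketch, a nonzero Duration trades alert latency for robustness: a sustained breach is reported 120 seconds after it starts, while a single-cycle spike is never reported.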

Built-in alert items

The following table describes the built-in alert items. You can configure thresholds for some alert items based on the characteristics of different environments.

Alert item	Description	Source	Alert level	Recommended threshold	Cause and solution
ob_cluster_exists_inactive_server	An inactive OBServer exists in the OceanBase cluster.	Cluster	Critical	0	A node becomes inactive when its heartbeat to the OceanBase cluster times out. This issue may be caused by a network exception or an operating system crash.
ob_cluster_merge_timeout	A major compaction timed out in the OceanBase cluster.	Cluster	Critical	1	A major compaction in the OceanBase cluster took longer than the configured timeout, and the major compaction status of the cluster became TIMEOUT. Log on to the cluster for troubleshooting.
ob_cluster_merge_error	An error occurs during a major compaction in the OceanBase cluster.	Cluster	Critical	1	An error occurs during a major compaction in the OceanBase cluster and the major compaction status of the cluster becomes ERROR. Log on to the cluster for troubleshooting.
ob_cluster_no_merge	No major compaction is detected in the OceanBase cluster.	Cluster	Critical	108000	No major compaction has been performed in the OceanBase cluster within the specified period, in seconds (108000 seconds is 30 hours). Log on to the cluster for troubleshooting.
ob_cluster_no_frozen	No freeze is detected in the OceanBase cluster.	Cluster	Critical	172800	No freeze has been performed in the OceanBase cluster within the specified period, in seconds (172800 seconds is 48 hours). Log on to the cluster for troubleshooting.
ob_cluster_exists_index_fail_table	A table with a failed index build exists in the OceanBase cluster.	Cluster	Critical	0	A table whose index build failed is found after a data compaction in the OceanBase cluster. Log on to the cluster for troubleshooting.
ob_tenant500_mem_hold_percent_over_threshold	The memory usage of tenant 500 in the OceanBase cluster exceeds the limit.	Cluster	Critical	25	The memory usage of tenant 500 in the OceanBase cluster exceeds the specified threshold. This issue may be caused by a memory leak on some nodes. Log on to the cluster for troubleshooting.
ob_zone_sstable_percent_over_threshold	The data disk usage of a zone in the OceanBase cluster exceeds the limit.	Cluster	Critical	95	The data disk usage of a zone in the OceanBase cluster exceeds the specified threshold. This issue may be caused by high disk usage in the cluster. Log on to the cluster for troubleshooting.
ob_cluster_frozen_version_delta_over_threshold	The difference between the freeze version and the baseline version in the OceanBase cluster exceeds the specified threshold.	Cluster	Critical	1	The difference between the freeze version and the baseline version in the OceanBase cluster exceeds 1. This issue may be caused by a freeze exception in the cluster. Log on to the cluster for troubleshooting.
tenant_memstore_percent_over_threshold	The memory usage of a tenant in the OceanBase cluster exceeds the limit.	Tenant	Warning	90	The memory usage of a tenant in the OceanBase cluster exceeds the specified threshold. This issue may be caused by an abnormal minor or major compaction in the cluster. Log on to the cluster for troubleshooting.
tenant_disk_percent_over_threshold	The data disk usage of a tenant in the OceanBase cluster exceeds the limit.	Tenant	Warning	70	The data disk usage of a tenant in the OceanBase cluster exceeds the specified threshold. This issue may be caused by a high disk usage of the tenant. Log on to the cluster for troubleshooting.
tenant_cpu_percent_over_threshold	The CPU utilization of a tenant in the OceanBase cluster exceeds the limit.	Tenant	Warning	100	The CPU utilization of a tenant in the OceanBase cluster exceeds the specified threshold. This issue may be caused by a high read and write load on the tenant. Check the monitoring data of the CPU utilization and QTPS metrics for troubleshooting.
tenant_connection_percent_over_threshold	The number of tenant connections in the OceanBase cluster exceeds the limit.	Tenant	Warning	10000	The number of tenant connections in the OceanBase cluster exceeds the specified threshold. This issue may be caused by improper connection usage by the tenant's users. Check the Active_session metric for troubleshooting. Configure the threshold based on the number of nodes of the tenant; a single node in the OceanBase database allows at most 65535 connections.
tenant_active_memstore_percent_over_threshold	The memory occupied by the MemStore of a tenant in the OceanBase cluster exceeds the limit.	Tenant	Warning	100	The memory occupied by the MemStore of a tenant in the OceanBase cluster exceeds the specified threshold. This issue may be caused by the read and write load on the tenant or by an abnormal memory leak. Log on to the cluster for troubleshooting.
obagent_upgrade_failed	The OBAgent upgrade failed.	Server	Critical	0	The upgrade of OBAgent, a basic OCP component, failed. This issue may be caused by an improper upgrade operation. Log on to the corresponding server for troubleshooting.
ob_host_down	A server in the OceanBase database failed.	Server	Stopped	0	A server node is down. This issue may be caused by server hardware or software exceptions. Log on to the corresponding server for troubleshooting.
ob_host_tcp_retrans_percent_over_threshold	The Transmission Control Protocol (TCP) retransmission rate of a server in the OceanBase database exceeds the limit.	Server	Critical	50	The TCP retransmission rate of a server in the OceanBase database exceeds the specified threshold. This issue may be caused by a server NIC or cluster network exception. Check the NET metric and log on to the corresponding server for troubleshooting.
ob_server_sstable_percent_over_threshold	The data disk usage on a server in the OceanBase database exceeds the limit.	Server	Warning	85	The data disk usage on a server in the OceanBase database exceeds the specified threshold. This issue is related to data disk use on the server. Log on to the corresponding server for troubleshooting.
ob_host_ssd_wear_indicator_over_threshold	The SSD wear level on a server in the OceanBase database exceeds the limit.	Server	Critical	95	The SSD wear level on a server in the OceanBase database exceeds the specified threshold. Log on to the corresponding server for troubleshooting.
ob_tenant500_mem_hold_over_threshold	The memory occupied by tenant 500 in the OceanBase cluster exceeds the limit.	Server	Critical	50	The percentage of memory occupied by the internal tenant in the OceanBase cluster exceeds the specified threshold. This issue is related to the memory consumption of the internal tenant. Log on to the SYS tenant for troubleshooting.
ob_host_disk_readonly	The disks on a server in the OceanBase database are read-only.	Server	Critical	1	The disk status on the server changes to read-only. The disks may be in an abnormal state. Log on to the server to check the exception.
ob_host_partition_count_over_threshold	The number of partitions on a server in the OceanBase database exceeds the limit.	Server	Critical	30000	The number of partitions on a node in the OceanBase database exceeds the specified threshold. This issue is related to the total number and distribution of partitions in the cluster. Log on to the SYS tenant for troubleshooting.
ob_host_net_send_percent_over_threshold	The data sending bandwidth on a server in the OceanBase database exceeds the limit.	Server	Warning	80	The data sending bandwidth on a server in the OceanBase database exceeds the specified threshold. This issue is related to the read and write load in the cluster and the internal status of the cluster. Check the NET metric and log on to the SYS tenant for troubleshooting.
ob_host_ntp_command_not_found	The NTP command is unavailable on a server in the OceanBase database.	Server	Critical	1	The NTP command is unavailable on a server in the OceanBase database. This issue may occur because the NTP server cannot be reached from the operating system of the server. Log on to the corresponding server for troubleshooting.
ob_host_ntp_offset_too_large	The NTP offset of a server in the OceanBase database is too large.	Server	Critical	50	The NTP offset of a server in the OceanBase database exceeds the specified threshold. This issue is related to the NTP service, NTP clock service, and network service configured on the server. Log on to the corresponding server for troubleshooting.
ob_host_net_exception	A network error occurs on a server in the OceanBase database.	Server	Stopped		A network error occurs on a server in the OceanBase database. This issue may be caused by a network exception on the server. Check the network availability of the server.
ob_host_mem_percent_over_threshold	The memory usage on a server in the OceanBase database exceeds the limit.	Server	Critical	90	The memory usage on a server in the OceanBase database exceeds the specified threshold. This issue may be caused by a memory leak in a process. Log on to the corresponding server for troubleshooting.
ob_mem_assigned_percent_over_threshold	The percentage of memory allocated to tenants in the OceanBase cluster exceeds the limit.	Server	Warning	100	The percentage of memory allocated to tenants in the OceanBase cluster exceeds the specified threshold. This issue is related to the allocation of memory to tenants. Log on to the SYS tenant and the corresponding server for troubleshooting.
ob_host_load1_per_cpu_over_threshold	The average CPU load on a server in the OceanBase database exceeds the limit.	Server	Critical	4	This alert is triggered when the average CPU load on a server in the OceanBase database exceeds the specified threshold for five minutes. This issue is related to the read and write load on the server. Check the QTPS metric for troubleshooting.
ob_host_net_recv_percent_over_threshold	The data receiving bandwidth on a server in the OceanBase database exceeds the limit.	Server	Warning	80	The data receiving bandwidth on a server in the OceanBase database exceeds the specified threshold. This issue is related to the read and write load in the cluster and the internal status of the cluster. Log on to the SYS tenant and perform troubleshooting with reference to the NET metric.
ob_host_exists_expired_trans	A server in the OceanBase database has a suspended transaction.	Server	Warning	1	A server in the OceanBase database has a suspended transaction. This issue is related to user behaviors. Perform troubleshooting with reference to the TPS metric and the SQL monitoring metrics.
ob_host_disk_percent_over_threshold	The disk usage on a server in the OceanBase database exceeds the limit.	Server	Warning	97	The disk usage on a server in the OceanBase database exceeds the specified threshold. This issue is caused by high disk usage on the server. Log on to the corresponding server for troubleshooting.
ob_cpu_percent_over_threshold	The CPU utilization in the OceanBase cluster exceeds the limit.	Server	Critical	99	This alert is triggered when the CPU utilization in the OceanBase cluster exceeds the specified threshold for five minutes. This issue is related to user behaviors. Perform troubleshooting with reference to the TPS metric and SQL monitoring metrics.
ob_cpu_assigned_percent_over_threshold	The percentage of CPU resources allocated to tenants in the OceanBase cluster exceeds the limit.	Server	Warning	100	The percentage of CPU resources allocated to tenants in the OceanBase cluster exceeds the specified threshold. This issue is related to the allocation of CPUs to tenants. Log on to the SYS tenant and the corresponding server for troubleshooting.
ob_host_cpu_percent_over_threshold	The CPU utilization on a server in the OceanBase database exceeds the limit.	Server	Critical	100	This alert is triggered when the CPU utilization on a server in the OceanBase database exceeds the specified threshold for one minute. This issue is related to the CPU use by OceanBase Database services and other processes on the server. Log on to the corresponding server for troubleshooting.
ob_cannot_connected	A server in the OceanBase database cannot be connected to.	Server	Stopped	0	This alert is triggered when a server in the OceanBase database cannot be connected to. Confirm the issue with the O&M team responsible for the server.
ob_log_alarm	An ERROR-level log event is detected in the OceanBase database.	Server	Warning		This alert is triggered when an ERROR-level log event is found in the election, rootservice, or observer logs of the OceanBase database. This issue is related to the internal behavior and running status of the OceanBase database. Log on to the SYS tenant for troubleshooting.
obagent_process_dead	The obagent process is unavailable.	Server	Critical		This alert is triggered when the obagent process, which is an OCP component, on a server in the OceanBase database malfunctions. This issue may be caused by abnormal running of the obagent process or an exception of the operating system. Log on to the corresponding server for troubleshooting.
obagent_dead	The obagent service is unavailable.	Server	Stopped		This alert is triggered when the obagent service, an OCP component, stops on a server in the OceanBase database. This issue may be caused by the obagent process exiting or aborting because of an exception. Log on to the corresponding server for troubleshooting.
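The recommended thresholds in the table use different units: ob_cluster_no_merge and ob_cluster_no_frozen are periods in seconds, while others are percentages or counts. The following Python sketch records a few recommended thresholds from the table and checks hypothetical metric readings against them. It assumes that each listed alert item fires when its reading is strictly greater than the threshold; RECOMMENDED_THRESHOLDS, triggered_alerts, and seconds_to_hours are illustrative names only and are not part of OCP, which evaluates these rules itself.

```python
from typing import Dict, List

# Recommended thresholds copied from the built-in alert item table.
# The metric meanings in the comments are assumptions for illustration only.
RECOMMENDED_THRESHOLDS: Dict[str, float] = {
    "ob_cluster_no_merge": 108000,                    # seconds without a major compaction
    "ob_cluster_no_frozen": 172800,                   # seconds without a freeze
    "tenant_disk_percent_over_threshold": 70,         # tenant data disk usage, %
    "ob_host_mem_percent_over_threshold": 90,         # server memory usage, %
    "ob_host_partition_count_over_threshold": 30000,  # partitions on one server
}


def seconds_to_hours(seconds: float) -> float:
    """Convert a threshold in seconds to hours."""
    return seconds / 3600


def triggered_alerts(readings: Dict[str, float]) -> List[str]:
    """Return, sorted by name, the alert items whose reading exceeds its threshold.

    Readings for alert items that are not in RECOMMENDED_THRESHOLDS are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in RECOMMENDED_THRESHOLDS and value > RECOMMENDED_THRESHOLDS[name]
    )


if __name__ == "__main__":
    print(seconds_to_hours(RECOMMENDED_THRESHOLDS["ob_cluster_no_merge"]))   # 30.0
    print(seconds_to_hours(RECOMMENDED_THRESHOLDS["ob_cluster_no_frozen"]))  # 48.0
    readings = {
        "ob_cluster_no_merge": 31 * 3600,  # 31 hours since the last major compaction
        "tenant_disk_percent_over_threshold": 65,
        "ob_host_partition_count_over_threshold": 30001,
    }
    print(triggered_alerts(readings))
    # ['ob_cluster_no_merge', 'ob_host_partition_count_over_threshold']
```

Keeping the thresholds as data in this way makes it easy to see which readings would cross a recommended value after you adjust thresholds for your own environment.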