Description
OCP-Agent writes diagnostic data to monitordb (the monitoring database of OCP). Autonomous services rely on this data, and because the volume is large, OCP-Agent writes it directly to the monitordb tables. If OCP-Agent cannot connect to monitordb, an alert is generated.
Principle
| Parameter | Value |
|---|---|
| Monitoring metric | monitordb_connectable |
| Source of the metric | Connect to monitordb based on the connection string in the configuration file of OCP-Agent and execute select 1 to check whether you can connect to monitordb. |
| Metric collection | oceanbase_connectivity |
| Monitoring expression | min(oceanbase_connectivity{target="monitordb",@LABELS}) by (@GBLABELS) |
| Metric collection cycle | 5 seconds |
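
The check described in the table can be sketched as follows. This is a minimal illustration, not OCP-Agent's actual implementation: `probe_connectivity` is a hypothetical name, it accepts any connection factory that follows the Python DB-API, and an in-memory `sqlite3` database stands in for monitordb so that the sketch runs without a server. Mapping success to 1 and failure to 0 is an assumption inferred from the rule `monitordb_connectable == 0`.

```python
import sqlite3
from typing import Callable


def probe_connectivity(connect: Callable[[], object]) -> int:
    """Return 1 if `select 1` succeeds on a fresh connection, otherwise 0."""
    try:
        conn = connect()
    except Exception:
        # Could not even open the connection (bad credentials, no route, ...).
        return 0
    try:
        cur = conn.cursor()
        cur.execute("select 1")
        row = cur.fetchone()
        return 1 if row is not None and row[0] == 1 else 0
    except Exception:
        return 0
    finally:
        conn.close()


def unreachable():
    # Stand-in for a factory whose target database cannot be reached.
    raise ConnectionError("simulated connection failure")


# sqlite3 is only a stand-in here; a real probe would pass a factory for
# the monitordb connection string from the OCP-Agent configuration file.
ok = probe_connectivity(lambda: sqlite3.connect(":memory:"))
down = probe_connectivity(unreachable)
```

Passing the connection factory as an argument keeps the probe independent of any particular database driver.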
Rule information
| Monitoring expression | Description of the monitoring metric | Default threshold | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| monitordb_connectable == 0 | Whether you can connect to monitordb | 0 | 10 seconds | 5 minutes |
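
To make the rule concrete, the sketch below evaluates an expression of the form `min(oceanbase_connectivity{target="monitordb"}) by (...)` over a handful of samples and lists the groups where the result equals 0. It is an assumption-laden illustration: `@LABELS` and `@GBLABELS` are placeholders that OCP fills in, grouping by `host` is assumed here because the alert scope is Host, the sample values and addresses are invented, and the detection and elimination cycles are not modeled.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def min_by(samples: Iterable[Sample], target: str,
           group_by: Sequence[str]) -> Dict[Tuple[str, ...], float]:
    """Minimum sample value per label group, restricted to one target."""
    groups: Dict[Tuple[str, ...], float] = {}
    for labels, value in samples:
        if labels.get("target") != target:
            continue  # label filter, like {target="monitordb"}
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key] = min(value, groups.get(key, value))
    return groups


def firing(groups: Dict[Tuple[str, ...], float],
           threshold: float = 0) -> List[Tuple[str, ...]]:
    """Groups whose aggregated value equals the threshold (rule: == 0)."""
    return sorted(key for key, value in groups.items() if value == threshold)


# Invented samples: 1 means connectable, 0 means not connectable (assumed).
samples = [
    ({"target": "monitordb", "host": "192.0.2.1"}, 1.0),
    ({"target": "monitordb", "host": "192.0.2.2"}, 1.0),
    ({"target": "monitordb", "host": "192.0.2.2"}, 0.0),
    ({"target": "other", "host": "192.0.2.3"}, 0.0),
]
groups = min_by(samples, target="monitordb", group_by=["host"])
alerts = firing(groups)
```

Because the aggregation is `min`, a single failed check within a group is enough to bring that group's value to 0.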
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Monitoring expression | Severe | Host |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=Test-1:host=xxx.xxx.xxx.xxx OCP-Agent cannot connect to monitordb
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}.
- Example: Cluster: Test, Host: xxx.xxx.xxx.xxx, Alert: OCP-Agent cannot connect to monitordb.
Alert recovery
- Template: Alert: ${alarm_name}, OCP monitordb connection status: ${recover_value}
- Example: Alert: OCP-Agent cannot connect to monitordb, OCP monitordb connection status: 0
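The placeholder substitution in the templates above can be reproduced with Python's `string.Template`, which happens to use the same `${name}` syntax. This only illustrates how the examples follow from the templates; how OCP renders its templates internally is not described here, and the values are taken from the examples above.

```python
from string import Template

DETAILS = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}."
)
RECOVERY = Template(
    "Alert: ${alarm_name}, OCP monitordb connection status: ${recover_value}"
)

# Values copied from the examples in this section.
values = {
    "ob_cluster_name": "Test",
    "host": "xxx.xxx.xxx.xxx",
    "alarm_name": "OCP-Agent cannot connect to monitordb",
    "recover_value": "0",
}

# substitute() raises KeyError if a placeholder has no value; extra keys
# in the mapping are ignored.
details = DETAILS.substitute(values)
recovery = RECOVERY.substitute(values)
```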
Impact on the system
OCP-Agent connects to monitordb to write diagnostic data directly. Autonomous services rely on this data, which includes SQL audit data, plan cache data, and transaction data. If OCP-Agent cannot connect to monitordb, the diagnostic data is not written and the autonomous services that depend on it are affected. When a system failure occurs, the root cause cannot be located.
Possible causes
- The account or password provided to OCP-Agent is incorrect, and OCP-Agent cannot access monitordb.
- The network between OCP-Agent and monitordb is disconnected. For example, traffic is blocked by iptables rules.
Solution
On the host where OCP-Agent is located, try to connect to OCP's monitordb and check whether it is accessible. The result usually points to one of the following causes:
Account or password error
An error similar to the following is returned:
ERROR 1045 (42000): Access denied for user 'root'@'xxx.xxx.xxx.xxx' (using password: YES)
In this case, check whether the provided account and password are correct. If the account or password is incorrect, you can run the following SQL statement in the command-line mode to modify the relevant system parameters in the config_properties table. The value of value must be enclosed in single quotation marks:
update config_properties set value='' where `key`=''
The relevant system parameters are as follows:
- ocp.monitordb.host
- ocp.monitordb.port
- ocp.monitordb.database
- ocp.monitordb.username
- ocp.monitordb.password
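When the update is issued from a script rather than typed by hand, bound parameters avoid quoting mistakes in the value. The sketch below only builds the statement and its parameters; `build_update` is a hypothetical helper, the `%s` placeholder style is the one used by common Python MySQL drivers such as PyMySQL, and the username shown is invented. Executing the statement against the database that holds config_properties is left to the caller.

```python
from typing import Tuple

# The five parameters listed in this section.
ALLOWED_KEYS = (
    "ocp.monitordb.host",
    "ocp.monitordb.port",
    "ocp.monitordb.database",
    "ocp.monitordb.username",
    "ocp.monitordb.password",
)


def build_update(key: str, value: object) -> Tuple[str, Tuple[str, str]]:
    """Return the update statement and its bound parameters."""
    if key not in ALLOWED_KEYS:
        raise ValueError(f"unexpected config key: {key!r}")
    # Values are stored as strings, matching the quoted value='' form above.
    sql = "update config_properties set value=%s where `key`=%s"
    return sql, (str(value), key)


# Hypothetical value, for illustration only.
statement, params = build_update("ocp.monitordb.username", "monitor_user")
# With a DB-API cursor: cursor.execute(statement, params)
```

Restricting the key to the listed parameters guards against accidentally overwriting an unrelated row in config_properties.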
Network disconnection
The connection attempt may fail because the corresponding domain name cannot be accessed, or the command may hang after execution without returning a result. In this case, check the network path between the OCP-Agent host and monitordb, including any iptables rules, and make sure that the two can reach each other.
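A quick way to separate a network problem from a credential problem is to test whether a TCP connection to the monitordb address can be opened at all. The sketch below is one possible check using only the Python standard library; `tcp_reachable` is a hypothetical helper, and the host and port would come from the ocp.monitordb.host and ocp.monitordb.port values in your environment.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections, and timeouts.
        return False


# Example call with placeholder values:
# tcp_reachable("monitordb.example.internal", 2883)
```

A timeout suggests that packets are being dropped, for example by a firewall rule, while an immediate refusal suggests that the host is reachable but nothing is listening on that port.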