Description
This alert is triggered when the observer process stops.
The ocp_agent process monitors the uptime of the component processes in OceanBase Database. An uptime of 0 means that the process has stopped. This alert indicates that the process has just stopped, not that it has been down for a long time. A prolonged outage triggers other status alerts.
Relevant alerts:
- observer process: observer_process_stop
- obproxy process: obproxy_process_stop
- obproxyd.sh process: obproxyd_process_stop
- ocp_agentd process: agentd_process_stop
- ocp_mgragent process: mgragent_process_stop
- ocp_monagent process: monagent_process_stop
Principle
| Parameter | Value |
|---|---|
| Metric | observer_uptime_delta_seconds |
| Source | The difference between the current time and the process creation time displayed in the 14th column of the stat file in the /proc/[pid] directory. |
| Collected metric | process_uptime_seconds |
| Metric expression | 0 - min(delta(process_uptime_seconds{name="observer",@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Collection cycle | 5 seconds |
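The metric expression can be read as follows: for each matching series, take the change in process_uptime_seconds over the window, pick the smallest change in the group, and negate it. The following is a minimal Python sketch of that reading, not the OCP implementation. It assumes that the delta is simply the last sample minus the first sample in the window (a Prometheus-style delta also extrapolates to the window boundaries, which is ignored here). The function names and sample values are hypothetical.

```python
from typing import Dict, List


def window_delta(samples: List[float]) -> float:
    """Change in uptime across one window: last sample minus first sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a delta")
    return samples[-1] - samples[0]


def uptime_delta_metric(series: Dict[str, List[float]]) -> float:
    """0 - min(delta(...)) over all series that fall into one group."""
    return 0 - min(window_delta(samples) for samples in series.values())


# Hypothetical uptime samples (seconds), one sample per 5-second collection cycle.
healthy = {"observer@host-a": [100.0, 105.0, 110.0]}
stopped = {"observer@host-a": [100.0, 105.0, 0.0]}

print(uptime_delta_metric(healthy))  # -10.0, uptime keeps growing
print(uptime_delta_metric(stopped))  # 100.0, uptime dropped to 0
```

While the process is running, uptime grows and the metric stays negative. When uptime falls back to 0, the delta becomes negative and the metric becomes positive.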
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| observer_uptime_delta_seconds > 0 | The metric is the negated change in uptime. A value greater than 0 means that the uptime has decreased, which indicates that the process has stopped. | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | OBProxy |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The observer process stopped.
Details
- Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}.
- Example: Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The observer process stopped. The process has been running for 62 days 20 hours 56 seconds.
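To illustrate how the ${...} placeholders in the details template map to the example message, here is a minimal Python sketch that uses the standard library's string.Template, which happens to share the ${name} syntax. This is only an illustration: how OCP renders its templates is not described here, and the render_details function is hypothetical.

```python
from string import Template

# Details template as documented above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. "
    "The process has been running for ${value_shown}."
)


def render_details(values: dict) -> str:
    """Fill in every placeholder; raises KeyError if a value is missing."""
    return DETAILS_TEMPLATE.substitute(values)


message = render_details(
    {
        "ob_cluster_name": "AdminMETA",
        "host": "xxx.xxx.xxx.xxx",
        "alarm_name": "The observer process stopped",
        "value_shown": "62 days 20 hours 56 seconds",
    }
)
print(message)
```

With these values, the rendered text matches the details example above.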
Impact on the system
Possible results when the observer process stops:
The observer process may stop because of a system fault or a planned O&M operation.
If the tenant has multiple replicas and the observer process stops on a server that hosts one of them, OceanBase Database tries to preserve the majority of the replicas and maintain high availability by migrating the replica to another server. The migration time depends on the data volume. You can try to restart the observer process, but we recommend that you stop retrying if the restart fails several times. Otherwise, core dump files and log files can fill the disk.
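The advice to stop retrying can be expressed as a bounded retry guard. The following Python sketch is one possible shape, under stated assumptions: try_restart is a hypothetical callable that wraps whatever restart procedure you use and returns True on success, and the limits of 3 attempts and 90% disk usage are illustrative values, not recommendations from OceanBase.

```python
import shutil
from typing import Callable


def used_fraction(path: str) -> float:
    """Fraction of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def restart_with_limit(
    try_restart: Callable[[], bool],        # hypothetical restart wrapper
    disk_used_fraction: Callable[[], float],
    max_attempts: int = 3,                  # assumed limit
    max_disk_fraction: float = 0.9,         # assumed limit
) -> str:
    """Return 'restarted', 'disk_nearly_full', or 'gave_up'."""
    for _ in range(max_attempts):
        # Stop before a failed restart can add more core dumps and logs.
        if disk_used_fraction() >= max_disk_fraction:
            return "disk_nearly_full"
        if try_restart():
            return "restarted"
    return "gave_up"
```

In practice, disk_used_fraction could be `lambda: used_fraction(core_dump_dir)`, where core_dump_dir is the directory that receives core dump files in your deployment.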
Possible causes
- The process unexpectedly exits and a core dump file is generated. In this case, you can view the core dump file in the `/data/1` directory by executing the following command: `ls -l ${observer.coredump.path} --full-time | grep '.*core-observer'`
- The memory usage exceeds the threshold and the process is killed by the operating system.
- The disk is faulty.
- Causes recorded in the `observer.log` file. You can search the log file for the following three keywords to view the causes: `is_out_of_memstore_mem=true`, `right_to_die_or_duty_to_live`, and `on_fatal_error`.
- Other exceptions.
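The keyword search in observer.log can be sketched in Python as follows. This is a minimal illustration: the function name is hypothetical, and the sample lines are invented placeholders, not real observer.log output.

```python
from typing import Dict, Iterable, List

# The three keywords listed above.
KEYWORDS = (
    "is_out_of_memstore_mem=true",
    "right_to_die_or_duty_to_live",
    "on_fatal_error",
)


def find_stop_causes(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the keyword they contain."""
    hits: Dict[str, List[str]] = {keyword: [] for keyword in KEYWORDS}
    for line in lines:
        for keyword in KEYWORDS:
            if keyword in line:
                hits[keyword].append(line.rstrip("\n"))
    return hits


# Invented placeholder lines used only to exercise the function.
sample_lines = [
    "placeholder line mentioning on_fatal_error\n",
    "placeholder line with is_out_of_memstore_mem=true\n",
    "placeholder line without any keyword\n",
]

for keyword, matches in find_stop_causes(sample_lines).items():
    print(keyword, len(matches))
```

To scan a real file, pass an open file object, for example `with open(log_path) as f: find_stop_causes(f)`, where log_path is the location of observer.log in your deployment.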
Solutions
- Whether to restart the observer process depends on the situation. For example, if the observer process crashed or stopped because memory was exhausted, you can try to restart it right away. Otherwise, confirm the specific cause first to avoid unexpected consequences.
- We recommend that you contact OceanBase Technical Support.