Alert description
This alert monitors whether the enable_cgroup parameter configuration is consistent across all nodes in an OceanBase cluster. It is triggered when the values of the enable_cgroup parameter differ between OBServer nodes within the cluster.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring Metrics | cluster_cgroup_value |
| Monitoring Expression | max(cluster_cgroup_value{@LABELS}) by (@GBLABELS) |
| Metric Collection | cluster_cgroup_value |
| Metric Source | OCP Proactive Detection |
| Collection Cycle | 60 Seconds |
OCP uses the scheduled task CheckClusterCgroupValueTask to check the consistency of the cgroup configuration in each OceanBase cluster. The task works as follows:
It checks whether the cluster version meets the minimum requirement for the cgroup feature. If it does not, the check is skipped for that cluster.
It queries the value of the enable_cgroup parameter on all nodes in the cluster.
It deduplicates the query results and counts the number of unique values.
If the number of unique values is not equal to 1 (that is, the enable_cgroup values differ between nodes), it sets the cluster_cgroup_value metric to 1 (alert status). Otherwise, it sets the metric to 0 (normal status).
When the metric value equals 1, the trigger condition is met and an alert is generated.
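The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not OCP's actual implementation of CheckClusterCgroupValueTask: the function name `cluster_cgroup_value` and the input shape (a mapping from node address to the enable_cgroup value read on that node) are assumptions made for this example.

```python
from typing import Mapping


def cluster_cgroup_value(values_by_node: Mapping[str, str]) -> int:
    """Return 1 (alert status) if nodes disagree on enable_cgroup, else 0.

    `values_by_node` is a hypothetical input: node address -> parameter value.
    """
    # Deduplicate the per-node values, as in the "unique values" step.
    distinct_values = set(values_by_node.values())
    # Any count other than exactly one unique value is treated as alert status.
    return 0 if len(distinct_values) == 1 else 1


# Hypothetical three-node cluster where one node differs from the others.
metric = cluster_cgroup_value({
    "10.0.0.1:2882": "True",
    "10.0.0.2:2882": "True",
    "10.0.0.3:2882": "False",
})
```

Because the rule is "not equal to 1" rather than "greater than 1", an empty result set also yields 1 in this sketch. Whether OCP behaves the same way in that case is not stated in this document.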
Rule information
| Monitoring Metrics | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| cluster_cgroup_value | 1 | 120 Seconds | 60 Seconds | 300 Seconds |
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expressions | Warning | Cluster |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster1 Inconsistent cgroup values in the cluster
Alert details
- Template: The cgroup values under the OceanBase cluster ${ob_cluster_name} may not meet expectations. Please refresh the overview page of the ${ob_cluster_name} cluster and check if the cgroup values match those on the overview page. If they do not match, modify them; if they match, this alert will automatically disappear.
- Example: The cgroup values under the OceanBase cluster obcluster1 may not meet expectations. Please refresh the overview page of the obcluster1 cluster and check if the cgroup values match those on the overview page. If they do not match, modify them; if they match, this alert will automatically disappear.
Alert recovery
- Template: Alert: ${alarm_name}, Are the cgroup values in the cluster consistent: ${value_shown}
- Example: Alert: Inconsistent cgroup values in the cluster, Are the cgroup values in the cluster consistent: 0
Impact on the system
Inconsistent values of the enable_cgroup parameter across nodes in a cluster indicate that cgroup-based resource isolation is enabled on some nodes but not on others. This may lead to the following impacts:
Inconsistent resource isolation: Nodes with cgroup enabled can limit tenant CPU usage, whereas nodes with cgroup disabled impose no such limit, so a tenant's resource usage differs from node to node. On nodes without cgroup, tenants may compete for CPU resources, which affects business stability.
Unpredictable performance: The performance of the same tenant may vary across different nodes, increasing the difficulty of operational troubleshooting.
Possible causes
O&M personnel manually modified the enable_cgroup parameter on some nodes but did not apply the change to the entire cluster.
Other unknown reasons.
Solution
Confirm the alert information: Locate the specific cluster based on the cluster name (ob_cluster_name) in the alert.
View the current cgroup configuration: On the OCP console, open the overview page of the corresponding cluster to view the cluster-level cgroup configuration status. You can also query the parameter values of each node via SQL:
SHOW PARAMETERS LIKE 'enable_cgroup';
Identify the nodes with inconsistent values.
Unify the cgroup configuration:
Method 1 (recommended): Use the cluster parameter management feature in the OCP console to uniformly modify the enable_cgroup parameter for all nodes, ensuring consistent configurations across the cluster.
Method 2: Set the parameter globally at the cluster level using SQL commands:
ALTER SYSTEM SET enable_cgroup = 'True';
or
ALTER SYSTEM SET enable_cgroup = 'False';
Refresh the cluster overview: After making the modifications, refresh the overview page of the corresponding cluster in OCP to confirm that the cgroup values are unified.
Verify the fix: After the change, wait for the next detection cycle (60 seconds) and check whether the alert clears. Once the configurations are consistent, the alert is cleared automatically within the elimination cycle (300 seconds).
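For the step that identifies nodes with inconsistent values, grouping the nodes by their reported value makes the outliers easy to spot. The sketch below assumes the output of the SHOW PARAMETERS query has already been reduced to a mapping from node address to value. The function name `group_nodes_by_value` and the sample addresses are hypothetical and are not part of OCP or OceanBase.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def group_nodes_by_value(values_by_node: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group node addresses by the enable_cgroup value each node reports."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, value in values_by_node.items():
        groups[value].append(node)
    # Sort node lists so the result is stable and easy to compare.
    return {value: sorted(nodes) for value, nodes in groups.items()}


# Hypothetical sample: two nodes report 'True' and one reports 'False'.
groups = group_nodes_by_value({
    "10.0.0.1:2882": "True",
    "10.0.0.2:2882": "True",
    "10.0.0.3:2882": "False",
})
```

If the result has more than one key, the cluster is inconsistent, and the smaller groups are usually the nodes to review before unifying the parameter.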
