Alert description
This alert monitors whether the service name status of each tenant in the OceanBase cluster is valid. When OCP detects that a tenant's service name status is INVALID and this state persists for a certain period, it triggers this alert.
Tenant service names are a feature introduced in OceanBase Database V4.2.4.0 to identify a tenant's database service. Service names must be globally unique. If multiple tenants (excluding those within the same primary-standby relationship) use the same service name, a conflict occurs, causing the service name status to become INVALID.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring Metrics | ob_tenant_service_name_status |
| Monitoring Expression | max(ob_tenant_service_name_status{@LABELS}) by (@GBLABELS) |
| Metric Collection | ob_tenant_service_name_status |
| Metric Source | The OCP scheduled task CheckAllTenantServiceNameConflictTask traverses all tenants and checks the service_name and service_name_status fields in the ob_tenant table of MetaDB. This data is originally synchronized from the CDB_OB_SERVICES system view of OceanBase Database. |
| Collection Cycle | 60 Seconds |
OCP uses the scheduled task TenantSchedules.syncAllTenantInfo() to synchronize tenant service name information from the CDB_OB_SERVICES view of the OceanBase cluster to MetaDB every 60 seconds. It then uses the analyzeTenantServiceNameStatus() method to check for service name conflicts and marks the status as VALID, INVALID, or DISABLED.
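The uniqueness rule can be illustrated with a minimal Python sketch. It is not OCP's implementation of analyzeTenantServiceNameStatus(). The `Tenant` class, the `standby_group` field (a hypothetical identifier shared by a primary tenant and its standbys), and the choice to flag every tenant involved in a conflict are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tenant:
    cluster: str
    name: str
    service_name: Optional[str]
    # Hypothetical: tenants in the same primary-standby relationship share this value.
    standby_group: str


def find_conflicting_tenants(tenants):
    """Return (cluster, name) pairs whose service name is also used by a
    tenant outside their own primary-standby group."""
    groups_by_service = defaultdict(set)
    for t in tenants:
        if t.service_name:
            groups_by_service[t.service_name].add(t.standby_group)

    conflicting = set()
    for t in tenants:
        # A name used by more than one primary-standby group is a conflict.
        if t.service_name and len(groups_by_service[t.service_name]) > 1:
            conflicting.add((t.cluster, t.name))
    return conflicting


tenants = [
    Tenant("obcluster-1", "orders", "svc_orders", "group-a"),
    Tenant("obcluster-2", "orders_standby", "svc_orders", "group-a"),
    Tenant("obcluster-1", "billing", "svc_pay", "group-b"),
    Tenant("obcluster-3", "payments", "svc_pay", "group-c"),
    Tenant("obcluster-3", "reports", None, "group-d"),
]
conflicts = find_conflicting_tenants(tenants)
print(sorted(conflicts))
```

In this sample data, the two tenants sharing svc_orders belong to the same primary-standby group and are not flagged, while the two tenants sharing svc_pay belong to different groups and are.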
Another scheduled task, TenantSchedules.checkAllTenantServiceNameConflict(), traverses all tenants every 60 seconds and checks the status of each tenant with a configured service name:
- If the service name is not empty and the status is INVALID, the published metric value is 1.
- Otherwise, the published metric value is 0.
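The mapping from service name status to metric value can be sketched as follows. The function name `service_name_metric` is hypothetical and only mirrors the two rules above. It does not reproduce OCP's actual code.

```python
from typing import Optional


def service_name_metric(service_name: Optional[str], status: str) -> int:
    """Return 1 when a configured service name is INVALID, otherwise 0."""
    return 1 if service_name and status == "INVALID" else 0


print(service_name_metric("svc_pay", "INVALID"))   # configured and invalid
print(service_name_metric("svc_pay", "VALID"))     # configured and valid
print(service_name_metric(None, "INVALID"))        # no service name configured
```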
When the value of the monitoring metric ob_tenant_service_name_status is 1, it indicates that the tenant's service name is invalid. An alert is triggered after this condition persists for 120 seconds.
Rule information
| Monitoring Metrics | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| ob_tenant_service_name_status | 1 | 120 Seconds | 60 Seconds | 5 Minutes |
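To show how the 120-second duration interacts with the 60-second detection cycle, here is a minimal sketch. It assumes the alert fires once the metric has stayed at or above the threshold continuously for the full duration, and that any sample below the threshold resets the timer. The function `first_firing_time` and the `>=` comparison are assumptions for illustration, not OCP's documented evaluation logic.

```python
def first_firing_time(samples, threshold=1, duration_s=120):
    """samples: (timestamp_seconds, value) pairs in ascending time order.

    Returns the first timestamp at which the value has been at or above
    the threshold for at least duration_s, or None if that never happens.
    """
    breach_start = None
    for ts, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = ts  # condition starts holding here
            if ts - breach_start >= duration_s:
                return ts
        else:
            breach_start = None  # condition broken, restart the timer
    return None


# One sample per 60-second detection cycle.
samples = [(0, 0), (60, 1), (120, 1), (180, 1), (240, 1)]
fires_at = first_firing_time(samples)
print(fires_at)
```

Under these assumptions, a metric that first becomes 1 at t = 60 s triggers the alert at t = 180 s, which is the first detection cycle at which the condition has held for 120 seconds.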
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expressions | Critical | Tenant |
Alert template
Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:tenant_name=mytenant The tenant service name is invalid.
Alert Details
- Template: The service name of tenant ${tenant_name} in OceanBase cluster ${ob_cluster_name} is invalid. The current service name status is INVALID. Please check for any conflicts and modify the tenant service name as soon as possible.
- Example: The service name of tenant mytenant in OceanBase cluster obcluster-1 is invalid. The current service name status is INVALID. Please check for any conflicts and modify the tenant service name as soon as possible.
Here, ${alarm_target} indicates the object that triggered the alert, in the format ob_cluster=xxxxxxx:tenant_name=xxxxxxx.
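The placeholder substitution can be reproduced with Python's standard `string.Template`, which uses the same `${name}` syntax. This sketch only illustrates how the templates expand and how the `ob_cluster=...:tenant_name=...` target format can be split into its parts. The helper `parse_alarm_target` is hypothetical, and OCP's own rendering mechanism is not described in this document.

```python
from string import Template

overview = Template("${alarm_target} ${alarm_name}")
details = Template(
    "The service name of tenant ${tenant_name} in OceanBase cluster "
    "${ob_cluster_name} is invalid. The current service name status is INVALID."
)


def parse_alarm_target(target: str) -> dict:
    """Split 'ob_cluster=a:tenant_name=b' into {'ob_cluster': 'a', 'tenant_name': 'b'}."""
    return dict(part.split("=", 1) for part in target.split(":"))


alarm_target = "ob_cluster=obcluster-1:tenant_name=mytenant"
labels = parse_alarm_target(alarm_target)

overview_text = overview.substitute(
    alarm_target=alarm_target,
    alarm_name="The tenant service name is invalid.",
)
details_text = details.substitute(
    tenant_name=labels["tenant_name"],
    ob_cluster_name=labels["ob_cluster"],
)
print(overview_text)
print(details_text)
```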
Impact on the system
A tenant service name in the INVALID state indicates a service name conflict, which may cause the following impacts:
- Applications may be routed to the wrong tenant when connecting to the database by service name, leading to abnormal data access.
- In a failover or switchover scenario, a service name conflict may prevent business traffic from correctly switching to the new primary tenant.
- When OBProxy or the connection management component relies on service names for routing, connection failures or connections to the wrong instance may occur.
Possible causes
This is commonly seen in the following scenarios:
- Service name conflict after primary/standby switchover: After a failover, the service name of the original primary tenant is not promptly invalidated, so the new primary tenant and the original primary tenant use the same service name. During the failover process, OCP automatically sets the service name status of the original primary tenant to INVALID through InvalidTenantServiceNameTask, which triggers this alert.
- Manually configured duplicate service name: An administrator manually set the same service name for tenants in different clusters or for different tenants, and these tenants are not within the same primary-standby relationship.
- Cross-cluster conflicts under multi-cluster management: Among the clusters managed by OCP, tenants in different clusters use the same service name, and OCP's global conflict detection marks the status as INVALID.
- Tenant synchronization exception: Tenant information synchronization between OCP and the OceanBase cluster is delayed or abnormal, leading to an inaccurate service name status.
Solution
Confirm the tenant information involved in the alert. Obtain the cluster name (ob_cluster_name) and tenant name (tenant_name) from the OCP alert details to locate the specific tenant.
View service name conflict details. Log in to OCP, go to the details page of the corresponding tenant, and view the tenant's service name and its status. Also check whether any other tenant is using the same service name. You can also execute the following SQL statement in the sys tenant of the OceanBase cluster to view service name information:
SELECT TENANT_ID, SERVICE_NAME, SERVICE_STATUS FROM CDB_OB_SERVICES;
Check whether the primary/standby switchover is the cause.
For alerts triggered after a failover or switchover, verify whether the service name of the original primary tenant needs to be modified. Typically, you need to change the service name of the original primary tenant (which has been demoted to a standby tenant) to a new unique name or clear its service name.
You can modify the tenant service name by using the following SQL statement:
-- Execute under the sys tenant
ALTER SYSTEM MODIFY SERVICE '<old_service_name>' TO '<new_service_name>' TENANT = '<tenant_name>';
Check for manual configuration conflicts.
For scenarios other than primary/standby switchover, check whether an administrator has manually configured the same service name on different tenants.
Modify the service name of one of the tenants to ensure that the service names of all tenants are globally unique within the OCP management scope (except for tenants within the same primary-standby relationship).
Wait for the alert to automatically clear. After you modify the service name, OCP will re-analyze the service name status in the next synchronization cycle (60 seconds). If the conflict is resolved, service_name_status will be updated to VALID and the metric value will return to 0. The alert will automatically clear within a maximum of 5 minutes after the metric returns to normal (resolve_timeout_seconds: 300).
If none of the above methods resolves the issue, contact OCP technical support for troubleshooting.
