Description
This alert is triggered when it takes over three hours to execute a task in an OceanBase Database tenant. The following task types are supported:
- MIGRATE_UNIT: a unit migration task.
- ALTER_TENANT_LOCALITY: a task to modify the locality of a tenant.
- ROLLBACK_ALTER_TENANT_LOCALITY: a task to roll back the locality of a tenant.
- SHRINK_RESOURCE_TENANT_UNIT_NUM: a task to scale in the number of units of a tenant.
Principle
| Parameter |
Value |
| Metric |
ob_tenant_task_max_duration_seconds |
| Source |
Internal views. SQL statements for metric collection: In OceanBase Database of a version earlier than V4.0: SELECT /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ tenant_id, task_type, max(TIMESTAMPDIFF(SECOND, sys_task_status.start_time, CURRENT_TIMESTAMP ) ) max_sys_task_duration_seconds FROM __all_virtual_sys_task_status sys_task_status WHERE svr_ip = ? AND svr_port = ? GROUP BY tenant_id,task_typeIn OceanBase Database V4.0 or later: SELECT /* MONITOR_AGENT */ tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_TENANT_JOBS WHERE job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=? UNION SELECT tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_UNIT_JOBS WHERE tenant_id IS NOT NULL AND job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=? |
| Collected metric |
ob_tenant_task_max_duration_seconds |
| Metric expression |
max(ob_tenant_task_max_duration_seconds{@LABELS}) by (@GBLABELS) |
| Collection cycle |
1 minute |
Alert rule
| Metric expression |
Metric description |
Default threshold (unit: s) |
Detection cycle |
Time before clearance |
| ob_tenant_task_max_duration_seconds > 10800 |
The timeout duration for the execution of a task in an OceanBase Database tenant. |
10800 (3 hours) |
60 seconds |
5 minutes |
| Trigger method |
Alert level |
Scope |
| Based on the expression of the metric |
Critical |
Tenant |
Alert templates
- Alert overview: ${alarm_target} ${alarm_name}
- Alert details: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Host: ${host} (zone: ${obzone}). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: ${task_type}; Execution Time: ${value_shown}, which exceeds the threshold of ${alarm_threshold} seconds.
- Overview example: ob_cluster=C1-1000:tenant_name=T1. A task in the OceanBase Database tenant timed out.
- Details example: Cluster: C1. Tenant: T1. Host: xxx.xxx.xxx.xxx (zone: Z1). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: MIGRATE_UNIT. Execution Time: 4 hours, which exceeds the threshold of 10800 seconds.
Impact on the system
- A prolonged task execution period affects task scheduling in OceanBase Database, which reduces the task processing throughput.
- Prolonged occupation of resources affects the system stability.
- If the task fails, for example, the migration of units fails, unbalanced unit distribution is caused and fragmented resources cannot be reused.
Possible causes
None.
Solutions
- To migrate and modify units, you must ensure that the resources of the target server meet the requirements. Otherwise, the operation may fail.
- Check the
observer.log.wf file for ERROR logs and submit the identified ERROR logs to OceanBase Technical Support for troubleshooting.