ob_tenant_task_timeout|V4.3.5| docs|Distributed Database

ob_tenant_task_timeout

Last Updated：2025-03-26 07:47:21 Updated

Description

This alert is triggered when it takes over three hours to execute a task in an OceanBase Database tenant. The following task types are supported:

MIGRATE_UNIT: a unit migration task.
ALTER_TENANT_LOCALITY: a task to modify the locality of a tenant.
ROLLBACK_ALTER_TENANT_LOCALITY: a task to roll back the locality of a tenant.
SHRINK_RESOURCE_TENANT_UNIT_NUM: a task to scale in the number of units of a tenant.

Parameter	Value
Metric	ob_tenant_task_max_duration_seconds
Source	Internal views. SQL statements for metric collection: In OceanBase Database of a version earlier than V4.0: `SELECT /+ MONITOR_AGENT READ_CONSISTENCY(WEAK) / tenant_id, task_type, max(TIMESTAMPDIFF(SECOND, sys_task_status.start_time, CURRENT_TIMESTAMP ) ) max_sys_task_duration_seconds FROM __all_virtual_sys_task_status sys_task_status WHERE svr_ip = ? AND svr_port = ? GROUP BY tenant_id,task_type` In OceanBase Database V4.0 or later: SELECT /* MONITOR_AGENT */ tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_TENANT_JOBS WHERE job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=? UNION SELECT tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_UNIT_JOBS WHERE tenant_id IS NOT NULL AND job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=?
Collected metric	ob_tenant_task_max_duration_seconds
Metric expression	max(ob_tenant_task_max_duration_seconds{@LABELS}) by (@GBLABELS)
Collection cycle	1 minute

Metric expression	Metric description	Default threshold (unit: s)	Detection cycle	Time before clearance
ob_tenant_task_max_duration_seconds > 10800	The timeout duration for the execution of a task in an OceanBase Database tenant.	10800 (3 hours)	60 seconds	5 minutes

Trigger method	Alert level	Scope
Based on the expression of the metric	Critical	Tenant

Alert overview: ${alarm_target} ${alarm_name}
Alert details: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Host: ${host} (zone: ${obzone}). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: ${task_type}; Execution Time: ${value_shown}, which exceeds the threshold of ${alarm_threshold} seconds.
Overview example: ob_cluster=C1-1000:tenant_name=T1. A task in the OceanBase Database tenant timed out.
Details example: Cluster: C1. Tenant: T1. Host: xxx.xxx.xxx.xxx (zone: Z1). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: MIGRATE_UNIT. Execution Time: 4 hours, which exceeds the threshold of 10800 seconds.

A prolonged task execution period affects task scheduling in OceanBase Database, which reduces the task processing throughput.
Prolonged occupation of resources affects the system stability.
If the task fails, for example, the migration of units fails, unbalanced unit distribution is caused and fragmented resources cannot be reused.

None.

To migrate and modify units, you must ensure that the resources of the target server meet the requirements. Otherwise, the operation may fail.
Check the observer.log.wf file for ERROR logs and submit the identified ERROR logs to OceanBase Technical Support for troubleshooting.