ob_tenant_task_timeout

2025-03-26 07:47:21  Updated

Description

This alert is triggered when it takes over three hours to execute a task in an OceanBase Database tenant. The following task types are supported:

  • MIGRATE_UNIT: a unit migration task.
  • ALTER_TENANT_LOCALITY: a task to modify the locality of a tenant.
  • ROLLBACK_ALTER_TENANT_LOCALITY: a task to roll back the locality of a tenant.
  • SHRINK_RESOURCE_TENANT_UNIT_NUM: a task to scale in the number of units of a tenant.

Principle

Parameter Value
Metric ob_tenant_task_max_duration_seconds
Source Internal views. SQL statements for metric collection:
  • In OceanBase Database of a version earlier than V4.0: SELECT /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ tenant_id, task_type, max(TIMESTAMPDIFF(SECOND, sys_task_status.start_time, CURRENT_TIMESTAMP ) ) max_sys_task_duration_seconds FROM __all_virtual_sys_task_status sys_task_status WHERE svr_ip = ? AND svr_port = ? GROUP BY tenant_id,task_type
  • In OceanBase Database V4.0 or later: SELECT /* MONITOR_AGENT */ tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_TENANT_JOBS WHERE job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=? UNION SELECT tenant_id, job_type as task_type, TIMESTAMPDIFF(SECOND, START_TIME, CURRENT_TIMESTAMP) as max_sys_task_duration_seconds, rs_svr_ip as svr_ip FROM DBA_OB_UNIT_JOBS WHERE tenant_id IS NOT NULL AND job_status='INPROGRESS' AND rs_svr_ip=? and rs_svr_port=?
  • Collected metric ob_tenant_task_max_duration_seconds
    Metric expression max(ob_tenant_task_max_duration_seconds{@LABELS}) by (@GBLABELS)
    Collection cycle 1 minute

    Alert rule

    Metric expression Metric description Default threshold (unit: s) Detection cycle Time before clearance
    ob_tenant_task_max_duration_seconds > 10800 The timeout duration for the execution of a task in an OceanBase Database tenant. 10800 (3 hours) 60 seconds 5 minutes

    Alert information

    Trigger method Alert level Scope
    Based on the expression of the metric Critical Tenant

    Alert templates

    • Alert overview: ${alarm_target} ${alarm_name}
    • Alert details: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Host: ${host} (zone: ${obzone}). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: ${task_type}; Execution Time: ${value_shown}, which exceeds the threshold of ${alarm_threshold} seconds.
    • Overview example: ob_cluster=C1-1000:tenant_name=T1. A task in the OceanBase Database tenant timed out.
    • Details example: Cluster: C1. Tenant: T1. Host: xxx.xxx.xxx.xxx (zone: Z1). Alert: A system task in the OceanBase Database tenant has timed out. Task Type: MIGRATE_UNIT. Execution Time: 4 hours, which exceeds the threshold of 10800 seconds.

    Impact on the system

    • A prolonged task execution period affects task scheduling in OceanBase Database, which reduces the task processing throughput.
    • Prolonged occupation of resources affects the system stability.
    • If the task fails, for example, the migration of units fails, unbalanced unit distribution is caused and fragmented resources cannot be reused.

    Possible causes

    None.

    Solutions

    1. To migrate and modify units, you must ensure that the resources of the target server meet the requirements. Otherwise, the operation may fail.
    2. Check the observer.log.wf file for ERROR logs and submit the identified ERROR logs to OceanBase Technical Support for troubleshooting.

    Contact Us