Best practices for log interpretation in common scenarios-OceanBase

OceanBase Database's log module generates three types of log files: observer.log, election.log, and rootservice.log. By default, logs with a severity level of WDIAG or higher are recorded. Each log file also has a corresponding WARNING log file with a .wf suffix (such as observer.log.wf, election.log.wf, rootservice.log.wf) that only contains logs with a severity level of WARN or higher.

This topic provides guidance on how to interpret commonly used logs. The log files observer.log, election.log, and rootservice.log are all stored in the $work_dir/log directory on the OBServer node.

Applicable versions

This topic applies to OceanBase Database V2.x and V3.x.

Log levels

OceanBase Database V2.x and V3.2.3

OceanBase Database V2.x and V3.2.3 support six log levels, as shown in the table below. The levels are listed in descending order of severity.

Log level	Description
ERROR	Critical errors. Logs at this level record information about system faults that must be troubleshoot. Otherwise, the system will be unavailable.
USER_ERROR	Errors caused by user input.
WARN	Warnings. Logs at this level record potential errors that may occur.
INFO	General information. Logs at this level record the current operation status of the system.
TRACE	More detailed event information compared with logs at the INFO level.
DEBUG	Debugging information. Logs at this level record details about the operation status of the system during debugging, including the names, parameters, variables, and return values of the functions that are called.

OceanBase Database V3.2.4

OceanBase Database V3.2.4 supports seven log levels, as shown in the table below. The levels are listed in descending order of severity.

Log level	Description
ERROR	Critical errors. Logs at this level record information about system faults that must be troubleshoot. Otherwise, the system will be unavailable.
WARN	Warnings. Logs at this level record potential errors that may occur.
INFO	General information. Logs at this level record the current operation status of the system.
EDIAG	Error diagnostic information, which usually corresponds to a bug of OceanBase Database.
WDIAG	Warning diagnostic information indicating expected errors that OceanBase Database can tolerate.
TRACE	More detailed event information compared with logs at the INFO level.
DEBUG	Debugging information. Logs at this level record details about the operation status of the system during debugging, including the names, parameters, variables, and return values of the functions that are called.

Query logs

You can use the grep command to query logs. The syntax is as follows:

[admin@OB /home/admin/oceanbase/log] $grep [keyword] observer.log
[admin@OB /home/admin/oceanbase/log] $grep [keyword] rootservice.log
[admin@OB /home/admin/oceanbase/log] $grep [keyword] election.log

Log formats

The log data format is as follows. Note that the exact format may vary depending on the context.

[time] log_level [module_name] (file_name:fine_no) [thread_id][Y_trace_id0-trace_id1] [lt=last_log_print_time] [dc=dropped_log_count] log_data_

[time] log_level [module_name] function_name (file_name:fine_no) [thread_id][Y_trace_id0-trace_id1] [lt=last_log_print_time] [dc=] log_data_

Here are some examples:

observer.log

[admin@OB /home/admin/oceanbase/log] $tail -f -n 1 observer.log

The return result is as follows:

[2016-07-17 14:18:04.845802] INFO  [RPC.OBMYSQL] obsm_handler.cpp:191 [9543][Y0-0] [lt=47] [dc=0] connection close(easy_connection_str(c)="xxx.xxx.xxx.xxx:56854_-1_0x7fb8a9171b68", version=0, sessid=2147562562, tenant_id=1, server_id=1, is_need_clear_sessid_=true, ret=0)

observer.log.wf

[admin@OB /home/admin/oceanbase/log] $tail -f -n 1 observer.log.wf

The return result is as follows:

[2016-07-17 14:18:28.431351] WARN  [SQL.SESSION] set_conn (ob_basic_session_info.cpp:2568) [8541][xxxxxxxxxxxx-xxxxxxxx] [lt=16] [dc=0] debug for set_conn(conn=0x7fb8a9171b68, lbt()="0x4efe71 0x818afd 0xe9ea5b 0x721fc8 0x13747bc 0x2636db0 0x2637d68 0x5054e9 0x7fb98705aaa1 0x7fb9852cc93d ", magic_num_=324478056, sessid_=2147562617, version_=0)

rootservice.log

[admin@OB /home/admin/oceanbase/log] $tail -f -n 1 rootservice.log

The return result is as follows:

[2016-07-17 14:18:53.701463] INFO  [RS] ob_server_table_operator.cpp:345 [8564][Y0-0] [lt=11] [dc=0] svr_status(svr_status="active", display_status=1)

rootservice.log.wf

[admin@OB /home/admin/oceanbase/log] $tail -f -n 1 rootservice.log.wf

The return result is as follows:

[2016-07-16 02:02:12.847602] WARN  [RS] choose_leader (ob_leader_coordinator.cpp:2067) [8570][xxxxxxxxxxxx-xxxxxxxx] [lt=8] [dc=0] choose leader info with not same candidate num(tenant_id=1005, server="xxx.xxx.xxx.xxx:2882", info={original_leader_count:0, primary_zone_count:0, cur_leader_count:1, candidate_count:1, in_normal_unit_count:1})

The following table describes the parameters in logs.

Parameter	Description
time	The time when the log was generated.
log_level	The level of the log.
module_name	The module that contains the statement corresponding to the log.
function_name	The function that contains the statement corresponding to the log.
file_name	The file that contains the statement corresponding to the log.
file_no	The row number of the statement corresponding to the log.
thread_id	The ID of the thread corresponding to the log.
trace_id0-trace_id1	The trace ID of the log, which consists of trace_id0 and trace_id1. A trace ID can be transmitted between OBServer nodes through remote procedure calls (RPCs). Therefore, you can obtain associated logs of OBServer nodes based on trace IDs.
last_log_print_time	OBServer nodes support synchronous and asynchronous logs. When synchronous logs are enabled, this parameter indicates the amount of time spent on recording the previous log. When asynchronous logs are enabled, this parameter indicates the amount of time spent on formatting the current log.
dropped_log_count	The number of dropped logs.
log_data	The detailed log data.

Examples of log interpretation

observer.log

Startup logs of OBServer nodes

Statistical logs

Common diagnostic logs

Search keyword: [NOTICE]

Log description: key log information during the startup of an OBServer node.

[admin@OB /home/admin/oceanbase/log] $grep "NOTICE" observer.log

The return result is as follows:

[2023-05-11 14:19:09.703272] INFO  [SERVER] ob_server.cpp:533 [95303][0][Y0-0000000000000000-0-0] [lt=9] [dc=0] [NOTICE] server instance start succeed

[2023-05-11 14:19:25.555836] INFO  [SERVER] ob_server.cpp:609 [95303][0][Y0-0000000000000000-0-0] [lt=16] [dc=0] [NOTICE] observer start service(start_service_time=1683785965555827)

The following table describes the information in the logs.

Log information	Description
server instance start succeed	The observer process is successfully started.
observer start service	The OBServer node starts to provide services. `start_service_time` indicates the timestamp when the OBServer node starts to provide services.

Process memory statistics

Search keyword: [CHUNK_MGR]

Log description: memory usage statistics of the observer process from the system perspective, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "CHUNK_MGR" observer.log

The return result is as follows:

[2023-05-15 16:24:19.728623] INFO  [COMMON] ob_tenant_mgr.cpp:643 [87101][0][Y0-0000000000000000-0-0] [lt=7] [dc=0] [CHUNK_MGR] free=40593 pushes=8989641 pops=8949048 limit=107,978,014,720 hold=106,384,326,656 used= 21,254,635,520 freelist_hold= 85,129,691,136 maps=602,804 unmaps=587,963 large_maps=599,637 large_unmaps=587,963 memalign=0   virtual_memory_used=111,853,719,552

Log interpretation

Log information	Description
limit	The total memory available for the OBServer node, in bytes.
hold	The total memory that the OBServer node obtains from the operating system, in bytes.
used	The memory that is used by the OBServer node, including the cache memory, in bytes.
freelist_hold	The size of unused memory on the OBServer node, in bytes.
maps and unmaps	The `maps` and `unmaps` values, in bytes, monitored by the OBServer node.
large_maps and large_unmaps	The `maps` and `unmaps` values, in bytes, monitored by the OBServer node for data blocks sized 4 MB to 20 MB.

Tenant memory statistics

Search keyword: [MEMORY] or ob_malloc_allocator.cpp

Log description: memory usage statistics of the tenant from the system perspective, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "MEMORY" observer.log

[admin@OB /home/admin/oceanbase/log] $grep "ob_malloc_allocator" observer.log

The return result is as follows:

[2023-06-05 12:09:21.701235] INFO  [LIB] ob_malloc_allocator.cpp:495 [88518][0][Y0-0000000000000000-0-0] [lt=5] [dc=0] [MEMORY] tenant: 1, limit: 12,884,901,888 hold: 8,524,922,880 rpc_hold: 0 cache_hold: 6,773,800,960 cache_used: 6,773,800,960 cache_item_count: 3,230
[MEMORY] ctx_id=           DEFAULT_CTX_ID hold_bytes=    658,505,728
[MEMORY] ctx_id=          MEMSTORE_CTX_ID hold_bytes=    383,778,816
[MEMORY] ctx_id=           EXECUTE_CTX_ID hold_bytes=     16,777,216
[MEMORY] ctx_id=          SQL_EXEC_CTX_ID hold_bytes=     77,594,624
[MEMORY] ctx_id=         TRANS_CTX_MGR_ID hold_bytes=      8,388,608
[MEMORY] ctx_id=     REPLAY_STATUS_CTX_ID hold_bytes=      6,291,456
[MEMORY] ctx_id=        PLAN_CACHE_CTX_ID hold_bytes=    553,648,128
[MEMORY] ctx_id=                WORK_AREA hold_bytes=     27,262,976
[MEMORY] ctx_id=       TX_CALLBACK_CTX_ID hold_bytes=     18,874,368

[2023-06-05 12:09:21.701307] INFO  [LIB] ob_tenant_ctx_allocator.cpp:167 [88518][0][Y0-0000000000000000-0-0] [lt=16] [dc=0]
[MEMORY] tenant_id=    1 ctx_id=           DEFAULT_CTX_ID hold=    658,505,728 used=    553,301,808
[MEMORY] hold=    371,125,248 used=    368,276,480 count=     139 avg_used=      2,649,471 mod=MysqlRequesReco
[MEMORY] hold=     52,416,000 used=     52,403,200 count=      25 avg_used=      2,096,128 mod=OB_KVSTORE_CACHE
[MEMORY] hold=     49,029,760 used=     48,846,080 count=      24 avg_used=      2,035,253 mod=SqlPlanMon
[MEMORY] hold=     35,543,040 used=     35,192,202 count=     534 avg_used=         65,903 mod=SqlDtl
[MEMORY] hold=     14,696,192 used=     12,591,876 count=      14 avg_used=        899,419 mod=LOCALDEVICE
[MEMORY] hold=      8,386,560 used=      8,384,512 count=       4 avg_used=      2,096,128 mod=Election
[MEMORY] hold=      6,298,560 used=      6,281,216 count=     257 avg_used=         24,440 mod=BlockMap
[MEMORY] hold=      6,289,920 used=      6,288,384 count=       3 avg_used=      2,096,128 mod=ElectionGroup
[MEMORY] hold=      2,154,240 used=        408,600 count=  24,952 avg_used=             16 mod=Number

Log interpretation

Log information	Description
tenant	The ID of the tenant.
limit	The maximum memory for the tenant, in bytes.
hold	The total memory that the tenant obtains from the operating system, in bytes.
rpc_hold	The memory occupied by RPC threads, in bytes.
cache_hold	The memory occupied by KV caches, in bytes.
cache_hold	The memory used by KV cache, in bytes.
cache_item_count	The number of KV caches.

Tenant disk statistics

Search keyword: ob_tenant_disk_usage_mgr.cpp or dump tenant disk usage update info

Log description: disk usage information of the tenant, which is printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "ob_tenant_disk_usage_mgr.cpp" observer.log

[admin@OB /home/admin/oceanbase/log] $grep "dump tenant disk usage update info" observer.log

The return result is as follows:

[2023-05-15 17:30:59.055858] INFO  [STORAGE] ob_tenant_disk_usage_mgr.cpp:69 [87424][0][Y0-0000000000000000-0-0] [lt=11] [dc=0] dump tenant disk usage update info(disk_usage={tenant_id:1, max_disk_size:0, used_data_size:935329792, used_meta_size:56623104})

Log interpretation

Log information	Description
tenant_id	The ID of the tenant.
max_disk_size	The maximum disk usage allowed for the tenant. It is not limited in OceanBase Database of versions earlier than V4.x.
used_data_size	The size of disk space occupied by user data in the tenant.
used_meta_size	The size of disk space occupied by management data in the tenant.

Tenant thread queue statistics

Search keyword: ob_multi_tenant.cpp or dump tenant info

Log description: thread queue statistics of the tenant, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "ob_multi_tenant.cpp" observer.log

[admin@OB /home/admin/oceanbase/log] $grep "dump tenant info" observer.log

The return result is as follows:

[2023-05-15 17:31:43.710008] INFO  [SERVER.OMT] ob_multi_tenant.cpp:812 [88299][0][Y0-0000000000000000-0-0] [lt=8] [dc=0] dump tenant info(tenant={id:1001, compat_mode:1, unit_min_cpu:"3.000000000000000000e+00", unit_max_cpu:"3.000000000000000000e+00", slice:"0.000000000000000000e+00", slice_remain:"0.000000000000000000e+00", token_cnt:12,  ass_token_cnt:12, lq_tokens:1, used_lq_tokens:0, stopped:false, idle_us:9509657, recv_hp_rpc_cnt:0, recv_np_rpc_cnt:0, recv_lp_rpc_cnt:0, recv_mysql_cnt:0, recv_task_cnt:0, recv_large_req_cnt:0, tt_large_quries:0, actives:12, workers:12, nesting workers:7, lq waiting workers:0, req_queue:total_size=0 queue[0]=0 queue[1]=0 queue[2]=0 queue[3]=0 queue[4]=0 queue[5]=0 , large queued:0, multi_level_queue:total_size=0 queue[0]=0 queue[1]=0 queue[2]=0 queue[3]=0 queue[4]=0 queue[5]=0 queue[6]=0 queue[7]=0 })

Log interpretation

Log information	Description
compat_mode	The compatibility mode of the tenant.
unit_min_cpu	The minimum number of CPU cores.
unit_max_cpu	The maximum number of CPU cores.
slice	The size of slices for resource units of the tenant.
slice_remain	The size of remaining slices.
token_cnt	The number of tokens allocated by the scheduler. Each token is converted into a worker thread.
ass_token_cnt	The number of tokens confirmed by the tenant. You can confirm the number of tokens based on the `token_cnt` parameter. Typically, the values of the two parameters are the same.
lq_tokens	The number of large query tokens.
used_lq_tokens	The number of worker threads that hold large query tokens.
stopped	Indicates whether resource units of the tenant are being deleted.
idle_us	The total idle time of the worker threads in one round (10s). The idle time includes only the wait time in queues.
recv_hp_rpc_cnt recv_np_rpc_cnt recv_lp_rpc_cnt	The cumulative numbers of RPC requests received by the tenant at different levels, including hp(High), np(Normal), and lp(Low).
recv_mysql_cnt	The cumulative number of MySQL requests received by the tenant.
recv_task_cnt	The cumulative number of internal tasks received by the tenant.
recv_large_req_cnt	The cumulative number of large requests predicted by the tenant. The value of this parameter is only incremented and will not be cleared. The value is incremented upon retries.
tt_large_quries	The cumulative number of large requests processed by the tenant. The value of this parameter is only incremented and will not be cleared. The value is incremented upon checks.
actives	The number of active worker threads.
workers	The number of SQL worker threads of the tenant.
nesting workers	The number of dedicated threads held by the tenant for nested requests. A total of seven threads are available, which correspond to seven nesting levels.
lq waiting workers	The number of worker threads waiting for scheduling.
req_queue	The total number of requests waiting in the queue. This field is equivalent to `total_size`.
large queued	The number of large query requests waiting in the queue.
multi_level_queue	The worker queues to accommodate nested requests. `queue[1]` to `queue[7]` correspond to the seven nesting levels and each indicate the number of queued requests at the corresponding level. `queue[0]` is not used at present.

RPC thread statistics

Search keyword: ob_net_easy.cpp or [RPC EASY STAT]

Log description: RPC thread statistics, which are printed every 1s. Each log represents the statistics of one RPC thread.

[admin@OB /home/admin/oceanbase/log] $grep "ob_net_easy.cpp" observer.log

[admin@OB /home/admin/oceanbase/log] $grep "RPC EASY STAT" observer.log

The return result is as follows:

[2023-05-15 18:28:17.681483] INFO  [RPC.FRAME] ob_net_easy.cpp:601 [88307][0][Y0-0000000000000000-0-0] [lt=18] [dc=0] [RPC EASY STAT](log_str="conn count=3/4, request done=7404635/5179373, request doing=0/0")
[2023-05-15 18:28:17.682056] INFO  [RPC.FRAME] ob_net_easy.cpp:601 [88310][0][Y0-0000000000000000-0-0] [lt=8] [dc=0] [RPC EASY STAT](log_str="conn count=3/3, request done=7402021/5162953, request doing=0/0")
[2023-05-15 18:28:17.682108] INFO  [RPC.FRAME] ob_net_easy.cpp:601 [88309][0][Y0-0000000000000000-0-0] [lt=11] [dc=0] [RPC EASY STAT](log_str="conn count=3/3, request done=7405199/5164208, request doing=0/0")
[2023-05-15 18:28:17.682108] INFO  [RPC.FRAME] ob_net_easy.cpp:601 [88308][0][Y0-0000000000000000-0-0] [lt=9] [dc=0] [RPC EASY STAT](log_str="conn count=3/3, request done=7402609/5163322, request doing=0/0")

Log interpretation

Log information	Description
conn count	The number of inbound and outbound connections served by the RPC thread.
request done	The number of inbound and outbound requests completed by the RPC thread.
request doing	The number of inbound and outbound requests being processed by the RPC thread.

If no RPC thread is available (request doing=0/0), all RPC threads are busy.

Transaction context statistics

Search keyword: ObTransCtx statistics

Log description: statistics of the numbers of allocated transaction contexts, which are printed every 5s.

[admin@OB /home/admin/oceanbase/log] $grep "ObTransCtx statistics" observer.log

The return result is as follows:

[2023-05-15 17:29:07.955826] INFO  [STORAGE.TRANS] ob_trans_factory.cpp:175 [88157][0][YB420BA6506F-0005FB92633E1702-0-0] [lt=11] [dc=0] ObTransCtx statistics(active_sche_ctx_count=0, active_coord_ctx_count=0, active_part_ctx_count=1, total_release_part_ctx_count=156)

Log interpretation

Log information	Description
active_sche_ctx_count	The number of active transaction contexts of the scheduler.
active_coord_ctx_count	The number of active transaction contexts of the coordinator.
active_part_ctx_count	The number of active transaction contexts of the participant.
total_release_part_ctx_count	The number of released transaction contexts.

Each OBServer node allows for a maximum of 0.1 million scheduler contexts and 0.7 million participant contexts.

Disk I/O statistics

Search keyword: Current io stat (in OceanBase Database V2.x) or current io status (in OceanBase Database V3.x)

Log description: I/O statistics, which are printed every 10s.

Log interpretation in OceanBase Database V2.x

[admin@OB /home/admin/oceanbase/log] $grep "Current io stat" observer.log

The return result is as follows:

[2022-12-27 17:28:19.298670] INFO [COMMON] ob_io_disk.cpp:497 [47844][406][Y0-0000000000000000] [lt=59] [dc=0] Current io stat, (fd={fd:875, disk_id:{disk_idx:0, install_seq:0}}, io_conf={sys_io_low_percent:90, sys_io_high_percent:90, user_iort_up_percent:2000, cpu_high_water_level:7680, write_failure_detect_interval:60000000, read_failure_black_list_interval:300000000, data_storage_warning_tolerance_time:10000000, data_storage_error_tolerance_time:300000000, disk_io_thread_count:8, callback_thread_count:8, large_query_io_percent:0, data_storage_io_timeout_ms:10000}, sys_io_percent=90, sys_iops_up_limit=266, user_max_rt=1.151363996887207031e+04, cpu_estimator={avg_usage:378}, user_read_io_stat={average_size:18245, average_rt_us:"4.766969696969696884e+02", old_stat:{io_cnt:5435154700, io_bytes:92266120699904, io_rt_us:2896942021380}, new_stat:{io_cnt:5435154733, io_bytes:92266121302016, io_rt_us:2896942037111}}, user_write_io_stat={average_size:0, average_rt_us:"0.000000000000000000e+00", old_stat:{io_cnt:790, io_bytes:1568509952, io_rt_us:1369408}, new_stat:{io_cnt:790, io_bytes:1568509952, io_rt_us:1369408}}, sys_read_io_stat={average_size:0, average_rt_us:"0.000000000000000000e+00", old_stat:{io_cnt:29673855, io_bytes:62197411557376, io_rt_us:62528978207}, new_stat:{io_cnt:29673855, io_bytes:62197411557376, io_rt_us:62528978207}}, sys_write_io_stat={average_size:0, average_rt_us:"0.000000000000000000e+00", old_stat:{io_cnt:24420687, io_bytes:51213892583424, io_rt_us:56376648298}, new_stat:{io_cnt:24420687, io_bytes:51213892583424, io_rt_us:56376648298}}, large_query_read_io_stat={average_size:0, average_rt_us:"0.000000000000000000e+00", old_stat:{io_cnt:0, io_bytes:0, io_rt_us:0}, new_stat:{io_cnt:0, io_bytes:0, io_rt_us:0}}, large_query_write_io_stat={average_size:0, average_rt_us:"0.000000000000000000e+00", old_stat:{io_cnt:0, io_bytes:0, io_rt_us:0}, new_stat:{io_cnt:0, io_bytes:0, io_rt_us:0}})

Log interpretation

Log information	Description
fd and disk_id	This part includes the file descriptor number (`fd`), the disk index number (`disk_idx`), and the installation serial number (`install_seq`).
io_config	This part includes system I/O parameters, such as `sys_io_low_percent` (minimum system I/O percentage), `sys_io_high_percent` (maximum system I/O percentage), `user_iort_up_percent` (maximum user I/O), `cpu_high_water_level` (maximum CPU utilization), `disk_io_thread_count` (number of disk I/O threads), `callback_thread_count` (number of callback threads), `large_query_io_percent` (percentage of large query I/Os), and `data_storage_io_timeout_ms` (timeout period for data storage I/Os, in ms).
sys_io_percent	The maximum I/O resource usage.
sys_iops_up_limit	The maximum IOPS of I/O operations on data blocks sized 2 MB, which must be consistent with `io_resource.conf` in the I/O bench test result.
user_max_rt	The maximum response time of user I/O operations.
cpu_estimator(avg_usage)	The average CPU utilization of user I/O operations.
user_read_io_stat user_write_io_stat sys_read_io_stat sys_write_io_stat large_query_read_io_stat large_query_write_io_stat	This part includes different types of I/O statistics, such as the average I/O size (`average_size`), average response time (`average_rt_us`), old and new statistics, including the number of I/O operations (`io_cnt`), number of bytes involved in I/O operations (`io_bytes`), and response time of I/O operations (`io_rt_us`).
io_ability	This part includes the I/O performance test results, such as the IOPS and response time of I/O operations in different modes (read or write) and of different sizes.
log_io user_io sys_io prewarm_io large_query_io	This part includes the status and device capabilities of different types of I/O operations, such as whether the system is initialized (`is_inited_`), device capabilities (`device_ability_`), the percentage of shared I/O resources (`share_percentage_`), and the deadline timestamp (`last_deadline_ts_`).
log_read log_write user_read user_write sys_read sys_write prewarm_read prewarm_write large_query_read large_query_write	This part includes statistics of different types of I/O operations, such as the start IOPS (`start_iops`), IOPS of canceled I/O operations (`cancel_iops`), IOPS of failed I/O operations (`fail_iops`), IOPS of successful I/O operations (`succ_iops`), average byte size (`avg_byte`), device response time (`device_rt_us`), and queue response time (`queue_rt_us`).
avg_cpu	The average CPU utilization of the system.

OceanBase Database allows you to control I/O throttling by using the following two parameters: sys_bkgd_io_low_percentage and sys_bkgd_io_high_percentage, which respectively specify the lower and upper limits of the sys_io_percent field.

You can determine whether I/O operations are throttled based on the preceding I/O statistics as follows:

If the values of sys_io_high_percent and sys_io_low_percent are inconsistent with those of sys_bkgd_io_high_percentage and sys_bkgd_io_low_percentage, I/O throttling is enabled.
If the average_size value in user_read_io_stat is 0, I/O operations are throttled.

Log interpretation in OceanBase Database V3.x

[admin@OB /home/admin/oceanbase/log] $grep "current io status" observer.log

The return result is as follows:

[2023-06-21 16:54:14.212314] INFO [COMMON] ob_io_struct.cpp:2129 [89617][0][Y0-0000000000000000-0-0] [lt=19] [dc=0] current io status(io_config={sys_io_low_percent_:70, sys_io_high_percent_:90, user_iort_up_percent_:100, cpu_high_water_level_:2560, write_failure_detect_interval_:60000000, read_failure_black_list_interval_:300000000, data_storage_warning_tolerance_time_:30000000, data_storage_error_tolerance_time_:300000000, disk_io_thread_count_:8, callback_thread_count_:8, large_query_io_percent_:0, data_storage_io_timeout_ms_:120000, io_memory_limit_:2147483648, enable_io_tracer_:false}, io_ability={measure_items:[[{mode:0, size:4096, iops:120191.00, rt_us:180.58}, {mode:0, size:8192, iops:155470.00, rt_us:198.67}, {mode:0, size:16384, iops:122040.00, rt_us:259.74}, {mode:0, size:32768, iops:76429.00, rt_us:416.30}, {mode:0, size:65536, iops:41826.00, rt_us:762.89}, {mode:0, size:131072, iops:21994.00, rt_us:1452.21}, {mode:0, size:262144, iops:11247.00, rt_us:2841.93}, {mode:0, size:524288, iops:5649.00, rt_us:5658.40}, {mode:0, size:1048576, iops:2828.00, rt_us:11302.68}], [{mode:1, size:4096, iops:127929.00, rt_us:247.71}, {mode:1, size:8192, iops:120343.00, rt_us:263.40}, {mode:1, size:16384, iops:115208.00, rt_us:275.45}, {mode:1, size:32768, iops:76882.00, rt_us:414.05}, {mode:1, size:65536, iops:39596.00, rt_us:805.95}, {mode:1, size:131072, iops:20989.00, rt_us:1522.00}, {mode:1, size:262144, iops:10634.00, rt_us:3004.64}, {mode:1, size:524288, iops:5355.00, rt_us:5970.53}, {mode:1, size:1048576, iops:2692.00, rt_us:11877.67}, {mode:1, size:2097152, iops:1309.00, rt_us:24402.57}]]}, log_io={is_inited_:true, device_ability_:null, share_percentage_:100, last_deadline_ts_:0}, user_io={is_inited_:true, device_ability_:null, share_percentage_:100, last_deadline_ts_:0}, sys_io={is_inited_:true, device_ability_:0x32bb7ba8, share_percentage_:90, last_deadline_ts_:1687334746495760}, prewarm_io={is_inited_:true, device_ability_:0x32bb7ba8, share_percentage_:100, last_deadline_ts_:0}, large_query_io={is_inited_:true, device_ability_:0x32bb7ba8, share_percentage_:100, last_deadline_ts_:0}, log_read={}, log_write={start_iops:29.93, cancel_iops:0.00, fail_iops:0.00, succ_iops:29.93, avg_byte:5461.33, device_rt_us:140.00, queue_rt_us:7.33}, user_read={}, user_write={}, sys_read={}, sys_write={}, prewarm_read={}, prewarm_write={}, large_query_read={}, large_query_write={}, avg_cpu=2.126034912718204453e+02)

Log interpretation

Log information	Description
io_config	This part includes system I/O parameters, such as `sys_io_low_percent_` (minimum system I/O percentage), `sys_io_high_percent_` (maximum system I/O percentage), `user_iort_up_percent_` (maximum user I/O percentage), `cpu_high_water_level_` (maximum CPU utilization), `disk_io_thread_count_` (number of disk I/O threads), `callback_thread_count_` (number of callback threads), `large_query_io_percent_` (percentage of large query I/Os), `data_storage_io_timeout_ms_` (timeout period for data storage I/Os, in ms), and `io_memory_limit_` (maximum memory for I/O operations).
io_ability	This part includes the I/O performance test results, such as the IOPS and response time of I/O operations in different modes (read or write) and of different sizes.
log_io user_io sys_io prewarm_io large_query_io	This part includes the status and device capabilities of different types of I/O operations, such as whether the system is initialized (`is_inited_`), device capabilities (`device_ability_`), the percentage of shared I/O resources (`share_percentage_`), and the deadline timestamp (`last_deadline_ts_`).
log_read log_write user_read user_write sys_read sys_write prewarm_read prewarm_write large_query_read large_query_write	This part includes statistics of different types of I/O operations, such as the start IOPS (`start_iops`), IOPS of canceled I/O operations (`cancel_iops`), IOPS of failed I/O operations (`fail_iops`), IOPS of successful I/O operations (`succ_iops`), average byte size (`avg_byte`), device response time (`device_rt_us`), and queue response time (`queue_rt_us`).
avg_cpu	The average CPU utilization of the system.

You can determine whether I/O operations are throttled based on the preceding I/O statistics as follows:

If the values of sys_io_high_percent and sys_io_low_percent are inconsistent with those of sys_bkgd_io_high_percentage and sys_bkgd_io_low_percentage, I/O throttling is enabled.
If the value of last_deadline_ts_ is later than the time when the log is printed, I/O operations are throttled.

Archiving thread statistics

Search keyword: print_archive_status

Log description: log archiving statistics, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "print_archive_status" observer.log

The return result is as follows:

[2023-08-24 13:59:53.565584] INFO  [ARCHIVE] ob_archive_mgr.cpp:830 [11357][0][Y0-0000000000000000-0-0] [lt=13] [dc=0] print_archive_status(clog_split_task_num=0, send_task_num=2, pre_send_task_capacity=458752, pg_archive_task_count=903, archive_incarnation=1, archive_round=6, piece_id=0, piece_create_date=20230810, log_archive_status=2, has_encount_error=false)

Log interpretation

Log information	Description
clog_split_task_num	The number of ongoing clog splitting tasks.
send_task_num	The number of ongoing clog sending tasks.
pre_send_task_capacity	The amount of data to be archived.
pg_archive_task_count	The number of archiving tasks associated with the partition group.
log_archive_status	The archiving status. Valid values: `0: an invalid state. </li><li>`1`: Archiving is being started. </li><li>`2`: Archiving is in progress. </li><li>`3`: Archiving is being stopped. </li><li>`4`: Archiving has been stopped.

Search keyword: archive_clog_split_engine statistics

Log description: statistics of clog splitting threads, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "archive_clog_split_engine statistics" observer.log

The return result is as follows:

[2023-08-24 14:46:13.179165] INFO  [ARCHIVE] ob_archive_clog_split_engine.cpp:1120 [11389][0][Y0-0000000000000000-0-0] [lt=11] [dc=0] archive_clog_split_engine statistics(avg_split_task_used=51, avg_read_log_used=40, avg_read_log_size=3058, avg_get_send_task_used=11, avg_send_task_per_task=1, split_task_count=17)

Log interpretation

Log information	Description
avg_split_task_used	The average time consumed by a clog splitting task, in microseconds.
avg_read_log_used	The average time spent in reading log data at a time, in microseconds.
avg_read_log_size	The average size of log data read at a time.
avg_get_send_task_used	The average time spent in obtaining a clog sending task, in microseconds.
avg_send_task_per_task	The average number of clog sending tasks corresponding to a clog splitting task.
split_task_count	The number of clog splitting tasks.

Search keyword: archive_sender statistic

Log description: statistics of each clog sending thread, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "archive_sender statistic" observer.log

The return result is as follows:

[2023-06-12 16:15:50.311863] INFO [ARCHIVE] ob_archive_sender.cpp:1589 [25879][0][Y0-0000000000000000-0-0] [lt=5] [dc=0] archive_sender statistic in 10s(total_send_buf_size=29384, total_send_log_count=157, avg_log_size=187, total_send_task_count=2, total_send_cost_ts=2809, avg_send_task_cost_ts=1404)

Log interpretation

Log information	Description
total_send_buf_size	The size of log data sent by the archiving thread within 10s. If log compression is enabled, the size of sent log data is one third that of uncompressed logs.
total_send_log_count	The number of log records sent by the archiving thread within 10s.
total_send_task_count	The total number of network I/Os of the archiving thread within 10s.
total_send_cost_ts	The total time consumed by the archiving thread within 10s, in microseconds. If the value approaches or exceeds 10s, the maximum I/O processing capability of the system has been reached.
avg_send_task_cost_ts	The average time consumed by a network I/O, in microseconds. The value ranges from several milliseconds to tens of milliseconds.

DAG thread statistics

Search keyword: dump_dag_status

Log description: statistics of directed acyclic graph (DAG) threads, which are printed every 10s.

[admin@OB /home/admin/oceanbase/log] $grep "dump_dag_status" observer.log

The return result is as follows:

[2023-09-04 17:15:25.494000] INFO  [COMMON] ob_dag_scheduler.cpp:1314 [9777][0][Y0-0000000000000000-0-0] [lt=5] [dc=0] dump_dag_status(priority="DAG_PRIO_SSTABLE_MINI_MERGE", low_limit=4, running_task=0)

[2023-09-04 17:15:25.494065] INFO  [COMMON] ob_dag_scheduler.cpp:1318 [9777][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] dump_dag_status(up_limit_type="DAG_ULT_MINI_MERGE", up_limit=8, running_task_per_ult=0)

[2023-09-04 17:15:25.494121] INFO  [COMMON] ob_dag_scheduler.cpp:1322 [9777][0][Y0-0000000000000000-0-0] [lt=4] [dc=0] dump_dag_status(type="DAG_MINOR_MERGE", dag_count=0)

[2023-09-04 17:15:25.494125] INFO  [COMMON] ob_dag_scheduler.cpp:1322 [9777][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] dump_dag_status(type="DAG_MAJOR_MERGE", dag_count=0)

[2023-09-04 17:15:25.494210] INFO  [COMMON] ob_dag_scheduler.cpp:1324 [9777][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] dump_dag_status(total_worker_cnt=52, total_running_task_cnt=0, work_thread_num=113)

Log interpretation

Log information	Description
priority	The priority of background tasks. OceanBase Database preferentially schedules tasks with a high priority.
low_limit	The minimum number of threads for background tasks.
up_limit	The maximum number of threads for background tasks.
type	The type of background tasks.
dag_count	The total number of requests being queued.
total_worker_cnt	The total number of active threads for background tasks of various types.
total_running_task_cnt	The total number of ongoing background tasks of various types.
work_thread_num	The maximum number of threads for background tasks of various types.

Memory allocation failure

Search keyword: [OOPS], alloc failed, -4013, or -4030

Log description: memory allocation failure.

[admin@OB /home/admin/oceanbase/log] $grep "OOPS|alloc failed|-4013|-4030" observer.log

The return result is as follows:

[2019-11-29 13:32:19.702557] WARN  alloc_chunk (achunk_mgr.cpp:156) [116413][1492][XXXXXXXXXXXX-XXXXXXXXXXXXXXXX] [lt=10] [dc=0] oops, over total memory limit, hold=241078108160 limit=243230035968

[2019-11-29 13:32:19.703452] WARN  alloc (ob_tenant_ctx_allocator.h:111) [116413][1492][XXXXXXXXXXXX-XXXXXXXXXXXXXXXX] [lt=15] [dc=0] oops, alloc failed, tenant_id=500 ctx_id=0 ctx_name=DEFAULT_CTX_ID ctx_hold=13245612032 ctx_limit=9223372036854775807 tenant_hold=26003955712 tenant_limit=9223372036854775807

[2019-11-29 13:32:19.710755] ERROR [LIB] ob_malloc (ob_malloc.h:49) [116413][1492][XXXXXXXXXXXX-XXXXXXXXXXXXXXXX] [lt=364] [dc=0] allocate memory fail(attr=tenant_id=500, mod_id=253, ctx_id=0, prio=0, nbyte=2097152)

Log interpretation

Log information	Description
ctx_name	The memory module with memory overrun.
ctx_hold	The size of memory held by the memory module.
ctx_limit	The maximum size of memory for the memory module.
alloc_size	The size of memory allocated to the current memory allocation request.

MemStore throttling

Search keyword: report write throttle info

Log description: statistics of thresholds for triggering MemStore write throttling.

[admin@OB /home/admin/oceanbase/log] $grep "report write throttle info" observer.log

The return result is as follows:

2022-12-29 04:46:44.841893] INFO  [COMMON] ob_fifo_arena.cpp:333 [28948][0][XXXXXXXXXXXX-XXXXXXXXXXXXXXXXXX-X-X] [lt=16] [dc=0] report write throttle info(alloc_size=24, throttling_interval=1000, attr_=tenant_id=1005, label=15, ctx_id=1, prio=0, freed memory(MB):=0, last_base_ts=1672260404842042, cur_mem_hold=11970543616, trigger_mem_limit=21990232512, trigger_mem_remain=5497558128, tenant_memory_remain=5448400896, throttle_info_={decay_factor_:"3.002618459977128736e-04", alloc_duration_:3600000000, trigger_percentage_:80, memstore_threshold_:27487790640, period_throttled_count_:45, period_throttled_time_:315460, total_throttled_count_:181984128, total_throttled_time_:72058595126473244}, mem_overused=49157232)

Log interpretation

Log information	Description
cur_mem_hold	The current MemStore usage.
trigger_mem_limit	The Memstore usage threshold that triggers throttling.
trigger_mem_remain	The remaining space of the MemStore that triggers throttling.
tenant_memory_remain	The current remaining space of the MemStore.
mem_overused	The size of overused space of the MemStore.

Lock contention

Search keyword

Write-write conflict: on_wlock_retry or lock_for_write conflict
Read-write conflict: lock_for_read retry or lock_for_read fail

Log description: records of row lock contention and retries.

[2022-09-03 11:33:06.624063] WARN  [STORAGE.TRANS] on_wlock_retry
(ob_memtable_context.cpp:345) [4198][0][XXXXXXXXXXXX-XXXXXXXXXXXXXXXXXX-X-X] [lt=16] [dc=0] lock_for_write conflict(*this=alloc_type=0 ctx_descriptor=45479 … abs_expired_time=1662179586523672 stmt_timeout=3599900482 … pkey={tid:1100611139453777, partition_id:0, part_cnt:0} trans_id={hash:16499054864673064020, inc:284227, addr:"xxx.xxx.xxx.xxx:2882", t:1662175982313772} checksum_log_ts=0, key=table_id=1100611139453777 rowkey_object=[{"NUMBER":"1"}] , conflict_ctx="alloc_type=0 ctx_descriptor=45354 … stmt_start_time=1662175972255438 abs_expired_time=1662179572155922 stmt_timeout=3599900484 … pkey={tid:1100611139453777, partition_id:0, part_cnt:0} trans_id={hash:15878544715862172232, inc:283677, addr:"xxx.xxx.xxx.xxx:2882", t:1662175972255438} checksum_log_ts=0")

Log interpretation

Log information	Description
*this	The waiter, namely, the transaction waiting for a lock.
conflict_ctx	The blocker, which can be the lock holder.
pkey	The table ID and partition ID.
rowkey_object	The row key.
trans_id	The transaction ID.

Slow queries

Search keyword: [slow query]

Log description: detailed SQL trace information.

[admin@OB /home/admin/oceanbase/log] $grep "slow query" observer.log

The return result is as follows:

[2022-02-18 11:49:32.557473] TRACE [TRACE]obmp_base.cpp:948 [1837144][3090][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=2] [dc=0] [slow query]
(TRACE=begin_ts=1645156047499895 2022-02-18 03:47:27.499895
|[process_begin] u=0 in_queue_time:16, receive_ts:1645156047499878, enqueue_ts:1645156047499879
|[query_begin] u=1 trace_id:XXXXXXXXXX-XXXXXXXXXXXXXX
|[session] u=4 sid:3221502589, tenant_id:1|
......
|[parse_begin] u=17 stmt:"select count(*) from TEST1 fetch first 1000 rows only", stmt_len:58
|[cache_get_plan_begin] u=3 |[cache_get_plan_end] u=18 
......
|[cg_begin] u=1 |[cg_end] u=137 
|[plan_id] u=96 plan_id:91455
|[exec_begin] u=2 arg1:false, end_trans_cb:false
|[do_open_plan_begin] u=2 plan_id:91455
......
|[result_set_close] u=0 ret:-5066, arg1:-5066, arg2:0, arg3:-5066, async:false
|[session] u=34 sid:3226669564, tenant_id:1034
|[exec_end] u=21 
|[query_end] u=125 
|[process_end] u=38 run_ts:1645156047499900
|total_timeu=125057508)

Log interpretation

Log information	Description
[eventbegin] and [eventend]	The start and end of internal events during the execution of the SQL statement.
u=xxx	The interval between two trace events.
in_queue_time	The queuing time before the SQL statement starts to be executed.
trace_id	The trace ID of the SQL statement.
sid	The ID of the session where the SQL statement is executed.
stmt	The SQL statement.
plan_id	The ID of the execution plan used for the SQL statement.
ret	The return code (error code).
total_timeu	The total time spent in executing the SQL statement.

Slow transactions

Search keyword: [slow trans]

Log description: the call link and time consumption information of the execution of the transaction.

[admin@OB /home/admin/oceanbase/log] $grep "slow trans" observer.log

The return result is as follows:

observer.log:[2021-10-08 11:45:44.313295] TRACE [TRACE]:0 [9382][2334][XXXXXXXXXX-XXXXXXXXXXXXXXXXX] [lt=23] [dc=39] [slow trans] (TRACE=begin_ts=1633664744167732 2021-10-08 03:45:44.167732 [init] u=0 arg1:140542173395392, ctx_type:"scheduler", trans_type:2, trans_id:{hash:7103234888693285108, inc:532846, addr:{ip:"192.168.1.1", port:40000}, t:1633664744167658}, pkey:{tid:1000, partition_id:1000, part_cnt:1000}, arg1:false, uref:1073741824 [set_stc] u=1 stc:1633664744167733 [start_trans] u=2 ret:0, uref:1073741825 [start_stmt] u=7 ret:0, tenant_id:1003, sql_no:4294967297, phy_plan_type:3, stmt_type:2, sql_id:"AA2155E24549E97", trace_id:Y9C4064586A6B-0005CDCF253FBE70, start_time:1633664744166256, uref:1073741825 [end_stmt] u=30832 ret:0, is_rollback:false, rollback_to:4294967297, rollback_from:4294967297, uref:1073741825 [start_stmt] u=50000 ret:0, tenant_id:1003, sql_no:4294967298, phy_plan_type:3, stmt_type:2, sql_id:"AA2155E24549E97", trace_id:Y9C4064586A6B-0005CDCF253FBE70, start_time:1633664744166256, uref:1073741825 [end_stmt] u=10 ret:0, is_rollback:false, rollback_to:4294967298, rollback_from:4294967298, uref:1073741825 [submit_commit] u=52 ret:0, uref:1073741824 [end_trans] u=35 ret:0, is_rollback:false, uref:1073741825 [exiting] u=114618 uref:1073741824, arg1:3221531139, arg2:7230647234361492284 [handle_message] u=1 ret:0, msg_type:1, uref:1 [end_trans_cb] u=13 ret:0, param:0, param:true [destroy] u=1 uref:0 total_timeu=195572)

Log interpretation

Log information	Description
ctx_type	The type of the transaction context. Valid values: `scheduler`, `coordinator`, and `participant`.
trans_type	The type of the transaction. Valid values: `0`: SP_TRANS (single partition) `1`: MINI_SP_TRANS (read-only single partition) `2`: DIST_TRANS (distributed)
trans_id	The ID of the transaction.
pkey	The table ID and partition ID of the object being modified in the current transaction context.
[start_trans]/[end_trans]	Information about the start or end of the transaction context.
[start_stmt]/[end_stmt]	Information about the start or end of the execution of the SQL statement, including the SQL ID and trace ID.
u=xxx	The interval between two trace events.
total_timeu	The total time spent in processing the transaction.

Slow network transmission

Search keyword: packet fly cost too much time

Log description: too much time spent by the RPC thread in sending the packet.

[admin@OB /home/admin/oceanbase/log] $grep "packet fly cost too much time" observer.log

The return result is as follows:

[2022-10-27 23:09:58.317505] WARN  [RPC.OBRPC] decode (ob_rpc_net_handler.cpp:119) [31273] [Y00000000000000000] [lt=8] packet fly cost too much time(pcode=2311, fly_ts=769195, send_timestamp=1666883397548305)

Log interpretation

Log information	Description
pcode	The processing event code. You can query `__all_virtual_obrpc_stat` for the meaning of the code.
fly_ts	The duration from when the source encodes and sends the packet to when the destination receives and decodes the packet.
send_timestamp	The timestamp when the source sends the packet.

Note

Slow RPC transmission may be caused by the network latency, clock offset, system load, and busy status of RPC threads.

Network throttling

Search keyword: network speed changed, reload_bandwidth_throttle_limit, or init_bandwidth_throttle

Log description: network throttling settings for system tasks on an OBServer node.

[admin@OB /home/admin/oceanbase/log] $grep "network speed changed|reload_bandwidth_throttle_limit|init_bandwidth_throttle" observer.log

The return result is as follows:

[2021-06-10 11:10:49.397019] INFO  [SERVER] ob_server.cpp:2018 [72580][0][Y0-0000000000000000-0-0] [lt=8] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60, network_speed=1310720000, rate=786432000):

[2021-06-10 11:14:44.905396] INFO  [SERVER] ob_server.cpp:2385 [72582][0][Y0-0000000000000000-0-0] [lt=11] [dc=0] network speed changed(from=1310720000, to=1048576000)

[2021-06-10 11:14:44.905421] INFO  [SERVER] ob_server.cpp:2055 [72582][0][Y0-0000000000000000-0-0] [lt=3] [dc=0] succeed to reload_bandwidth_throttle_limit(old_percentage=60, new_percentage=60, network_speed=1048576000, rate=629145600)

Log interpretation

Log information	Description
sys_bkgd_net_percentage_	The percentage of network bandwidth resources for background threads. The value is sourced from the system parameter `sys_bkgd_net_percentage`. By default, 60% of the bandwidth resources of the network interface card (NIC) are available for background threads.
network_speed	The NIC bandwidth, which is specified based on the related configuration file.
rate	The network bandwidth for background threads, which is the value of `network_speed` multiplied by `sys_bkgd_net_percentage`.

Log disk write failure

Search keyword: submit_log failed

Log description: information about log disk write failure.

[admin@OB /home/admin/oceanbase/log] $grep "submit_log failed" observer.log

The return result is as follows:

[2023-02-14 16:51:10.255801] ERROR [CLOG] submit_log (ob_partition_log_service.cpp:468) [893][0][Y0-0000000000000000-0-0] [lt=19] submit_log failed(ret=-4264, partition_key={tid:1099511627916, partition_id:0, part_cnt:0}, buff=0xfff80a4aef70, size=805, base_timestamp=1676065578191066, is_trans_related_log=true, role=1, state=4)

Log interpretation

Log information	Description
ret	The error code.
partition_key	The table ID and partition ID of the related object.

Errors requiring attention of DBAs

Search keyword: ERROR issue_dba_error

Log description: errors that require the attention of database administrators (DBAs).

[admin@OB /home/admin/oceanbase/log] $grep "ERROR issue_dba_error" observer.log

The return result is as follows:

2023-05-11 16:05:27.966611] ERROR issue_dba_error (ob_log.cpp:2322) [79165][0][Y0-0000000000000000-0-0] [lt=18] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=0, file="ob_log_file_group.cpp", line_no=412, info="clog disk is almost full")

[2023-05-11 16:45:58.135783] ERROR issue_dba_error (ob_log.cpp:2322) [77332][0][Y0-0000000000000000-0-0] [lt=8] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4264, file="ob_partition_log_service.cpp", line_no=468, info="submit_log failed")

[2023-05-11 16:45:59.448092] ERROR issue_dba_error (ob_log.cpp:2322) [77584][0][Y0-0000000000000000-0-0] [lt=10] [dc=0][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=0, file="ob_election.cpp", line_no=2077, info="leader_revoke, please attention!")

Log interpretation

Log information	Description
internal errcode	The error code.
file	The source code with the error.
info	The error information.

rootservice.log

Log keyword	Description
Rebalance task manager	The statistics of load balancing tasks.
[DDL] execute ddl like stmt	Information about an executed DDL statement.
choose leader info	Information about leader switching.
partition can't replicate	Information about a partition replication failure.
check merge progress success	The major compaction progress.
replica not merged	Information about a failed major compaction for a partition.

Statistical logs

Common diagnostic logs

Load balancing task statistics

Search keyword: Rebalance task manager statistics

Log description: statistics of load balancing tasks.

[admin@OB /home/admin/oceanbase/log] $grep "Rebalance task manager statistics" observer.log

The return result is as follows:

[2023-05-10 05:22:50.862554] INFO  [RS] ob_rebalance_task_mgr.cpp:1887 [65048][0][XXXXXXXXXX-XXXXXXXXXXXXXX-X-X] [lt=14] [dc=0] Rebalance task manager statistics(waiting_replication_task_cnt=0, executing_replication_task_cnt=0, waiting_migration_task_cnt=0, executing_migration_task_cnt=0)

Log interpretation

Log information	Description
waiting_replication_task_cnt	The number of pending replication tasks.
executing_replication_task_cnt	The number of ongoing replication tasks.
waiting_migration_task_cnt	The number of pending migration tasks.
executing_migration_task_cnt	The number of ongoing replication tasks.

Major compaction progress

Search keyword: check merge progress success or succ to finish zone merge

Log description: the major compaction progress.

[admin@OB /home/admin/oceanbase/log] $grep "check merge progress success" observer.log

[admin@OB /home/admin/oceanbase/log] $grep "succ to finish zone merge" observer.log

The return result is as follows:

[2021-04-07 10:37:16.451302] INFO  [RS] ob_daily_merge_scheduler.cpp:1840 [4532][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=34] succeed to add_finish_zone_merge_op(zone=bank.ea133_8, last_merged_version=10044, all_merged_version=10044)
[2021-04-07 10:37:18.114716] INFO  [RS] ob_daily_merge_scheduler.cpp:1986 [4532][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=14] succ to finish zone merge(zone=bank.ea133_8, last_merged_version=10044, all_merged_version=10044)

[2023-05-16 20:04:48.270903] INFO  [RS] ob_partition_table_util.cpp:317 [87321][0][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=10] [dc=0] check merge progress success(version=38, all_majority_merged=false, all_progress=[{zone:"zone1", unmerged_partition_cnt:2301, unmerged_data_size:126884638603, merged_partition_cnt:0, merged_data_size:0, smallest_data_version:37, merged_partition_percentage:0, merged_data_percentage:0}, {zone:"zone2", unmerged_partition_cnt:0, unmerged_data_size:0, merged_partition_cnt:2301, merged_data_size:126884950766, smallest_data_version:38, merged_partition_percentage:100, merged_data_percentage:100}, {zone:"zone3", unmerged_partition_cnt:2301, unmerged_data_size:126884638603, merged_partition_cnt:0, merged_data_size:0, smallest_data_version:37, merged_partition_percentage:0, merged_data_percentage:0}], checking_time_us=202635)

Log interpretation

Log information	Description
version	The current major compaction version.
all_majority_merged	Indicates whether the major compaction is completed for all zones. Valid values: `true` and `false`.
all_progress	The major compaction progress of each zone.

DDL execution

Search keyword: [DDL] execute ddl like stmt

Log description: information about an executed DDL statement.

[admin@OB /home/admin/oceanbase/log] $grep "[DDL] execute ddl like stmt" observer.log

The return result is as follows:

[2023-05-16 10:50:10.834911] INFO  [RS] ob_rs_rpc_processor.h:249 [87188][0][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=17] [dc=0] [DDL] execute ddl like stmt(ret=0, cost=44039, ddl_arg={ddl_stmt_str:"CREATE INDEX "IND_PA_ACCTSENIOR" on "T_PA_ACCTSENIOR" ( "FSENIOR_CODE" ) GLOBAL", exec_tenant_id:1005, ddl_id_str:"", is_replay_schema:false, based_schema_object_infos:[{schema_id:1105009185964905, schema_type:5, schema_version:1684205410682568}]})

Log interpretation

Log information	Description
ret	The return code of the DDL statement.
cost	The execution time.
ddl_arg	The DDL statement.
schema_type	The type of the object created.

Leader switching

Search keyword: choose leader info

Log description: information about leader switching performed for a partition.

[admin@OB /home/admin/oceanbase/log] $grep "choose leader info" observer.log

The return result is as follows:

[2023-05-10 17:41:28.833956] INFO  [RS] ob_leader_coordinator.cpp:5004 [65037][0][XXXXXXXXXX-XXXXXXXXXXXXXX-X-X] [lt=27] [dc=0] choose leader info(tenant_id=1005, tablegroup_id=18446744073709551615, server="192.168.1.1:2882", info={server_addr:"192.168.1.1:2882", zone:"zone2", balance_group_score:9223372036854775807, region_score:{region_:"beijing", region_score_:0}, not_merging:true, start_service:true, zone_candidate_count:1, candidate_count:1, is_candidate:true, not_excluded:true, migrate_out_or_transform_count:0, zone_migrate_out_or_transform_count:0, in_normal_unit_count:1, zone_score:{zone_:"zone2", zone_score_:0}, original_leader_count:0, random_score:3, cur_leader_count:1, in_revoke_blacklist_count:0, partition_id:0}, is_stopped=false, is_zone_active=true)

Log interpretation

Log information	Description
tablegroup_id	The ID of the partition or table group for which leader switching is performed.
server	The OBServer node where the new leader resides.
is_stopped	The status of the OBServer node where the new leader resides.
is_zone_active	The status of the zone where the new leader resides.

Major compaction failure of partitions

Search keyword: replica not merged

Log description: information about a failed major compaction for a partition.

[admin@OB /home/admin/oceanbase/log] $grep "replica not merged" observer.log

The return result is as follows:

[2021-04-07 10:47:36.835217] INFO  [RS] ob_partition_table_util.cpp:180 [4532][XXXXXXXXXX-XXXXXXXXXXXXXX] [lt=5] replica not merged to version(version=10044, replica={table_id:{value:1106108697592732, first:1006, second:50076}, partition_id:0, partition_cnt:1, zone:"bank.mub_2", server:"xxx.xxx.xxx.xxx:2882", sql_port:2881, unit_id:1017, role:1, member_list:[{server:"xxx.xxx.xxx.xxx:2882", timestamp:1557500921627412}, {server:"xxx.xxx.xxx.xxx:2882", timestamp:1560866621699092}, {server:"xxx.xxx.xxx.xxx:2882", timestamp:1592984233874976}, {server:"xxx.xxx.xxx.xxx:2882", timestamp:1614240480439532}, {server:"xxx.xxx.xxx.xxx:2882", timestamp:1614911835988424}], row_count:8635155074, data_size:2061390087129, data_version:10043, data_checksum:3763772143, row_checksum:row:[0], modify_time_us:1617652580558918, create_time_us:1560880959641988, member_time_us:1560866621699092, is_original_leader:false, in_member_list:true, rebuild:false, to_leader_time:1603047186742602, replica_status:"REPLICA_STATUS_NORMAL", replica_type:0, required_size:2295137828864, status:"REPLICA_STATUS_NORMAL", is_restore:0, partition_checksum:14438911093, quorum:5})

Log interpretation

Log information	Description
version	The global major compaction version.
table_id:{value:v0, first:v1, second:v2}	`v0` represents `table_id`, `v1` represents `tenant_id`, and `v2` represents `pure table_id`.
data_version	The major version of the current partition.
replica_type	The replica type. `0` indicates a full-featured replica, `5` indicates a log-only replica, and `16` indicates a read-only replica.
quorum	The number of Paxos replicas.

election.log

Log keyword	Description
leader lease is expired	Indicates that the leader lease has expired and a new leader needs to be elected.
leader takeover success	Indicates that a new leader is successfully elected.
leader revoke	Indicates that the leader is revoked.
current leader is invalid	A warning log indicating that the leader is invalid.
leader_revoke, please attention!	An error log indicating that the leader has been revoked.

Interpretation of leader revocation logs

Search keyword: leader revoke or leader_revoke

Log description: information about leader revocation.

[admin@OB /home/admin/oceanbase/log] $grep "leader revoke|leader_revoke" observer.log

The return result is as follows:

[2022-05-08 22:41:38.827689] INFO  [ELECT] ob_election.cpp:2072 [201777][0][Y0-0000000000000000-0-0] [lt=1526] take the initiative leader revoke(partition={tid:1100611139404082, partition_id:0, part_cnt:0}, revoke reason="clog disk full")
    
[2022-05-08 22:41:38.827723] ERROR [ELECT] leader_revoke (ob_election.cpp:2077) [201777][0][Y0-0000000000000000-0-0] [lt=8] leader_revoke, please attention!(revoke reason="clog disk full", election={partition:{tid:1100611139404082, partition_id:0, part_cnt:0}, is_running:true, is_changing_leader:false, self:"xxx.xxx.xxx.xxx:2882", proposal_leader:"0.0.0.0", cur_leader:"xxx.xxx.xxx.xxx:2882", curr_candidates:3{server:"xxx.xxx.xxx.xxx:2882", timestamp:1650642269999783, flag:0, need_encrypt:false}{server:"xxx.xxx.xxx.xxx:2882", timestamp:1651730035655503, flag:0, need_encrypt:false}{server:"xxx.xxx.xxx.xxx:2882", timestamp:1650642269999783, flag:0, need_encrypt:false}, curr_membership_version:1651730068793522, leader_lease:[1651816304570000, 1651816418400000], election_time_offset:5570000, active_timestamp:1651808137937449, T1_timestamp:1651816404000000, leader_epoch:1651816299000000, state:1, role:1, stage:-1, type:-1, replica_num:3, unconfirmed_leader:"xxx.xxx.xxx.xxx:2882", unconfirmed_leader_lease:[1651816304570000, 1651816419000000], takeover_t1_timestamp:1651816402600000, is_need_query:false, valid_candidates:0, change_leader_timestamp:0, ignore_log:false, leader_revoke_timestamp:1651816301770881, vote_period:4, lease_time:9800000, move_out_timestamp:1651816300405697, eg_part_array_idx:40, eg_id_hash:6130129687449640250}) BACKTRACE:0xf5c6965 0xf45eeca 0xf44866a 0x530ea4d 0x9bbea6b 0x9b2c627 0x9b2c0cc 0x9b77a60 0x987a46b 0xb472a62 0x93845b5 0x9384519 0x5260906 0x66daf5d 0xf33ee22 0xf33ec7e 0xf5da8ce

Log interpretation

Log information	Description
revoke reason	The reason why the leader revocation is triggered.
is_running	Indicates whether an election is in progress.
is_changing_leader	Indicates whether the leader is being changed.
self	The current node.
proposal_leader	The proposed leader.
cur_leader	The current leader.
curr_candidates	The list of candidate nodes.
leader_lease	The start time and end time of the leader lease.
unconfirmed_leader	The unconfirmed leader node.
leader_revoke_timestamp	The time when the leader revocation occurs.