Applicable scenarios
This scene applies when the clog (commit log) disk is full. The analysis relies on data from internal tables.
This scene supports OceanBase Database V4.0.0.0 and later. Before running obdiag, configure the cluster information either in the ~/.obdiag/config.yml file or with the --config option on the command line.
Note
Internal table data reflects only the current state of the cluster, so only current data can be analyzed.
Supported environment variables
| Variable name | Required | Data type | Default value | Description |
|---|---|---|---|---|
| since | No | string | "30m" | Time range of the logs to analyze. The default, 30m, covers the logs from the past 30 minutes. |
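The since value is a duration string that sets how far back the analysis looks. The hypothetical helper below sketches how a value such as "30m" maps to a time window ending now. It is not obdiag's implementation: the function names are illustrative, and the "h" and "d" suffixes are assumptions, since this page documents only the "30m" form.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Assumed unit suffixes. This page documents only the "30m" form;
# "h" and "d" are illustrative assumptions.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(since: str) -> timedelta:
    """Convert a duration string such as "30m" into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([mhd])\s*", since)
    if match is None:
        raise ValueError(f"unsupported since value: {since!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def analysis_window(since: str = "30m", now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) interval that a since value covers, ending at now."""
    end = now if now is not None else datetime.now()
    return end - parse_since(since), end


if __name__ == "__main__":
    # The default "30m" looks back 30 minutes from the given end time.
    start, end = analysis_window("30m", now=datetime(2024, 7, 19, 10, 47, 33))
    print(start, end)  # 2024-07-19 10:17:33 2024-07-19 10:47:33
```

Returning the window as a (start, end) pair keeps the parsing separate from the choice of "now", which makes the sketch easy to check against a fixed time.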
Usage example
Run the analysis scene. With the default settings, the command is:
obdiag rca run --scene=clog_disk_full
Example output
The following output shows that log stream 1005 on the node has a log replay problem and needs further investigation.
+-------------------------------------------------------------------------------------------------------------+
| record |
+------+------------------------------------------------------------------------------------------------------+
| step | info |
+------+------------------------------------------------------------------------------------------------------+
| 1 | check error tenant_ls_data is {'tenant_id': 1xxx, 'ls_id': 1005, 'ip': 'xxx.xxx.xxx.xxx', 'port': |
| | 2882} |
| 2 | __check_checkpoint |
| 3 | find log_disk_full about checkpoint in |
| | ./rca//obdiag_clog_disk_full_20240719104714/tenant_id_1xxx/ls_id_1005/checkpoint/ |
| 4 | is_clog_checkpoint_stuck is True |
| 5 | the log is [2024-07-19 10:45:08.043682] INFO [STORAGE] update_clog_checkpoint |
| | (ob_checkpoint_executor.cpp:158) [52704][T1xxx_CKClogDis][T1xxx][Y0-0000000000000000-0-0] [lt=0] |
| | [CHECKPOINT] clog checkpoint no change(checkpoint_scn={val:1720543850356918166, v:0}, |
| | checkpoint_scn_in_ls_meta={val:1720543850356918166, v:0}, ls_id={id:1005}, |
| | service_type="TRANS_SERVICE") |
| 6 | stuck_service_type is TRANS_SERVICE |
| 7 | __check_checkpoint end |
| 8 | __get_min_ckpt_type start |
| 9 | get min ckpt type is on [2024-07-19 10:45:59.748107] INFO [STORAGE.TRANS] get_rec_scn |
| | (ob_ls_tx_service.cpp:507) [52704][T1xxx_CKClogDis][T1xxx][Y0-0000000000000000-0-0] [lt=0] |
| | [CHECKPOINT] ObLSTxService::get_rec_scn(common_checkpoint_type="MDS_TABLE_TYPE", |
| | common_checkpoints_[min_rec_scn_common_checkpoint_type_index]={this:0x7f4a69ac8510, is_inited:true, |
| | freezing_scn:{val:1720543850382118208, v:0}, ls:{ls_meta:{tenant_id:1xxx, ls_id:{id:1005}, |
| | ls_create_status:1, clog_checkpoint_scn:{val:1720543850356918166, v:0}, |
| | clog_base_lsn:{lsn:2367456215040}, rebuild_seq:1, migration_status:0, gc_state_:1, |
| | offline_scn_:{val:18446744073709551615, v:3}, restore_status:{status:8}, |
| | replayable_point:{val:1720662756177146368, v:0}, |
| | tablet_change_checkpoint_scn:{val:1720543765175444272, v:0}, all_id_meta:{id_meta:[{limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}, {limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}, {limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}]}, transfer_scn:{val:1720536600653472428, v:0}, |
| | rebuild_info:{status:{status:0}, type:{type:0}}}, switch_epoch:1, log_handler:{role:2, |
| | proposal_id:17, palf_env_:0x7f1de657c030, is_in_stop_state_:false, is_inited_:true, id_:1005}, |
| | restore_handler:{is_inited:true, is_in_stop_state:false, id:1005, proposal_id:9223372036854775807, |
| | role:2, parent:null, context:{issue_task_num:0, issue_version:-1, last_fetch_ts:-1, |
| | max_submit_lsn:{lsn:18446744073709551615}, max_fetch_lsn:{lsn:18446744073709551615}, |
| | max_fetch_scn:{val:18446744073709551615, v:3}, error_context:{ret_code:0, |
| | trace_id:Y0-0000000000000000-0-0, error_type:1048580, err_lsn:{lsn:18446744073709551615}}, |
| | task_count:0}, restore_context:{seek_done:false, lsn:{lsn:18446744073709551615}}}, is_inited:true, |
| | tablet_gc_handler:{tablet_persist_trigger:2, is_inited:true}, startup_transfer_info:{ls_id:{id:0}, |
| | transfer_start_scn:{val:18446744073709551615, v:3}}}}, min_rec_scn={val:1720543850356918166, v:0}, |
| | ls_id_={id:1005}) |
| 10 | min_checkpoint_tx_log_type is MDS_TABLE_TYPE |
| 11 | min_checkpoint_scn is 1720543850356918166 |
| 12 | check_min_ckpt_type is True |
| 13 | __get_min_ckpt_type end |
| 14 | __check_replay_stuck start |
| 15 | check_replay_stuck is True. the line: [2024-07-19 10:47:33.367691] INFO [CLOG] |
| | get_min_unreplayed_log_info (ob_replay_status.cpp:984) |
| | [4600][T1xxx_TenantWea][T1xxx][Y0-0000000000000000-0-0] [lt=14] |
| | get_min_unreplayed_log_info(lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0}, |
| | this={ls_id_:{id:1005}, is_enabled_:true, is_submit_blocked_:false, role_:2, |
| | err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0, v:0}, log_type_:0, is_submit_err_:false, |
| | err_ts_:0, err_ret_:0}, ref_cnt_:4, post_barrier_lsn_:{lsn:2367477760943}, pending_task_count_:1, |
| | submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1721357253367659, |
| | err_info_:{has_fatal_error_:false, fail_ts_:1721213879495661, fail_cost_:176528190, |
| | ret_code_:-4023}}, next_to_submit_lsn_:{lsn:2367477763353}, |
| | next_to_submit_scn_:{val:1720543850382118210, v:0}, base_lsn_:{lsn:2332293316608}, |
| | base_scn_:{val:1720468881546872775, v:0}, iterator_:{iterator_impl:{buf_:0x7f00f0805000, |
| | next_round_pread_size:2121728, curr_read_pos:387999, curr_read_buf_start_pos:0, |
| | curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:2367477375410}, |
| | end_lsn:{lsn:2367479497138}, read_buf:{buf_len_:2125824, buf_:0x7f00f0805000}, block_size:67104768, |
| | log_storage_:0x7f2b1a5bf2f0}, IteratorStorageType::"DiskIteratorStorage"}, |
| | curr_entry_is_raw_write:true, curr_entry_size:99, prev_entry_scn:{val:1720543850382118210, v:0}, |
| | curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:67, scn_:{val:1720543850382118210, |
| | v:0}, data_checksum:927461962, flag:1}}, init_mode_version:14, accumulate_checksum:1210730574, |
| | curr_entry_is_padding:0, padding_entry_size:99, padding_entry_scn:{val:1720543850382118210, |
| | v:0}}}}}) |
| 16 | get min unreplayed log info is [2024-07-19 10:47:33.367691] INFO [CLOG] get_min_unreplayed_log_info |
| | (ob_replay_status.cpp:984) [4600][T1xxx_TenantWea][T1xxx][Y0-0000000000000000-0-0] [lt=14] |
| | get_min_unreplayed_log_info(lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0}, |
| | this={ls_id_:{id:1005}, is_enabled_:true, is_submit_blocked_:false, role_:2, |
| | err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0, v:0}, log_type_:0, is_submit_err_:false, |
| | err_ts_:0, err_ret_:0}, ref_cnt_:4, post_barrier_lsn_:{lsn:2367477760943}, pending_task_count_:1, |
| | submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1721357253367659, |
| | err_info_:{has_fatal_error_:false, fail_ts_:1721213879495661, fail_cost_:176528190, |
| | ret_code_:-4023}}, next_to_submit_lsn_:{lsn:2367477763353}, |
| | next_to_submit_scn_:{val:1720543850382118210, v:0}, base_lsn_:{lsn:2332293316608}, |
| | base_scn_:{val:1720468881546872775, v:0}, iterator_:{iterator_impl:{buf_:0x7f00f0805000, |
| | next_round_pread_size:2121728, curr_read_pos:387999, curr_read_buf_start_pos:0, |
| | curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:2367477375410}, |
| | end_lsn:{lsn:2367479497138}, read_buf:{buf_len_:2125824, buf_:0x7f00f0805000}, block_size:67104768, |
| | log_storage_:0x7f2b1a5bf2f0}, IteratorStorageType::"DiskIteratorStorage"}, |
| | curr_entry_is_raw_write:true, curr_entry_size:99, prev_entry_scn:{val:1720543850382118210, v:0}, |
| | curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:67, scn_:{val:1720543850382118210, |
| | v:0}, data_checksum:927461962, flag:1}}, init_mode_version:14, accumulate_checksum:1210730574, |
| | curr_entry_is_padding:0, padding_entry_size:99, padding_entry_scn:{val:1720543850382118210, |
| | v:0}}}}}) |
| 17 | log_time - replay_scn_time : 2024-07-19 10:47:33.367691 - 2024-07-10 00:50:50.382118 |
| 18 | datetime.timedelta(minutes=0.5): 0:00:30 |
| 19 | log_time - replay_scn_time > datetime.timedelta(minutes=0.5) is True |
| 20 | log_time:{0}, replay_scn_time:{1} |
| 21 | check_replay_stuck is True |
| 22 | check_replay_stuck is True. Please check replay status |
| 23 | __check_replay_stuck end |
| 24 | __check_dump_stuck start |
| 25 | check_dump_stuck is False |
| 26 | __check_dump_stuck end |
| 27 | __check_data_disk_full start |
| 28 | check_data_disk_full is False |
| 29 | __check_data_disk_full end |
| 30 | __check_too_many_sstable start |
| 31 | check_too_many_sstable is False |
| 32 | __check_too_many_sstable end |
| 33 | check end |
+------+------------------------------------------------------------------------------------------------------+
The suggest: min_checkpoint_tx_log_type is MDS_TABLE_TYPE. please check it.
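Steps 15 to 19 of the output compare the timestamp of the get_min_unreplayed_log_info log line with the time encoded in the minimum unreplayed SCN, and report the replay as stuck when the gap exceeds timedelta(minutes=0.5), that is, 30 seconds. The sketch below reproduces that comparison for illustration only; it is not obdiag's code, and the function names and regular expressions are hypothetical. It assumes that an SCN val counts nanoseconds since the Unix epoch and that the log timestamps are in UTC+8, the offset under which the sample SCN 1720543850382118209 converts to the 2024-07-10 00:50:50.382118 shown in step 17.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the observer log timestamps in this example are UTC+8 local time.
LOG_TZ = timezone(timedelta(hours=8))
# Threshold shown in step 18 of the sample output (30 seconds).
REPLAY_STUCK_THRESHOLD = timedelta(minutes=0.5)

# Hypothetical patterns for the get_min_unreplayed_log_info line in step 16.
_LOG_TIME_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")
_MIN_SCN_RE = re.compile(r"get_min_unreplayed_log_info\(lsn=\{lsn:\d+\}, scn=\{val:(\d+)")


def scn_to_datetime(scn: int, tz: timezone = LOG_TZ) -> datetime:
    """Convert an SCN val to a datetime, assuming it counts nanoseconds since the epoch."""
    seconds, nanos = divmod(scn, 1_000_000_000)
    # Integer arithmetic avoids float rounding; datetime keeps microseconds only.
    return datetime.fromtimestamp(seconds, tz) + timedelta(microseconds=nanos // 1000)


def check_replay_stuck(log_line: str, threshold: timedelta = REPLAY_STUCK_THRESHOLD) -> bool:
    """Return True if the minimum unreplayed SCN lags the log time by more than threshold."""
    time_match = _LOG_TIME_RE.search(log_line)
    scn_match = _MIN_SCN_RE.search(log_line)
    if time_match is None or scn_match is None:
        raise ValueError("not a get_min_unreplayed_log_info line")
    log_time = datetime.strptime(time_match.group(1), "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=LOG_TZ)
    replay_scn_time = scn_to_datetime(int(scn_match.group(1)))
    return log_time - replay_scn_time > threshold


if __name__ == "__main__":
    # Shortened copy of the log line from step 16 of the sample output.
    line = (
        "[2024-07-19 10:47:33.367691] INFO [CLOG] get_min_unreplayed_log_info "
        "(ob_replay_status.cpp:984) [4600][T1xxx_TenantWea][T1xxx][Y0-0000000000000000-0-0] [lt=14] "
        "get_min_unreplayed_log_info(lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0})"
    )
    print(scn_to_datetime(1720543850382118209))  # 2024-07-10 00:50:50.382118+08:00
    print(check_replay_stuck(line))  # True
```

In the sample, the minimum unreplayed SCN is more than nine days older than the log line, far beyond the 30-second threshold, which is why steps 19 and 21 report check_replay_stuck is True.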
