Scenarios
You can use the obdiag rca run command to analyze the clog_disk_full scenario based on internal table data of OceanBase Database.
This feature supports OceanBase Database V4.0.0.0 and later. Before you run obdiag commands, configure the cluster information in the ~/.obdiag/config.yml file or specify it with the --config option in the command.
Notice
Internal data is updated in real time. Therefore, obdiag only supports the analysis of current data.
Supported environment variables
| Variable | Required | Data type | Default value | Description |
|---|---|---|---|---|
| since | No | string | "30m" | The time range of logs to analyze, counted back from the current time. For example, "30m" covers the last 30 minutes. |
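To make the `since` value concrete, here is a minimal Python sketch of how a string such as "30m" could be turned into a time window. It assumes the value is a whole number followed by one unit suffix (m for minutes, h for hours, d for days). The `parse_since` helper and its unit map are hypothetical and are not part of obdiag, whose own parser may accept other forms.

```python
import re
from datetime import timedelta

# Hypothetical unit map (assumption): m = minutes, h = hours, d = days.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(value="30m"):
    """Convert a string such as '30m' into a timedelta.

    This is an illustrative helper, not obdiag's implementation.
    """
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported since value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


if __name__ == "__main__":
    # The default value from the table above covers the last 30 minutes.
    print(parse_since())
```

Under these assumptions, the default "30m" maps to a 30-minute window that ends at the current time.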
Examples
Specify the scenario in the command.
obdiag rca run --scene=clog_disk_full
Results
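The following command output shows that, for log stream 1005, the clog checkpoint is stuck (service type TRANS_SERVICE, minimum checkpoint type MDS_TABLE_TYPE) and log replay is stuck. You can use this information to trace the issue further.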
+-------------------------------------------------------------------------------------------------------------+
| record |
+------+------------------------------------------------------------------------------------------------------+
| step | info |
+------+------------------------------------------------------------------------------------------------------+
| 1 | check error tenant_ls_data is {'tenant_id': 1xxx, 'ls_id': 1005, 'ip': 'xxx.xxx.xxx.xxx', 'port': |
| | 2882} |
| 2 | __check_checkpoint |
| 3 | find log_disk_full about checkpoint in |
| | ./rca//obdiag_clog_disk_full_20240719104714/tenant_id_1xxx/ls_id_1005/checkpoint/ |
| 4 | is_clog_checkpoint_stuck is True |
| 5 | the log is [2024-07-19 10:45:08.043682] INFO [STORAGE] update_clog_checkpoint |
| | (ob_checkpoint_executor.cpp:158) [52704][T1xxx_CKClogDis][T1xxx][Y0-0000000000000000-0-0] [lt=0] |
| | [CHECKPOINT] clog checkpoint no change(checkpoint_scn={val:1720543850356918166, v:0}, |
| | checkpoint_scn_in_ls_meta={val:1720543850356918166, v:0}, ls_id={id:1005}, |
| | service_type="TRANS_SERVICE") |
| 6 | stuck_service_type is TRANS_SERVICE |
| 7 | __check_checkpoint end |
| 8 | __get_min_ckpt_type start |
| 9 | get min ckpt type is on [2024-07-19 10:45:59.748107] INFO [STORAGE.TRANS] get_rec_scn |
| | (ob_ls_tx_service.cpp:507) [52704][T1xxx_CKClogDis][T1xxx][Y0-0000000000000000-0-0] [lt=0] |
| | [CHECKPOINT] ObLSTxService::get_rec_scn(common_checkpoint_type="MDS_TABLE_TYPE", |
| | common_checkpoints_[min_rec_scn_common_checkpoint_type_index]={this:0x7f4a69ac8510, is_inited:true, |
| | freezing_scn:{val:1720543850382118208, v:0}, ls:{ls_meta:{tenant_id:1xxx, ls_id:{id:1005}, |
| | ls_create_status:1, clog_checkpoint_scn:{val:1720543850356918166, v:0}, |
| | clog_base_lsn:{lsn:2367456215040}, rebuild_seq:1, migration_status:0, gc_state_:1, |
| | offline_scn_:{val:18446744073709551615, v:3}, restore_status:{status:8}, |
| | replayable_point:{val:1720662756177146368, v:0}, |
| | tablet_change_checkpoint_scn:{val:1720543765175444272, v:0}, all_id_meta:{id_meta:[{limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}, {limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}, {limited_id:1, |
| | latest_log_ts:{val:18446744073709551615, v:3}}]}, transfer_scn:{val:1720536600653472428, v:0}, |
| | rebuild_info:{status:{status:0}, type:{type:0}}}, switch_epoch:1, log_handler:{role:2, |
| | proposal_id:17, palf_env_:0x7f1de657c030, is_in_stop_state_:false, is_inited_:true, id_:1005}, |
| | restore_handler:{is_inited:true, is_in_stop_state:false, id:1005, proposal_id:9223372036854775807, |
| | role:2, parent:null, context:{issue_task_num:0, issue_version:-1, last_fetch_ts:-1, |
| | max_submit_lsn:{lsn:18446744073709551615}, max_fetch_lsn:{lsn:18446744073709551615}, |
| | max_fetch_scn:{val:18446744073709551615, v:3}, error_context:{ret_code:0, |
| | trace_id:Y0-0000000000000000-0-0, error_type:1048580, err_lsn:{lsn:18446744073709551615}}, |
| | task_count:0}, restore_context:{seek_done:false, lsn:{lsn:18446744073709551615}}}, is_inited:true, |
| | tablet_gc_handler:{tablet_persist_trigger:2, is_inited:true}, startup_transfer_info:{ls_id:{id:0}, |
| | transfer_start_scn:{val:18446744073709551615, v:3}}}}, min_rec_scn={val:1720543850356918166, v:0}, |
| | ls_id_={id:1005}) |
| 10 | min_checkpoint_tx_log_type is MDS_TABLE_TYPE |
| 11 | min_checkpoint_scn is 1720543850356918166 |
| 12 | check_min_ckpt_type is True |
| 13 | __get_min_ckpt_type end |
| 14 | __check_replay_stuck start |
| 15 | check_replay_stuck is True. the line: [2024-07-19 10:47:33.367691] INFO [CLOG] |
| | get_min_unreplayed_log_info (ob_replay_status.cpp:984) |
| | [4600][T1xxx_TenantWea][T1xxx][Y0-0000000000000000-0-0] [lt=14] |
| | get_min_unreplayed_log_info(lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0}, |
| | this={ls_id_:{id:1005}, is_enabled_:true, is_submit_blocked_:false, role_:2, |
| | err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0, v:0}, log_type_:0, is_submit_err_:false, |
| | err_ts_:0, err_ret_:0}, ref_cnt_:4, post_barrier_lsn_:{lsn:2367477760943}, pending_task_count_:1, |
| | submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1721357253367659, |
| | err_info_:{has_fatal_error_:false, fail_ts_:1721213879495661, fail_cost_:176528190, |
| | ret_code_:-4023}}, next_to_submit_lsn_:{lsn:2367477763353}, |
| | next_to_submit_scn_:{val:1720543850382118210, v:0}, base_lsn_:{lsn:2332293316608}, |
| | base_scn_:{val:1720468881546872775, v:0}, iterator_:{iterator_impl:{buf_:0x7f00f0805000, |
| | next_round_pread_size:2121728, curr_read_pos:387999, curr_read_buf_start_pos:0, |
| | curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:2367477375410}, |
| | end_lsn:{lsn:2367479497138}, read_buf:{buf_len_:2125824, buf_:0x7f00f0805000}, block_size:67104768, |
| | log_storage_:0x7f2b1a5bf2f0}, IteratorStorageType::"DiskIteratorStorage"}, |
| | curr_entry_is_raw_write:true, curr_entry_size:99, prev_entry_scn:{val:1720543850382118210, v:0}, |
| | curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:67, scn_:{val:1720543850382118210, |
| | v:0}, data_checksum:927461962, flag:1}}, init_mode_version:14, accumulate_checksum:1210730574, |
| | curr_entry_is_padding:0, padding_entry_size:99, padding_entry_scn:{val:1720543850382118210, |
| | v:0}}}}}) |
| 16 | get min unreplayed log info is [2024-07-19 10:47:33.367691] INFO [CLOG] get_min_unreplayed_log_info |
| | (ob_replay_status.cpp:984) [4600][T1xxx_TenantWea][T1xxx][Y0-0000000000000000-0-0] [lt=14] |
| | get_min_unreplayed_log_info(lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0}, |
| | this={ls_id_:{id:1005}, is_enabled_:true, is_submit_blocked_:false, role_:2, |
| | err_info_:{lsn_:{lsn:18446744073709551615}, scn_:{val:0, v:0}, log_type_:0, is_submit_err_:false, |
| | err_ts_:0, err_ret_:0}, ref_cnt_:4, post_barrier_lsn_:{lsn:2367477760943}, pending_task_count_:1, |
| | submit_log_task_:{ObReplayServiceSubmitTask:{type_:1, enqueue_ts_:1721357253367659, |
| | err_info_:{has_fatal_error_:false, fail_ts_:1721213879495661, fail_cost_:176528190, |
| | ret_code_:-4023}}, next_to_submit_lsn_:{lsn:2367477763353}, |
| | next_to_submit_scn_:{val:1720543850382118210, v:0}, base_lsn_:{lsn:2332293316608}, |
| | base_scn_:{val:1720468881546872775, v:0}, iterator_:{iterator_impl:{buf_:0x7f00f0805000, |
| | next_round_pread_size:2121728, curr_read_pos:387999, curr_read_buf_start_pos:0, |
| | curr_read_buf_end_pos:2121728, log_storage_:{IteratorStorage:{start_lsn:{lsn:2367477375410}, |
| | end_lsn:{lsn:2367479497138}, read_buf:{buf_len_:2125824, buf_:0x7f00f0805000}, block_size:67104768, |
| | log_storage_:0x7f2b1a5bf2f0}, IteratorStorageType::"DiskIteratorStorage"}, |
| | curr_entry_is_raw_write:true, curr_entry_size:99, prev_entry_scn:{val:1720543850382118210, v:0}, |
| | curr_entry:{LogEntryHeader:{magic:19528, version:1, log_size:67, scn_:{val:1720543850382118210, |
| | v:0}, data_checksum:927461962, flag:1}}, init_mode_version:14, accumulate_checksum:1210730574, |
| | curr_entry_is_padding:0, padding_entry_size:99, padding_entry_scn:{val:1720543850382118210, |
| | v:0}}}}}) |
| 17 | log_time - replay_scn_time : 2024-07-19 10:47:33.367691 - 2024-07-10 00:50:50.382118 |
| 18 | datetime.timedelta(minutes=0.5): 0:00:30 |
| 19 | log_time - replay_scn_time > datetime.timedelta(minutes=0.5) is True |
| 20 | log_time:{0}, replay_scn_time:{1} |
| 21 | check_replay_stuck is True |
| 22 | check_replay_stuck is True. Please check replay status |
| 23 | __check_replay_stuck end |
| 24 | __check_dump_stuck start |
| 25 | check_dump_stuck is False |
| 26 | __check_dump_stuck end |
| 27 | __check_data_disk_full start |
| 28 | check_data_disk_full is False |
| 29 | __check_data_disk_full end |
| 30 | __check_too_many_sstable start |
| 31 | check_too_many_sstable is False |
| 32 | __check_too_many_sstable end |
| 33 | check end |
+------+------------------------------------------------------------------------------------------------------+
The suggest: min_checkpoint_tx_log_type is MDS_TABLE_TYPE. please check it.
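Steps 15 to 19 of the output compare the time of the log record with the time encoded in the minimum unreplayed SCN, and flag replay as stuck when the gap exceeds `datetime.timedelta(minutes=0.5)`. The Python sketch below reproduces that arithmetic on a shortened copy of the log line. It rests on two assumptions inferred from the sample output rather than from obdiag's source code: the SCN value is a count of nanoseconds since the Unix epoch, and the server's local time zone is UTC+8. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: server local time is UTC+8. With this offset the converted SCN
# matches the replay_scn_time printed in step 17.
LOCAL_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
THRESHOLD = timedelta(minutes=0.5)  # the 30-second threshold shown in step 18

# Shortened copy of the log line from step 15.
LOG_LINE = (
    "[2024-07-19 10:47:33.367691] INFO [CLOG] get_min_unreplayed_log_info "
    "(ob_replay_status.cpp:984) get_min_unreplayed_log_info("
    "lsn={lsn:2367477760943}, scn={val:1720543850382118209, v:0})"
)


def scn_to_datetime(scn):
    """Interpret an SCN as nanoseconds since the Unix epoch (assumption)."""
    # Integer division keeps microsecond precision without float rounding.
    return (EPOCH + timedelta(microseconds=scn // 1000)).astimezone(LOCAL_TZ)


def replay_lag(line):
    """Return log_time minus replay_scn_time for one log record."""
    ts = re.match(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]", line)
    scn = re.search(r"\bscn=\{val:(\d+)", line)
    if ts is None or scn is None:
        raise ValueError("not a get_min_unreplayed_log_info record")
    log_time = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
    log_time = log_time.replace(tzinfo=LOCAL_TZ)
    return log_time - scn_to_datetime(int(scn.group(1)))


lag = replay_lag(LOG_LINE)
is_stuck = lag > THRESHOLD
print(f"lag={lag}, replay stuck={is_stuck}")
```

With the sample values, the replay position is more than nine days behind the log record, which is consistent with `check_replay_stuck is True` in the output.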
References
OceanBase Database Diagnostics and Tuning (12) - clog_disk_full Error