This topic describes how to collect information and diagnose an OceanBase cluster by using the obd command.
obd integrates the OceanBase Diagnostic Tool (obdiag), which can collect information such as logs, SQL audit records, and process stacks from an OceanBase cluster. With obd, you can collect the diagnostic information of a cluster by running a single command. For more information about obdiag, see the obdiag GitHub repository.
Note
Before you collect information or diagnose an OceanBase cluster by using the obd command, make sure that the cluster is managed by obd. Otherwise, obd cannot obtain the information of the cluster. By default, an OceanBase cluster deployed by using obd is managed by obd. You can take over an OceanBase cluster deployed without obd. For more information, see Take over a cluster by using obd.
Prerequisites
Before you use the obdiag command, you must run the obd tool install obdiag command to install obdiag.
Note
Different obd obdiag commands have different requirements for the OceanBase Database and ODP versions. If a version is not supported, a log message is printed when you run the command.
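The prerequisite step can be wrapped in a small guard, shown here as a minimal sketch. The function name ensure_obdiag is hypothetical and the check assumes that obd is expected on PATH; only the obd tool install obdiag command comes from this topic.

```shell
# Hypothetical helper: install obdiag through obd, but fail early
# with a clear message when the obd command itself is missing.
ensure_obdiag() {
  if ! command -v obd >/dev/null 2>&1; then
    echo "obd was not found on PATH; install obd first" >&2
    return 1
  fi
  # Command taken from the Prerequisites section of this topic.
  obd tool install obdiag
}
```

Call ensure_obdiag once before running any obd obdiag command in a script.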
Examples
Collect diagnostic information
You can use the obd obdiag gather command to collect information about an OceanBase cluster.
Here is an example of collecting logs from an OceanBase cluster:
obd obdiag gather log obtest --scope observer --from "2024-08-13 16:25:00" --to "2024-08-13 18:30:00" --grep STORAGE
The output is as follows:
ZipFileInfo:
+--------+-----------+
| Node   | LogSize   |
+========+===========+
| local  | 47.374M   |
+--------+-----------+
Gather Ob Log Summary:
+------------+-----------+---------+--------+------------------------------------------------------------------------------------+
| Node       | Status    | Size    | Time   | PackPath                                                                           |
+============+===========+=========+========+====================================================================================+
| 10.10.10.1 | Completed | 47.374M | 6 s    | ./obdiag_gather_pack_20240814141437/ob_log_local_20240813162500_20240813183000.zip |
+------------+-----------+---------+--------+------------------------------------------------------------------------------------+
Trace ID: 7b9417f2-5a04-11ef-823c-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace 7b9417f2-5a04-11ef-823c-00163e0808cc
Trace ID: 79ad87c0-5a04-11ef-b74a-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace 79ad87c0-5a04-11ef-b74a-00163e0808cc
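When the gather command runs inside a script, the path of the generated package usually has to be picked out of the summary table. The following is a minimal sketch, not part of obd or obdiag: the function name completed_pack_paths is hypothetical, and it assumes the column order shown in the sample output (Node, Status, Size, Time, PackPath).

```shell
# Hypothetical helper: read a "Gather Ob Log Summary" table on stdin and
# print the PackPath cell of every row whose Status cell is "Completed".
completed_pack_paths() {
  awk -F'[|]' '
    # A summary row has 5 cells, so splitting on "|" yields 7 fields.
    # Border lines and the smaller ZipFileInfo table have fewer fields.
    NF >= 7 {
      status = $3; path = $6
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim cell padding
      gsub(/^[ \t]+|[ \t]+$/, "", path)
      if (status == "Completed") print path
    }
  '
}
```

For example, the output of the obd obdiag gather log command above can be piped into completed_pack_paths to obtain the .zip path for a later copy or upload step.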
Analyze logs
You can use obdiag to analyze OceanBase database logs or trace logs. For more information, see the following sections.
In online mode, execute the following command:
obd obdiag analyze log obtest --from "2024-08-13 16:25:00" --to "2024-08-13 18:30:00"
In offline mode, execute the following command and specify the OceanBase database log path using the --files option:
obd obdiag analyze log obtest --files observer/log/
The output is as follows. You can copy and execute the cat command or the obdiag display-trace command in the output to view the detailed records.
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| Node | Status | FileName | ErrorCode | Message | Count |
+================+===========+==============================================================================+=============+===============================================================================================================================+=========+
| xx.xx.xx.xx | Completed | obdiag_analyze_pack_20240814143145/xx_xx_xx_xx/observer.log.20240814104204260 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 2 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | obdiag_analyze_pack_20240814143145/xx_xx_xx_xx/observer.log.20240814111305072 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 8 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | obdiag_analyze_pack_20240814143145/xx_xx_xx_xx/observer.log.20240814114410668 | -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use | 10 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx | Completed | obdiag_analyze_pack_20240814143145/xx_xx_xx_xx/observer.log.20240814114410668 | -4009 | IO error | 20 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
For more details, please run cmd ' cat /home/admin/obdiag_analyze_pack_20240814143145/result_details.txt '
Trace ID: e0809166-5a06-11ef-84bb-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace e0809166-5a06-11ef-84bb-00163e0808cc
Trace ID: de972874-5a06-11ef-8502-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace de972874-5a06-11ef-8502-00163e0808cc
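The analysis table reports one row per log file and error code, so the same error code can appear several times. The following minimal sketch totals the Count column per ErrorCode. The function name error_code_totals is hypothetical, and the column positions are an assumption based on the sample output above.

```shell
# Hypothetical helper: read an "analyze log" result table on stdin and
# print "<error code> <total count>", with the largest total first.
error_code_totals() {
  awk -F'[|]' '
    # A result row has 6 cells, so splitting on "|" yields 8 fields.
    NF >= 8 {
      code = $5; count = $7
      gsub(/[ \t]/, "", code)
      gsub(/[ \t]/, "", count)
      # Skip the header row and anything else that is not numeric.
      if (code ~ /^-?[0-9]+$/ && count ~ /^[0-9]+$/) total[code] += count
    }
    END { for (code in total) print code, total[code] }
  ' | sort -k2,2nr -k1,1
}
```

Piping the output of obd obdiag analyze log into error_code_totals gives a quick view of which error dominates the analyzed time range.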
Obtain the FLT_TRACE_ID.
If you have a specific SQL statement, you can use QUERY_SQL to find the FLT_TRACE_ID of the suspected slow SQL statement. Log in to the OceanBase database and execute the following command to view the FLT_TRACE_ID.
select QUERY_SQL,FLT_TRACE_ID from oceanbase.GV$OB_SQL_AUDIT where QUERY_SQL like 'select @@version_comment limit 1';
The output is as follows, where the FLT_TRACE_ID is 00060aa3-d607-f5f2-328b-388e17f687cb.
+----------------------------------+--------------------------------------+
| QUERY_SQL                        | FLT_TRACE_ID                         |
+----------------------------------+--------------------------------------+
| select @@version_comment limit 1 | 00060aa3-d607-f5f2-328b-388e17f687cb |
+----------------------------------+--------------------------------------+
You can also find the flt_trace_id in the trace.log file of ODP or the OceanBase database. Here is an example, where 00060bec-275e-9832-e730-7c129f2182ac is the flt_trace_id.
head trace.log
[2023-12-07 22:20:07.242229] [489640][T1_L0_G0][T1][YF2A0BA2DA7E-00060BEC28627BEF-0-0] {"trace_id":"00060bec-275e-9832-e730-7c129f2182ac","name":"close_das_task","id":"00060bec-2a20-bf9e-56c9-724cb467f859","start_ts":1701958807240606,"end_ts":1701958807240607,"parent_id":"00060bec-2a20-bb5f-e03a-5da01aa3308b","is_follow":false}
Execute the diagnostics.
obd obdiag analyze flt_trace obtest --flt_trace_id 000605b1-28bb-c15f-8ba0-1206bcc08aa3
The output is as follows. You can copy and execute the cat command or the obdiag display-trace command in the output to view the detailed records.
TOP time-consuming leaf span:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name                        | Elapsed Time| NODE                |
+---+----------------------------------+-------------+---------------------+
| 18| px_task                          | 2.758 ms    | OBSERVER(xx.xx.xx.1)|
| 5 | pc_get_plan                      | 52 μs       | OBSERVER(xx.xx.xx.1)|
| 16| do_local_das_task                | 45 μs       | OBSERVER(xx.xx.xx.1)|
| 10| do_local_das_task                | 17 μs       | OBSERVER(xx.xx.xx.1)|
| 17| close_das_task                   | 14 μs       | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
18 - px_task Elapsed: 2.758 ms NODE:OBSERVER(xx.xx.xx.1) tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
5 - pc_get_plan Elapsed: 52 μs NODE:OBSERVER(xx.xx.xx.1)
16 - do_local_das_task Elapsed: 45 μs NODE:OBSERVER(xx.xx.xx.3)
10 - do_local_das_task Elapsed: 17 μs NODE:OBSERVER(xx.xx.xx.1)
17 - close_das_task Elapsed: 14 μs NODE:OBSERVER(xx.xx.xx.3)
Details:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name                        | Elapsed Time| NODE                |
+---+----------------------------------+-------------+---------------------+
| 1 | TRACE                            | -           | -                   |
| 2 | └─com_query_process              | 5.351 ms    | OBPROXY(xx.xx.xx.1) |
| 3 |   └─mpquery_single_stmt          | 5.333 ms    | OBSERVER(xx.xx.xx.1)|
| 4 |     ├─sql_compile                | 107 μs      | OBSERVER(xx.xx.xx.1)|
| 5 |     │ └─pc_get_plan              | 52 μs       | OBSERVER(xx.xx.xx.1)|
| 6 |     └─sql_execute                | 5.147 ms    | OBSERVER(xx.xx.xx.1)|
| 7 |       ├─open                     | 87 μs       | OBSERVER(xx.xx.xx.1)|
| 8 |       ├─response_result          | 4.945 ms    | OBSERVER(xx.xx.xx.1)|
| 9 |       │ ├─px_schedule            | 2.465 ms    | OBSERVER(xx.xx.xx.1)|
| 10|       │ │ ├─do_local_das_task    | 17 μs       | OBSERVER(xx.xx.xx.1)|
| 11|       │ │ ├─px_task              | 2.339 ms    | OBSERVER(xx.xx.xx.2)|
| 12|       │ │ │ ├─do_local_das_task  | 54 μs       | OBSERVER(xx.xx.xx.2)|
| 13|       │ │ │ └─close_das_task     | 22 μs       | OBSERVER(xx.xx.xx.2)|
| 14|       │ │ ├─do_local_das_task    | 11 μs       | OBSERVER(xx.xx.xx.1)|
| 15|       │ │ ├─px_task              | 2.834 ms    | OBSERVER(xx.xx.xx.3)|
| 16|       │ │ │ ├─do_local_das_task  | 45 μs       | OBSERVER(xx.xx.xx.3)|
| 17|       │ │ │ └─close_das_task     | 14 μs       | OBSERVER(xx.xx.xx.3)|
| 18|       │ │ └─px_task              | 2.758 ms    | OBSERVER(xx.xx.xx.1)|
| 19|       │ ├─px_schedule            | 1 μs        | OBSERVER(xx.xx.xx.1)|
| 20|       │ └─px_schedule            | 1 μs        | OBSERVER(xx.xx.xx.1)|
| ..| ......                           | ...         | ......              |
+---+----------------------------------+-------------+---------------------+
For more details, please run cmd ' cat /home/admin/obdiag_analyze_flt_result/000605b1-28bb-c15f-8ba0-1206bcc08aa3.txt '
Trace ID: c7534902-5a0d-11ef-95db-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace c7534902-5a0d-11ef-95db-00163e0808cc
Trace ID: c56997a4-5a0d-11ef-84d6-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace c56997a4-5a0d-11ef-84d6-00163e0808cc
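Finding the flt_trace_id in trace.log by eye is tedious when the file is large. The following minimal sketch extracts the values with sed. The function name flt_trace_ids is hypothetical, and it assumes that each record carries a lowercase "trace_id" JSON field as in the sample line above.

```shell
# Hypothetical helper: read trace.log lines on stdin and print each
# distinct value of the "trace_id" field once.
flt_trace_ids() {
  sed -n 's/.*"trace_id":"\([0-9a-f-]*\)".*/\1/p' | sort -u
}
```

A typical use is flt_trace_ids < trace.log, followed by passing one of the printed IDs to the --flt_trace_id option of the obd obdiag analyze flt_trace command.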
Cluster inspection
You can use the obd obdiag check command to inspect the status of an OceanBase cluster. Currently, the inspection covers system kernel parameters and internal tables. It identifies existing or potential causes of cluster anomalies and provides O&M suggestions.
You can specify tasks in the command by using the --cases option. You can run the obd obdiag check list command to view the supported tasks.
Note
You can also write custom inspection tasks. For more information, see the Task Writing Tutorial section on the One-click Cluster Inspection page of the OceanBase Agile Diagnostics Tool (obdiag) documentation.
obd obdiag check list obtest
The following table lists the supported tasks in obdiag V2.3.0:
[check cases about obproxy]:
-----------------------------------------------------------------------------------------------
command info_en info_cn
-----------------------------------------------------------------------------------------------
obdiag check default check all task without filter Default inspection tasks
obdiag check --obproxy_cases=proxy obproxy version check obproxy version check
-----------------------------------------------------------------------------------------------
[check cases about observer]:
-----------------------------------------------------------------------------------------------------------------------
command info_en info_cn
-----------------------------------------------------------------------------------------------------------------------
obdiag check Default inspection tasks Default inspection tasks
obdiag check --cases=ad Test and inspection tasks Test and inspection tasks
obdiag check --cases=column_storage_poc column storage poc column storage poc
obdiag check --cases=build_before Deployment environment check Deployment environment check
obdiag check --cases=sysbench_run Collection of inspection tasks when executing sysbench Collection of inspection tasks when executing sysbench
obdiag check --cases=sysbench_free Collection of inspection tasks before executing sysbench Collection of inspection tasks before executing sysbench
-----------------------------------------------------------------------------------------------------------------------
Trace ID: ab2c078e-5aab-11ef-81b3-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace ab2c078e-5aab-11ef-81b3-00163e0808cc
If you do not specify the --cases option, all tasks are executed by default. Here is an example:
obd obdiag check run obtest
The output is as follows:
Check obproxy finished. For more details, please run cmd ' cat ./check_report//obdiag_check_report_obproxy_2024-08-14-15-43-59.table '
Check observer finished. For more details, please run cmd' cat ./check_report//obdiag_check_report_observer_2024-08-14-15-43-59.table '
Trace ID: f795bcd2-5a10-11ef-90e6-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace f795bcd2-5a10-11ef-90e6-00163e0808cc
Trace ID: f59531d8-5a10-11ef-89e9-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace f59531d8-5a10-11ef-89e9-00163e0808cc
You can copy and run the cat command or the obdiag display-trace command in the output to view the detailed records. For example:
cat ./check_report//obdiag_check_report_observer_2024-08-14-15-43-59.table
The output is as follows:
+--------------------------------------------------------------------------------------------------------------------+
| critical-tasks-report |
+----------------------------+---------------------------------------------------------------------------------------+
| task | task_report |
+----------------------------+---------------------------------------------------------------------------------------+
| cluster.data_path_settings | [critical] [local] ip:xx.xxx.xxx.xx ,data_dir and log_dir_disk are on the same disk. |
+----------------------------+---------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| warning-tasks-report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| task | task_report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| system.dependent_software_swapon | [warning] [local] Do not warning. swapon is exist. We will check the swap |
| system.parameter | [warning] [local] fs.pipe-user-pages-soft : 16384. recommended: 0. |
| | [warning] [local] net.ipv4.tcp_syncookies: 0. recommended: 1. |
| system.ulimit_parameter | [warning] [local] On ip : xx.xxx.xxx.xx, ulimit -s is 10240 . recommended: unlimited. |
| | [warning] [local] On ip : xx.xxx.xxx.xx, ulimit -n is 655360 . recommended: unlimited. |
| system.getenforce | [warning] [local] Do not warning. getenforce is exist. We will check SELinux by getenforce |
| disk.disk_hole | [warning] [cluster:obagent] not warning ,the DATA_SIZE is not 0 . need check sum(REQUIRED_SIZE)/sum(DATA_SIZE) |
| cluster.ob_enable_plan_cache_bad_version | [warning] Unadapted by version. SKIP |
| cluster.optimizer_better_inlist_costing_parmmeter | [warning] Unadapted by version. SKIP |
| cluster.part_trans_action_max | [warning] Unadapted by version. SKIP |
| cluster.table_history_too_many | [warning] Unadapted by version. SKIP |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| all-tasks-report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| task | task_report |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| system.aio | all pass |
| system.dependent_software | all pass |
| system.dependent_software_swapon | [warning] [local] Do not warning. swapon is exist. We will check the swap |
| system.parameter | [warning] [local] fs.pipe-user-pages-soft : 16384. recommended: 0. |
| | [warning] [local] net.ipv4.tcp_syncookies: 0. recommended: 1. |
| system.parameter_ip_local_port_range | all pass |
| system.parameter_tcp_rmem | all pass |
··· # Omitted output
| bugs.bug_182 | all pass |
| clog.clog_disk_full | all pass |
| table.information_schema_tables_two_data | all pass |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
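Because warnings appear both in their own section and in the all-tasks-report section, counting them needs to be limited to one section. The following minimal sketch counts the findings of one severity. The function name count_findings is hypothetical, and it assumes the section titles and the [critical] and [warning] tags shown in the sample report above.

```shell
# Hypothetical helper: read a check report on stdin and count the lines
# tagged "[<severity>]" inside the "<severity>-tasks-report" section.
# $1 is the severity, for example critical or warning.
count_findings() {
  awk -v sec="$1-tasks-report" -v tag="[$1]" '
    # A section title line switches counting on or off.
    /-tasks-report/ { in_sec = (index($0, sec) > 0); next }
    in_sec && index($0, tag) > 0 { n++ }
    END { print n + 0 }
  '
}
```

For example, count_findings critical < ./check_report/obdiag_check_report_observer_2024-08-14-15-43-59.table prints the number of critical findings, which a script can compare with 0 to decide whether to raise an alert.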
Root cause analysis
When an OceanBase Database instance encounters an exception, you can run the obd obdiag rca run command to analyze the root cause of the exception. You can run the obd obdiag rca list command to view the exceptions that can be analyzed.
obd obdiag rca list obtest
The following table lists the exceptions that can be analyzed in obdiag V2.3.0.
The tool oceanbase-diagnostic-tool is already installed the latest version 2.3.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command info_en info_cn
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag rca run --scene=ddl_disk_full Disk space insufficient during DDL process. The disk space is insufficient during the DDL process.
obdiag rca run --scene=disconnection Root cause analysis of disconnection Root cause analysis of disconnection
obdiag rca run --scene=lock_conflict root cause analysis of lock conflict Root cause analysis for lock conflicts
obdiag rca run --scene=major_hold root cause analysis of major hold Root cause analysis for major hold scenarios
obdiag rca run --scene=clog_disk_full Identify the issue of clog disk space being full. clog disk is full.
obdiag rca run --scene=ddl_failure diagnose ddl failure Diagnose ddl failure
obdiag rca run --scene=index_ddl_error Troubleshoot errors in index execution. index DDL error investigation
obdiag rca run --scene=log_error Troubleshoot log-related issues. Currently supported scenarios: no_leader. Troubleshoot log-related issues. Currently supported scenarios: no_leader.
obdiag rca run --scene=transaction_disconnection root cause analysis of transaction disconnection Root cause analysis of transaction disconnection
The error code is similar to -4012. You need to provide the error message.
obdiag rca run --scene=transaction_not_ending transaction wait timeout error (beta), error_code like -4012 transaction not ending scene (beta), error_code like -4012
obdiag rca run --scene=transaction_other_error transaction other error, error_code like -4030, -4121, -4122, -4124, -4019 transaction other error, except the already listed errors, for example, the error code is -4030, -4121, -4122, -4124, -4019
obdiag rca run --scene=transaction_rollback A transaction rollback error. error_code like -6002.
obdiag rca run --scene=transaction_wait_timeout transaction wait timeout error, error_msg like 'Shared lock conflict' or 'Lock wait timeout exceeded' transaction wait timeout error
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Trace ID: b7b86acc-5a12-11ef-9009-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace b7b86acc-5a12-11ef-9009-00163e0808cc
Trace ID: b61482b4-5a12-11ef-b3ee-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace b61482b4-5a12-11ef-b3ee-00163e0808cc
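A script can confirm that a scene is supported before it runs the analysis. The following minimal sketch pulls the scene names out of the list output. The function name rca_scenes is hypothetical, and it assumes that every supported scene appears as a --scene=<name> token, as in the sample output above.

```shell
# Hypothetical helper: read "rca list" output on stdin and print the
# scene name from every line that contains a "--scene=<name>" token.
rca_scenes() {
  sed -n 's/.*--scene=\([a-z_]*\).*/\1/p'
}

# Usage sketch (commented out because it needs a cluster managed by obd):
# if obd obdiag rca list obtest | rca_scenes | grep -qx disconnection; then
#   obd obdiag rca run obtest --scene=disconnection
# fi
```

The check keeps a typo in the scene name from reaching the obd obdiag rca run command.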
For example, in a disconnection scenario, run the following command:
obd obdiag rca run obtest --scene=disconnection
The output is as follows. You can copy and run the cat command or the obdiag display-trace command to view the detailed records.
+-----------------------------------------------------------------------------------------------------------+
| record |
+------+----------------------------------------------------------------------------------------------------+
| step | info |
+------+----------------------------------------------------------------------------------------------------+
| 1 | node:xxx.xxx.xxx obproxy_diagnosis_log:[2024-08-13 17:48:37.667014] [23173][Y0-00007FAA5183E710] |
| | [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:1065, ss_id:4559, |
| | proxy_session_id:837192278409543969, server_session_id:3221810838, |
| | client_addr:"xxx.xxx.xxx.xxx:xxxx", server_addr:"xxx.xxx.xxx.xxx:2883", cluster_name:"obcluster", |
| | tenant_name:"sys", user_name:"root", error_code:-10010, error_msg:"An unexpected connection event |
| | received from client while obproxy reading request", request_cmd:"COM_SLEEP", sql_cmd:"COM_END", |
| | req_total_time(us):5315316}{vc_event:"VC_EVENT_EOS", user_sql:""}) |
| 2 | cs_id:1065, server_session_id:3221810838 |
| 3 | trace_type:CLIENT_VC_TRACE |
| 4 | error_code:-10010 |
+------+----------------------------------------------------------------------------------------------------+
The suggest: Need client cooperation for diagnosis
rca finished. For more details, the result on '/home/admin/rca/obdiag_disconnection_20240814160846'
You can get the suggest by 'cat /home/admin/rca/obdiag_disconnection_20240814160846/record'
Trace ID: 6ded77fa-5a14-11ef-9b92-00163e0808cc
If you want to view detailed obdiag logs, please run: /home/admin/oceanbase-diagnostic-tool/obdiag display-trace 6ded77fa-5a14-11ef-9b92-00163e0808cc
Trace ID: 6c06e2e6-5a14-11ef-825f-00163e0808cc
If you want to view detailed obd logs, please run: obd display-trace 6c06e2e6-5a14-11ef-825f-00163e0808cc
