This article applies to a standalone deployment of obdiag. The obdiag rca command analyzes diagnostic information from an OceanBase database. It currently supports analyzing abnormal OceanBase scenarios to identify the likely causes of a problem.
If the cluster was deployed by obd, you can collect information on the selected cluster directly through the obd diagnostic command group.
Note: Root cause analysis reads a large amount of internal table data, so the tenant_sys user in the obdiag configuration file needs read permission on the tables and views in the oceanbase database.
## rca command group overview
```shell
# List all supported RCA scenarios
obdiag rca list
# Run RCA for a scenario
obdiag rca run --scene=<scene_name>
```

`scene_name` can be one of the following:

* disconnection: disconnection diagnosis, based on the OBProxy diagnostic log.
* major_hold: diagnosis of a stuck major compaction.
* lock_conflict: lock conflict diagnosis.
* ddl_disk_full: insufficient disk space during DDL.
* clog_disk_full: root cause analysis of a full clog disk.
* log_error: root cause analysis of a log stream without a leader.
* ddl_failure: DDL failure diagnosis.
* index_ddl_error: root cause analysis of index build errors.
* transaction_disconnection: root cause analysis of disconnections during a transaction.
* transaction_execute_timeout: transaction execution timeout.
* transaction_not_ending: a transaction that does not end.
* transaction_other_error: other transaction errors (such as error codes -4030, -4121, -4122, -4124, and -4019).
* transaction_rollback: transaction rollback errors.
* transaction_wait_timeout: transaction wait timeout.
* oms_full_trans: root cause analysis of abnormal OMS full migration.
* oms_obcdc: root cause analysis of the OMS obcdc component.
* suspend_transaction: root cause analysis of suspended transactions.
* unit_gc: unit GC troubleshooting.
* replay_hold: root cause analysis of stuck replay.
* memory_full: root cause analysis of memory exhaustion.
* delete_server_error: root cause analysis of failures when deleting OBServer nodes.
* gc_troubleshooting: GC troubleshooting.
* schema_leak: schema leak root cause analysis.
* split_schedule_error: root cause analysis of split scheduling exceptions.
* weak_read_troubleshooting: weak read troubleshooting.
* execute_memory_high: diagnosis of high SQL execution memory usage.
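
Because a mistyped scene name is an easy error to make, a script that wraps obdiag may want to check the name first. The following is a minimal sketch, assuming bash; `DOCUMENTED_SCENES` and `is_documented_scene` are hypothetical names that are not part of obdiag, and the list is simply copied from this page, so `obdiag rca list` remains the authoritative source for your installed version.

```shell
#!/usr/bin/env bash
# Scene names copied from the list in this article (assumption: the list
# may differ between obdiag versions; `obdiag rca list` is authoritative).
DOCUMENTED_SCENES="disconnection major_hold lock_conflict ddl_disk_full \
clog_disk_full log_error ddl_failure index_ddl_error \
transaction_disconnection transaction_execute_timeout \
transaction_not_ending transaction_other_error transaction_rollback \
transaction_wait_timeout oms_full_trans oms_obcdc suspend_transaction \
unit_gc replay_hold memory_full delete_server_error gc_troubleshooting \
schema_leak split_schedule_error weak_read_troubleshooting \
execute_memory_high"

# Hypothetical helper: succeeds only if $1 exactly matches a listed scene.
is_documented_scene() {
  case " ${DOCUMENTED_SCENES} " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

scene="major_hold"
if is_documented_scene "${scene}"; then
  echo "scene is documented: ${scene}"
else
  echo "unknown scene: ${scene}" >&2
fi
```

Padding both the list and the pattern with spaces makes the match exact, so a prefix such as `major` is rejected rather than matched as a substring.
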
## obdiag rca list
Use this command to obtain the scenarios currently supported by root cause analysis.
## obdiag rca run
Use this command to perform root cause analysis of the specified scenario.
```shell
obdiag rca run --scene=<scene_name>
```

The options are explained below:

| Option name | Required | Data type | Default value | Description |
|---|---|---|---|---|
| --scene | Yes | string | Empty | Name of the root cause analysis scene. Run obdiag rca list to view the supported scenes. |
| --store_dir | No | string | ./obdiag_rca/ | Path where the root cause analysis results are saved. |
| --report_type | No | string | table | Report output format. Valid values are table, json, xml, yaml, and html. |
| --env | No | string | Empty | Scene environment variable, in the format --env key=value or JSON. Can be specified multiple times. Refer to the document for each scene for details. |
| -c | No | string | ~/.obdiag/config.yml | Configuration file path. |
| --inner_config | No | string | Empty | obdiag's own configuration, in the format --inner_config key=value. |
| --config | No | string | Empty | Cluster configuration, in the format --config key1=value1 --config key2=value2. See obdiag configuration for details. |
| --config_password | No | string | Empty | Password for the configuration file. Required when obdiag uses an encrypted configuration file. For details, see Configuration File Encryption. |

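
To show how several of these options combine on one command line, here is a minimal sketch, assuming bash. `build_rca_cmd` is a hypothetical helper that is not part of obdiag; it only prints the command so that it can be reviewed before being run. The `key=value` argument is a placeholder for a scene-specific environment variable, not a real setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble an `obdiag rca run` command line from the
# options in the table above and print it without executing anything.
# Arguments: <scene> <store_dir> <report_type> [key=value ...]
build_rca_cmd() {
  local scene="$1" store_dir="$2" report_type="$3"
  shift 3
  local cmd=(obdiag rca run
    "--scene=${scene}"
    "--store_dir=${store_dir}"
    "--report_type=${report_type}")
  local kv
  # Every remaining argument becomes one --env option (repeatable).
  for kv in "$@"; do
    cmd+=(--env "${kv}")
  done
  printf '%s\n' "${cmd[*]}"
}

# JSON report for the major_hold scene, saved in the default directory.
build_rca_cmd major_hold ./obdiag_rca/ json

# Same shape with one placeholder environment variable.
build_rca_cmd lock_conflict ./obdiag_rca/ table key=value
```

Printing instead of executing keeps the sketch safe to try on a machine without a configured cluster.
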
Taking the disconnection scenario as an example, the command is as follows:

```shell
obdiag rca run --scene=disconnection
```

The output is as follows:

```
+-----------------------------------------------------------------------------------------------------------+
| record |
+------+----------------------------------------------------------------------------------------------------+
| step | info |
+------+----------------------------------------------------------------------------------------------------+
| 1 | node:xxx.xxx.xxx obproxy_diagnosis_log:[2024-01-18 17:48:37.667014] [23173][Y0-00007FAA5183E710] |
| | [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:1065, ss_id:4559, |
| | proxy_session_id:837192278409543969, server_session_id:3221810838, |
| | client_addr:"xxx.xxx.xxx.xxx:xxxx", server_addr:"xxx.xxx.xxx.xxx:2883", cluster_name:"obcluster", |
| | tenant_name:"sys", user_name:"root", error_code:-10010, error_msg:"An unexpected connection event |
| | received from client while obproxy reading request", request_cmd:"COM_SLEEP", sql_cmd:"COM_END", |
| | req_total_time(us):5315316}{vc_event:"VC_EVENT_EOS", user_sql:""}) |
| 2 | cs_id:1065, server_session_id:3221810838 |
| 3 | trace_type:CLIENT_VC_TRACE |
| 4 | error_code:-10010 |
+------+----------------------------------------------------------------------------------------------------+
The suggest: Need client cooperation for diagnosis
```
