If OceanBase Diagnostic Tool (obdiag) is deployed independently, you can use the obdiag gather commands to collect diagnostic information about OceanBase Database, and the obdiag gather scene commands to collect the information required for troubleshooting specific issues. These commands simplify the collection of information from distributed nodes.
Prerequisites
Before you use the commands, make sure that you have configured the logon information of the target nodes in the config.yml configuration file of obdiag. For more information, see Configure obdiag.
View supported scenarios
obdiag gather scene list
The output is similar to the following:
[Other Problem Gather Scenes]:
---------------------------------------------------------------------------------------
command info_en info_cn
---------------------------------------------------------------------------------------
obdiag gather scene run --scene=other.application_error [application error] [应用报错问题]
---------------------------------------------------------------------------------------
[Obproxy Problem Gather Scenes]:
----------------------------------------------------------------------------------
command info_en info_cn
----------------------------------------------------------------------------------
obdiag gather scene run --scene=obproxy.restart [obproxy restart] [obproxy无故重启]
----------------------------------------------------------------------------------
[Observer Problem Gather Scenes]:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command info_en info_cn
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag gather scene run --scene=observer.backup [backup problem] [数据备份问题]
obdiag gather scene run --scene=observer.backup_clean [backup clean] [备份清理问题]
obdiag gather scene run --scene=observer.base [cluster base info] [集群基础信息]
obdiag gather scene run --scene=observer.clog_disk_full [clog disk full] [clog盘满]
obdiag gather scene run --scene=observer.cluster_down [cluster down] [集群无法连接]
obdiag gather scene run --scene=observer.compaction [compaction] [合并问题]
obdiag gather scene run --scene=observer.cpu_high [High CPU] [CPU高]
obdiag gather scene run --scene=observer.delay_of_primary_and_backup [delay of primary and backup] [主备库延迟]
obdiag gather scene run --scene=observer.io [io problem] [io问题]
obdiag gather scene run --scene=observer.log_archive [log archive] [日志归档问题]
obdiag gather scene run --scene=observer.long_transaction [long transaction] [长事务]
obdiag gather scene run --scene=observer.memory [memory problem] [内存问题]
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}" [SQL performance problem] [SQL性能问题]
obdiag gather scene run --scene=observer.px_collect_log --env "{trace_id='Yxx', estimated_time='2024-04-19 14:46:17'}" [Collect error source node logs for SQL PX] [SQL PX 收集报错源节点日志]
obdiag gather scene run --scene=observer.recovery [recovery] [数据恢复问题]
obdiag gather scene run --scene=observer.restart [restart] [observer无故重启]
obdiag gather scene run --scene=observer.rootservice_switch [rootservice switch] [有主改选或者无主选举的切主]
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}" [SQL execution error] [SQL 执行出错]
obdiag gather scene run --scene=observer.suspend_transaction [suspend transaction] [悬挂事务]
obdiag gather scene run --scene=observer.unit_data_imbalance [unit data imbalance] [unit迁移/缩小 副本不均衡问题]
obdiag gather scene run --scene=observer.unknown [unknown problem] [未能明确问题的场景]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
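When the list is long, it can help to reduce the output to the scene names alone. The following is a minimal sketch that assumes the output keeps the `--scene=<name>` form shown above; `list_scene_names` is a hypothetical helper, not part of obdiag.

```shell
# Hypothetical helper: read the output of "obdiag gather scene list" from
# standard input and print only the scene names, in their original order.
list_scene_names() {
  # -o prints only the matching part; -e keeps the pattern from being read as an option.
  grep -o -e '--scene=[^ ]*' | sed 's/^--scene=//'
}

# Usage (requires obdiag to be installed and configured):
#   obdiag gather scene list | list_scene_names
#   obdiag gather scene list | list_scene_names | grep '^observer\.'
```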
Use the obdiag gather scene run command
obdiag gather scene run --scene={SceneName}
Usage description
--scene={SceneName}
`SceneName` specifies the scenario for which information is to be collected.
Example:
obdiag gather scene run --scene=observer.unknown
The following table describes the options.
| Option | Required? | Data type | Default value | Description |
|---|---|---|---|---|
| --scene | Yes | string | Empty | The name of the scenario. You can run the obdiag gather scene list command to view the scenarios supported by the current version. |
| --from | No | string | Empty | The start time of log collection in the yyyy-mm-dd hh:mm:ss format. |
| --to | No | string | Empty | The end time of log collection in the yyyy-mm-dd hh:mm:ss format. |
| --since | No | string | Empty | The most recent period of time for log collection, in the format of <n><unit>, where n is the time value and the unit is m (minute), h (hour), or d (day). For example, 30m collects logs of the last 30 minutes. |
| --env | No | string | Empty | Additional parameters to be collected for specific scenarios. |
| --store_dir | No | string | The current path where the command is executed | The local path where the results are stored. |
| -c | No | string | ~/.obdiag/config.yml | The path of the configuration file. |
Note
If you do not specify the --store_dir option, the collected information is stored in the gather_pack_xxxx folder in the current directory.
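After several runs, the result directory holds more than one gather_pack_xxxx folder. The sketch below picks the most recently modified one. It assumes only the `gather_pack_` prefix mentioned in the note; `latest_gather_pack` is a hypothetical helper.

```shell
# Hypothetical helper: print the most recently modified gather_pack_* folder
# under the given directory (defaults to the current directory).
latest_gather_pack() {
  dir="${1:-.}"
  # -d lists the folders themselves; -t sorts by modification time, newest first.
  ls -dt "$dir"/gather_pack_*/ 2>/dev/null | head -n 1
}

# Usage:
#   latest_gather_pack              # search the current directory
#   latest_gather_pack /tmp/result  # search the directory passed to --store_dir
```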
# Application error
obdiag gather scene run --scene=other.application_error
# Unexpected restart of the obproxy process
obdiag gather scene run --scene=obproxy.restart
# Data backup exception
obdiag gather scene run --scene=observer.backup
# Backup cleanup exception
obdiag gather scene run --scene=observer.backup_clean
# Clog disk space exhausted
obdiag gather scene run --scene=observer.clog_disk_full
# Major compaction exception
obdiag gather scene run --scene=observer.compaction
# High CPU usage
obdiag gather scene run --scene=observer.cpu_high
# Data latency between the primary and standby databases
obdiag gather scene run --scene=observer.delay_of_primary_and_backup
# Log archiving exception
obdiag gather scene run --scene=observer.log_archive
# Long-running transaction
obdiag gather scene run --scene=observer.long_transaction
# Memory exception
obdiag gather scene run --scene=observer.memory
# SQL performance exception. In the following example, the value of `trace_id` in the `env` parameter corresponds to the value of `trace_id` in the `gv$ob_sql_audit` view.
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"
# Restore exception
obdiag gather scene run --scene=observer.recovery
# Unexpected restart of the observer process
obdiag gather scene run --scene=observer.restart
# Leader switching in re-election with a leader or election without a leader
obdiag gather scene run --scene=observer.rootservice_switch
# SQL execution error. In the following example, the value of `trace_id` in the `env` parameter corresponds to the value of `trace_id` in the `gv$ob_sql_audit` view.
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"
# Suspended transaction
obdiag gather scene run --scene=observer.suspend_transaction
# Data imbalance between units after unit migration or unit reduction
obdiag gather scene run --scene=observer.unit_data_imbalance
# Unknown error
obdiag gather scene run --scene=observer.unknown
# When you collect PX logs of the source node that reports an error, the `trace_id` parameter is required and the `estimated_time` parameter is optional. The default value of `estimated_time` is the current time. PX logs generated one week earlier than the specified time will be collected based on the specified trace ID.
obdiag gather scene run --scene=observer.px_collect_log --env "{trace_id='Yxx', estimated_time='2024-04-19 14:46:17'}"
# Collect basic information of a cluster
obdiag gather scene run --scene=observer.base
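For the scenes that take `--env`, nesting single quotes inside double quotes is easy to get wrong. One possible approach is to assemble the value from shell variables first. The variable names below are hypothetical, and the values are the placeholders from the examples above.

```shell
# Placeholder values copied from the examples above; replace them with real ones.
DB_CONNECT='-hxx -Pxx -uxx -pxx -Dxx'
TRACE_ID='xx'

# Assemble the --env value in the {key='value', ...} form used by the examples.
ENV_ARG="{db_connect='${DB_CONNECT}', trace_id='${TRACE_ID}'}"

# Dry run: print the command. Remove "echo" to actually run obdiag.
echo obdiag gather scene run --scene=observer.sql_err --env "${ENV_ARG}"
```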
To collect data of a specified period of time, you can specify the --from and --to, or --since parameters. For example, you can specify the --from and --to parameters to collect data of the period from "2024-01-30 12:30:00" to "2024-01-30 12:40:00".
obdiag gather scene run --scene=other.application_error --from "2024-01-30 12:30:00" --to "2024-01-30 12:40:00"
You can also specify the --since parameter to collect data of the last 10 minutes.
obdiag gather scene run --scene=other.application_error --since 10m
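If you know roughly when an error occurred, you can derive the `--from` and `--to` values from that time instead of typing them by hand. The sketch below assumes GNU date; `INCIDENT` and `MARGIN` are hypothetical variables.

```shell
# Assumes GNU date: the -d option and the @<epoch> syntax are GNU extensions.
FMT='+%Y-%m-%d %H:%M:%S'
INCIDENT='2024-01-30 12:35:00'   # hypothetical time at which the error was seen
MARGIN=300                       # seconds to collect before and after that time

# Convert to epoch seconds, shift by the margin, and format back as yyyy-mm-dd hh:mm:ss.
CENTER=$(date -d "$INCIDENT" +%s)
FROM_TS=$(date -d "@$((CENTER - MARGIN))" "$FMT")
TO_TS=$(date -d "@$((CENTER + MARGIN))" "$FMT")

# Dry run: print the command. Remove "echo" to actually run obdiag.
echo obdiag gather scene run --scene=other.application_error --from "$FROM_TS" --to "$TO_TS"
```

With the values shown, the window is the same as in the --from and --to example above: "2024-01-30 12:30:00" to "2024-01-30 12:40:00".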
Tutorial for writing a task
A task is a specific scenario. obdiag runs a task based on the dedicated script file in the YAML format.
Preparations
Before you write the script file of a task, decide where the file is to be stored.
The file must be stored in the directory specified by the gather.scenes_base_path parameter in the inner_config.yml configuration file in the /usr/local/oceanbase-diagnostic-tool/conf/ directory. Check whether the task falls into an existing category in the directory. If not, create a folder to declare the category.
Here is an example:
# Go to `${gather.scenes_base_path}` and create a sample file `test.yaml` for testing the observer process.
cd ~/.obdiag/gather/tasks/observer
touch test.yaml
Now you are prepared.
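The preparation can also be wrapped in a small function that creates the category folder when it does not exist yet. This is a sketch only: `new_task_file` is a hypothetical helper, and the base path in the usage line is the one from the example above, which may differ from the value of gather.scenes_base_path on your installation.

```shell
# Hypothetical helper: create <base>/<category>/<name>.yaml, creating the
# category folder first if it does not exist, and print the file path.
new_task_file() {
  base="$1"; category="$2"; name="$3"
  mkdir -p "$base/$category" || return 1
  file="$base/$category/$name.yaml"
  # Create the file only if it is missing, so an existing task is not overwritten.
  [ -e "$file" ] || : > "$file"
  printf '%s\n' "$file"
}

# Usage:
#   new_task_file "$HOME/.obdiag/gather/tasks" observer test
```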
Write a task
To write a task is to edit the test.yaml file.
# Declare the purpose of the task.
info: "for test"
The rest of the file, the task section, is more involved. Pay attention to the details when you write it.
Write the task
The task script is a list that declares the steps to be executed in scenario-based collection.
Why is a task a list?
- A list ensures compatibility with different versions: each element applies to one range of OceanBase Database versions.
An element of a task involves the following parameters.
| Parameter | Required? | Description |
|---|---|---|
| version | No | The OceanBase Database versions that the script is compatible with, specified as a string that contains a range of complete version numbers. A version number has three parts for OceanBase Database V3.x, such as [3.1.1,3.2.0], or four parts for OceanBase Database V4.x. The range follows the left-open, right-closed principle. An example is provided below the table. |
| steps | Yes | The list of steps to be executed. |
Here is an example:
info: testinfo
task:
  - version: "[3.1.0,3.2.4]"
    steps:
      {steps_object}
  - version: "[4.2.0.0,4.3.0.0]"
    steps:
      {steps_object}
steps is a list of multiple execution processes.
An element of steps is a single step that involves the following parameters.
| Parameter | Required? | Description |
|---|---|---|
| type | Yes | The type of the execution step. Valid values are ssh, sql, log, obproxy_log, and sysstat. More types will be supported in later versions. |
| {ssh/sql/log/obproxy_log/sysstat} | Yes | The parameters for the selected type, which depend on the logic description of the execution type in the code. Supported execution steps are described in the following sections. |
In the following examples, step: is only a label that marks the example. It has no actual meaning and is not part of the script syntax.
ssh
Remotely execute the instruction and obtain the corresponding return value.
step:
  type: ssh
  ssh: wc -l /proc/${task_OBServer_pid}/maps | awk '{print $1}'
  global: false # The `global` field specifies whether to execute a step only on a single node or on all nodes. The value `true` specifies to execute the step only on the first node. The value `false` specifies to execute the step on each node.
sql
Execute an SQL statement and obtain the corresponding value.
step:
  type: sql
  sql: select tenant_name from oceanbase.__all_tenant;
  global: false
log
Collect logs of the observer process.
step:
  type: log
  grep: "" # Fields for filtering logs.
  global: false
obproxy_log
Collect logs of OceanBase Database Proxy (ODP).
step:
  type: obproxy_log
  grep: "" # Fields for filtering logs.
  global: false
sysstat
Collect host information.
step:
  type: sysstat
  sysstat: ""
  global: false
Note
The global field specifies whether to execute a step only on a single node or on all nodes. The value true specifies to execute the step only on the first node. The value false specifies to execute the step on each node.
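Putting the pieces together, a complete task file might look like the one written by the sketch below. It assumes that each step is a list item under steps with the step: label dropped, and that steps are nested as in the version example. `write_sample_task` is a hypothetical helper, and the version range and step contents are taken from the examples above.

```shell
# Hypothetical helper: write a sample task file to the given path.
# The quoted 'EOF' keeps ${task_OBServer_pid} and $1 from being expanded by the shell.
write_sample_task() {
  cat > "$1" <<'EOF'
info: "for test"
task:
  - version: "[4.2.0.0,4.3.0.0]"
    steps:
      - type: ssh
        ssh: "wc -l /proc/${task_OBServer_pid}/maps | awk '{print $1}'"
        global: false
      - type: sql
        sql: "select tenant_name from oceanbase.__all_tenant;"
        global: true
      - type: log
        grep: ""
        global: false
EOF
}

# Usage (the path is the one from the preparation example):
#   write_sample_task "$HOME/.obdiag/gather/tasks/observer/test.yaml"
```

If scene names follow the category.file pattern seen in the scene list, this task would presumably be run with --scene=observer.test. Treat that name as an assumption and confirm it with obdiag gather scene list.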