Use obdiag to collect information for specific scenarios (V2.3.0)

If OceanBase Diagnostic Tool (obdiag) is deployed independently, you can run `obdiag gather` commands to collect diagnostic information about OceanBase Database. The `obdiag gather scene` commands collect the information required to troubleshoot a specific issue, which simplifies gathering information that is scattered across distributed nodes.

Prerequisites

Before you use the commands, make sure that you have configured the logon information of the target nodes in the config.yml configuration file of obdiag. For more information, see Configure obdiag.

View supported scenarios

obdiag gather scene list


[Other Problem Gather Scenes]:
---------------------------------------------------------------------------------------
command                                                   info_en               info_cn
---------------------------------------------------------------------------------------
obdiag gather scene run --scene=other.application_error   [application error]   [应用报错问题]
---------------------------------------------------------------------------------------

[Obproxy Problem Gather Scenes]:
----------------------------------------------------------------------------------
command                                           info_en             info_cn
----------------------------------------------------------------------------------
obdiag gather scene run --scene=obproxy.restart   [obproxy restart]   [obproxy无故重启]
----------------------------------------------------------------------------------

[Observer Problem Gather Scenes]:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
command                                                                                                                                   info_en                                       info_cn
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
obdiag gather scene run --scene=observer.backup                                                                                           [backup problem]                              [数据备份问题]
obdiag gather scene run --scene=observer.backup_clean                                                                                     [backup clean]                                [备份清理问题]
obdiag gather scene run --scene=observer.base                                                                                             [cluster base info]                           [集群基础信息]
obdiag gather scene run --scene=observer.clog_disk_full                                                                                   [clog disk full]                              [clog盘满]
obdiag gather scene run --scene=observer.cluster_down                                                                                     [cluster down]                                [集群无法连接]
obdiag gather scene run --scene=observer.compaction                                                                                       [compaction]                                  [合并问题]
obdiag gather scene run --scene=observer.cpu_high                                                                                         [High CPU]                                    [CPU高]
obdiag gather scene run --scene=observer.delay_of_primary_and_backup                                                                      [delay of primary and backup]                 [主备库延迟]
obdiag gather scene run --scene=observer.io                                                                                               [io problem]                                  [io问题]
obdiag gather scene run --scene=observer.log_archive                                                                                      [log archive]                                 [日志归档问题]
obdiag gather scene run --scene=observer.long_transaction                                                                                 [long transaction]                            [长事务]
obdiag gather scene run --scene=observer.memory                                                                                           [memory problem]                              [内存问题]
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}"   [SQL performance problem]                     [SQL性能问题]
obdiag gather scene run --scene=observer.px_collect_log --env "{trace_id='Yxx', estimated_time='2024-04-19 14:46:17'}"                    [Collect error source node logs for SQL PX]   [SQL PX 收集报错源节点日志]
obdiag gather scene run --scene=observer.recovery                                                                                         [recovery]                                    [数据恢复问题]
obdiag gather scene run --scene=observer.restart                                                                                          [restart]                                     [observer无故重启]
obdiag gather scene run --scene=observer.rootservice_switch                                                                               [rootservice switch]                          [有主改选或者无主选举的切主]
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}"    [SQL execution error]                         [SQL 执行出错]
obdiag gather scene run --scene=observer.suspend_transaction                                                                              [suspend transaction]                         [悬挂事务]
obdiag gather scene run --scene=observer.unit_data_imbalance                                                                              [unit data imbalance]                         [unit迁移/缩小 副本不均衡问题]
obdiag gather scene run --scene=observer.unknown                                                                                          [unknown problem]                             [未能明确问题的场景]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
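If you want to process the listing in a script, the scene names can be pulled out of the text. The following is a minimal Python sketch, not part of obdiag: `parse_scenes` is a hypothetical helper, and the sample text is copied from the listing above, so the layout of real output in other versions is an assumption.

```python
import re

# Sample lines copied from the listing above; real output may differ by version.
SAMPLE_OUTPUT = """\
obdiag gather scene run --scene=other.application_error   [application error]   [应用报错问题]
obdiag gather scene run --scene=obproxy.restart   [obproxy restart]   [obproxy无故重启]
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-h127.0.0.1 -P2881 -utest@test -p****** -Dtest', trace_id='Yxx'}"   [SQL performance problem]   [SQL性能问题]
"""


def parse_scenes(listing: str) -> dict[str, bool]:
    """Map each scene name to True when its sample command shows --env."""
    scenes = {}
    for line in listing.splitlines():
        match = re.search(r"--scene=(\S+)", line)
        if match:
            scenes[match.group(1)] = "--env" in line
    return scenes


scenes = parse_scenes(SAMPLE_OUTPUT)
for name, shows_env in scenes.items():
    note = "sample command uses --env" if shows_env else "no extra parameters shown"
    print(f"{name}: {note}")
```

Scenes whose sample command shows `--env` take additional parameters, which are described with the examples later in this topic.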

Use the `obdiag gather scene run` command

obdiag gather scene run --scene={SceneName}

Usage description

--scene={SceneName}

`SceneName` specifies the scenario for which information is to be collected.

Example:
obdiag gather scene run --scene=observer.unknown

The following table describes the options.

Option	Required?	Data type	Default value	Description
--scene	Yes	string	Empty	The name of the scenario. Run the `obdiag gather scene list` command to view the scenarios supported by the current version.
--from	No	string	Empty	The start time of log collection in the `yyyy-mm-dd hh:mm:ss` format.
--to	No	string	Empty	The end time of log collection in the `yyyy-mm-dd hh:mm:ss` format.
--since	No	string	Empty	The most recent period for log collection, in the format `<n><m|h|d>`, where `n` is the time value and `m`, `h`, and `d` stand for minutes, hours, and days. For example, `30m` collects the logs of the last 30 minutes.
--env	No	string	Empty	Additional parameters that specific scenarios require, such as `trace_id` or `db_connect`.
--store_dir	No	string	The current path where the command is executed	The local path where the results are stored.
-c	No	string	`~/.obdiag/config.yml`	The path of the configuration file.

Note

If you do not specify the `--store_dir` option, the collected information is stored in a `gather_pack_xxxx` folder in the directory where you run the command.
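After several runs, the directory can contain more than one result folder. The following Python sketch shows one way to find the newest one. It is not part of obdiag: `latest_gather_pack` is a hypothetical helper, and it assumes only that result folders start with `gather_pack_`, as the note describes.

```python
from pathlib import Path
from typing import Optional


def latest_gather_pack(store_dir: str = ".") -> Optional[Path]:
    """Return the most recently modified gather_pack_* folder, or None."""
    packs = [p for p in Path(store_dir).glob("gather_pack_*") if p.is_dir()]
    return max(packs, key=lambda p: p.stat().st_mtime, default=None)


# Prints None when no result folder exists in the current directory.
print(latest_gather_pack("."))
```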

# Application error
obdiag gather scene run --scene=other.application_error

# Unexpected restart of the obproxy process
obdiag gather scene run --scene=obproxy.restart

# Data backup exception
obdiag gather scene run --scene=observer.backup

# Backup cleanup exception
obdiag gather scene run --scene=observer.backup_clean

# Clog disk space exhausted
obdiag gather scene run --scene=observer.clog_disk_full 

# Major compaction exception
obdiag gather scene run --scene=observer.compaction 

# High CPU usage
obdiag gather scene run --scene=observer.cpu_high

# Data latency between the primary and standby databases
obdiag gather scene run --scene=observer.delay_of_primary_and_backup 

# Log archiving exception
obdiag gather scene run --scene=observer.log_archive

# Long-running transaction
obdiag gather scene run --scene=observer.long_transaction 

# Memory exception
obdiag gather scene run --scene=observer.memory

# SQL performance exception. In the following example, the value of `trace_id` in the `env` parameter corresponds to the value of `trace_id` in the `gv$ob_sql_audit` view.
obdiag gather scene run --scene=observer.perf_sql --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"   

# Data restore exception
obdiag gather scene run --scene=observer.recovery

# Unexpected restart of the observer process
obdiag gather scene run --scene=observer.restart  

# Leader switching in re-election with a leader or election without a leader
obdiag gather scene run --scene=observer.rootservice_switch  

# SQL execution error. In the following example, the value of `trace_id` in the `env` parameter corresponds to the value of `trace_id` in the `gv$ob_sql_audit` view.
obdiag gather scene run --scene=observer.sql_err --env "{db_connect='-hxx -Pxx -uxx -pxx -Dxx', trace_id='xx'}"    

# Suspended transaction
obdiag gather scene run --scene=observer.suspend_transaction 

# Data imbalance between units after unit migration or unit reduction
obdiag gather scene run --scene=observer.unit_data_imbalance 

# Unknown error
obdiag gather scene run --scene=observer.unknown

# Collect PX logs from the source node that reported the error. In the `env` parameter, `trace_id` is required and `estimated_time` is optional and defaults to the current time. PX logs that match the specified trace ID and were generated within one week before `estimated_time` are collected.
obdiag gather scene run --scene=observer.px_collect_log --env "{trace_id='Yxx', estimated_time='2024-04-19 14:46:17'}"

# Collect basic information of a cluster
obdiag gather scene run --scene=observer.base
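When these commands are generated by a script, the `--env` value has to be assembled and quoted for the shell. The following Python sketch shows one way to do that. It is not part of obdiag: `build_env_arg` and `build_command` are hypothetical helpers, and the `{key='value', ...}` shape is inferred only from the examples above, not from a documented grammar.

```python
import shlex


def build_env_arg(params: dict[str, str]) -> str:
    """Render params as {key='value', ...}, the shape used in the examples."""
    for key, value in params.items():
        if "'" in value:
            # The examples wrap values in single quotes, so reject embedded ones.
            raise ValueError(f"single quote not supported in {key!r}")
    body = ", ".join(f"{key}='{value}'" for key, value in params.items())
    return "{" + body + "}"


def build_command(scene: str, params: dict[str, str]) -> str:
    """Return a shell-quoted command line for `obdiag gather scene run`."""
    argv = ["obdiag", "gather", "scene", "run", f"--scene={scene}"]
    if params:
        argv += ["--env", build_env_arg(params)]
    return shlex.join(argv)


print(build_command(
    "observer.px_collect_log",
    {"trace_id": "Yxx", "estimated_time": "2024-04-19 14:46:17"},
))
```

`shlex.join` quotes the `--env` value differently from the hand-written examples, but the shell passes the same argument to obdiag in both cases.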

To collect data for a specific period, specify either the --from and --to options together or the --since option. For example, the following command uses --from and --to to collect data for the period from "2024-01-30 12:30:00" to "2024-01-30 12:40:00".

obdiag gather scene run --scene=other.application_error --from "2024-01-30 12:30:00" --to "2024-01-30 12:40:00"

You can also specify the --since parameter to collect data of the last 10 minutes.

obdiag gather scene run --scene=other.application_error --since 10m
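The two forms describe the same kind of time window. The following Python sketch converts a `--since` value into the equivalent `--from` and `--to` values. `since_to_window` is a hypothetical helper, and treating the end of the window as the moment the command runs is an assumption rather than confirmed obdiag behavior.

```python
import re
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def since_to_window(since: str, now: datetime) -> tuple[str, str]:
    """Convert a value such as '10m' into (from, to) strings ending at `now`."""
    match = re.fullmatch(r"(\d+)([mhd])", since.strip())
    if not match:
        raise ValueError(f"expected <n><m|h|d>, got {since!r}")
    span = timedelta(**{UNITS[match.group(2)]: int(match.group(1))})
    return (now - span).strftime(TIME_FORMAT), now.strftime(TIME_FORMAT)


start, end = since_to_window("10m", datetime(2024, 1, 30, 12, 40, 0))
print(f'--from "{start}" --to "{end}"')
# prints: --from "2024-01-30 12:30:00" --to "2024-01-30 12:40:00"
```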

Tutorial for writing a task

A task corresponds to one specific scenario. obdiag runs a task by executing its dedicated script file, which is written in YAML.

Preparations

Before you write the script file of a task, decide where to store the file.

The file must be stored in the directory specified by the gather.scenes_base_path parameter in the inner_config.yml configuration file in the /usr/local/oceanbase-diagnostic-tool/conf/ directory. Check whether the task falls into an existing category in the directory. If not, create a folder to declare the category.

Here is an example:

# Go to `${gather.scenes_base_path}` and create a sample file `test.yaml` for testing the observer process.
cd ~/.obdiag/gather/tasks/observer
touch test.yaml

Now you are prepared.

Write a task

To write a task is to edit the test.yaml file.

# Declare the purpose of the task.

info: "for test"

Pay attention to the details when you write the task.

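Write the `task` field

The `task` field is a list. Each element declares the steps to be executed in scenario-based collection for one range of OceanBase Database versions.

Why is `task` a list?

Keeping one element per version range allows a single script to stay compatible with different OceanBase Database versions.

Each element of `task` involves the following parameters.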

Parameter	Required?	Description
version	No	The OceanBase Database versions that the script is compatible with. An example is provided below the table. The value is a string that holds a left-open, right-closed range of complete version numbers. A complete version number has three parts for OceanBase Database V3.x, as in `[3.1.1,3.2.0]`, or four parts for OceanBase Database V4.x, as in `[4.2.0.0,4.3.0.0]`.
steps	Yes	The list of steps to be executed.

Here is an example:

info: testinfo
task:
  - version: "[3.1.0,3.2.4]"
    steps:
      {steps_object}
  - version: "[4.2.0.0,4.3.0.0]"
    steps:
      {steps_object}
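To make the version matching concrete, the following Python sketch checks whether a version falls into a range string. It is not the obdiag implementation: `parse_version` and `version_in_range` are hypothetical helpers, and the left-open, right-closed comparison simply follows the description in the table above.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.2.0.0' into (4, 2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def version_in_range(version: str, range_str: str) -> bool:
    """Left-open, right-closed check: low < version <= high."""
    low, high = range_str.strip().strip("[]").split(",")
    return parse_version(low) < parse_version(version) <= parse_version(high)


print(version_in_range("3.2.0", "[3.1.0,3.2.4]"))          # True
print(version_in_range("3.1.0", "[3.1.0,3.2.4]"))          # False, lower bound is open
print(version_in_range("4.3.0.0", "[4.2.0.0,4.3.0.0]"))    # True, upper bound is closed
```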

`steps` is a list of execution steps.

Each element of `steps` is a single step that involves the following parameters.

Parameter	Required?	Description
type	Yes	The type of the execution step. Valid values are `ssh`, `sql`, `log`, `obproxy_log`, and `sysstat`. More types will be supported in later versions.
{ssh/sql/log/obproxy_log/sysstat}	Yes	The parameters for the selected type, which depend on the logic description of the execution type in the code. Supported execution steps are described in the following sections.
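Before you run a new task, it can help to check that every step declares a supported type. The following Python sketch does that for steps already loaded as dictionaries. `check_steps` is a hypothetical helper, not part of obdiag, and it checks only the `type` values listed in the table above.

```python
VALID_TYPES = {"ssh", "sql", "log", "obproxy_log", "sysstat"}


def check_steps(steps: list[dict]) -> list[str]:
    """Return one message per step whose `type` is missing or unsupported."""
    problems = []
    for index, step in enumerate(steps):
        step_type = step.get("type")
        if step_type is None:
            problems.append(f"step {index}: missing 'type'")
        elif step_type not in VALID_TYPES:
            problems.append(f"step {index}: unknown type {step_type!r}")
    return problems


steps = [
    {"type": "sql", "sql": "select tenant_name from oceanbase.__all_tenant;"},
    {"type": "shell", "shell": "uptime"},  # not a type listed in the table
    {"grep": ""},                          # no type at all
]
for problem in check_steps(steps):
    print(problem)
```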

In the following examples, `step:` serves only as a label for the snippet and has no functional meaning.

ssh

Execute a command on the remote node and obtain its return value.

step:
  type: ssh
  ssh: wc -l /proc/${task_OBServer_pid}/maps | awk '{print $1}'
  global: false # The `global` field specifies whether to execute a step only on a single node or on all nodes. The value `true` specifies to execute the step only on the first node. The value `false` specifies to execute the step on each node.

sql

Execute an SQL statement and obtain the corresponding value.

step:
  type: sql
  sql: select tenant_name from oceanbase.__all_tenant;
  global: false

log

Collect logs of the observer process.

step:
  type: log
  grep: "" # Keywords for filtering logs.
  global: false

obproxy_log

Collect logs of OceanBase Database Proxy (ODP).

step:
  type: obproxy_log
  grep: "" # Keywords for filtering logs.
  global: false

sysstat

Collect host information.

step:
  type: sysstat
  sysstat: ""
  global: false

Note

The global field specifies whether to execute a step only on a single node or on all nodes. The value true specifies to execute the step only on the first node. The value false specifies to execute the step on each node.
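The effect of `global` can be sketched in a few lines of Python. `target_nodes` is a hypothetical helper that illustrates only the rule in this note. The node addresses are made up, and how obdiag orders its node list is an assumption.

```python
def target_nodes(nodes: list[str], is_global: bool) -> list[str]:
    """Pick the nodes a step runs on: the first node only when global is true."""
    return nodes[:1] if is_global else list(nodes)


nodes = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]  # made-up addresses
print(target_nodes(nodes, is_global=True))   # ['192.168.0.1']
print(target_nodes(nodes, is_global=False))  # all three nodes
```

A cluster-wide query, such as the SQL example above, returns the same result from every node, so `global: true` avoids collecting it more than once.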
