Diagnosis and Tuning with OceanBase: Utilizing obdiag for Comprehensive Log Collection and Analysis



obdiag is a CLI diagnostic tool designed for OceanBase Database. It performs comprehensive scans and collects crucial data, such as logs, SQL audit records, and process stack information of OceanBase. You may deploy your OceanBase cluster by using OceanBase Control Platform (OCP) or OceanBase Deployer (OBD), or deploy it manually based on the OceanBase documentation. Regardless of the deployment mode, you can use obdiag to gather diagnostic information with a few commands. The tool has now been officially open-sourced, which makes it more accessible to developers and database administrators.

The obdiag team has compiled its experience with OceanBase diagnosis and tuning and has begun publishing a series of tutorial articles. This article covers how to install and configure obdiag, and how to use it to gather and analyze logs of OceanBase clusters.


OceanBase Database is a native distributed database, so root cause analysis for faults is often complex: it may involve many factors, such as the server environment, configuration parameters, and operational load. When troubleshooting, experts need to gather extensive information for fault analysis. This is where OceanBase Diagnostic Tool (obdiag) comes into play. It aims to efficiently gather and analyze information dispersed across various nodes.

obdiag is a CLI diagnostic tool designed for OceanBase, with the following features:

  • Extremely lightweight: You can deploy obdiag by using the RPM package (smaller than 30 MB) or OBD with a single command. You can deploy it on an OBServer node or on any server that can connect to the OceanBase cluster nodes.
  • Easy to use: Installation takes one command, and cluster inspection, information gathering, diagnostic analysis, and root cause analysis can each be run with one command.
  • Fully open source: obdiag is developed in Python and is 100% open source.
  • Highly extensible: The one-click inspection, one-click scenario-based information gathering, and one-click root cause analysis functions of obdiag are all plugin-based, so users can build customized scenario-based diagnostics.


Get Started with obdiag

Installation and deployment

Online deployment (when Internet access is available):

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install -y oceanbase-diagnostic-tool
source /usr/local/oceanbase-diagnostic-tool/init.sh


Offline deployment (when Internet access is unavailable):

Download the obdiag package: https://www.oceanbase.com/softwarecenter

sudo yum install -y oceanbase-diagnostic-tool*.rpm
source /usr/local/oceanbase-diagnostic-tool/init.sh


Configuration

You can create or edit a user-defined configuration file by running the obdiag config <option> command. By default, the configuration file is named config.yml and is stored in the ~/.obdiag/ directory. Template configuration files are stored in the ~/.obdiag/example directory.

obdiag config -h <db_host> -u <sys_user> [-p password] [-P port]

The following table describes the parameters:

| Parameter | Required? | Description |
|---|---|---|
| -h db_host | Yes | The IP address used to connect to the sys tenant of the OceanBase cluster. |
| -u sys_user | Yes | The username used to connect to the sys tenant of the OceanBase cluster. To avoid permission issues, we recommend that you use 'root@sys'. |
| -p password | No | The password used to connect to the sys tenant of the OceanBase cluster. This parameter is left empty by default. |
| -P port | No | The port of the sys tenant of the OceanBase cluster. Port 2881 is used by default. |

Here are some examples:

# A password is specified.
obdiag config -hxx.xx.xx.xx -uroot@sys -p***** -P2881

# No password is specified.
obdiag config -hxx.xx.xx.xx -uroot@sys -p"" -P2881

# Through obproxy.
obdiag config -hxx.xx.xx.xx -uroot@sys#obtest -p***** -P2883
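
The examples above use two account formats: user@tenant for a direct connection, and user@tenant#cluster when connecting through obproxy. The following Python sketch splits such a string into its parts. The names Account and split_account are hypothetical and are used only for illustration; they are not part of obdiag.

```python
from typing import NamedTuple, Optional


class Account(NamedTuple):
    """Parts of an OceanBase account string (illustrative type, not an obdiag API)."""
    user: str
    tenant: Optional[str]
    cluster: Optional[str]


def split_account(value: str) -> Account:
    """Split 'user@tenant' or 'user@tenant#cluster' into its components."""
    rest, _, cluster = value.partition("#")   # the cluster name follows '#', if present
    user, _, tenant = rest.partition("@")     # the tenant name follows '@', if present
    return Account(user, tenant or None, cluster or None)


print(split_account("root@sys"))
print(split_account("root@sys#obtest"))
```

With the third example above, the cluster part is obtest, which is why that account string is paired with the obproxy port rather than port 2881.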


One-Click Log Gathering

You can run this command to gather logs of the specified OceanBase cluster:

obdiag gather log [options]

The options are explained as follows:

| Option | Required? | Data type | Default value | Description |
|---|---|---|---|---|
| --from | No | String | Empty | The start time of log collection, in the yyyy-mm-dd hh:mm:ss format. For example, 1970-01-01 12:00:00. |
| --to | No | String | Empty | The end time of log collection, in the yyyy-mm-dd hh:mm:ss format. For example, 1970-01-01 13:00:00. |
| --since | No | String | Empty | The most recent period for which logs are collected, written as a number n followed by a unit: m for minutes, h for hours, or d for days. For example, 30m collects logs of the last 30 minutes. |
| --scope | No | String | all | The type of logs to be collected. Valid values: observer, election, rootservice, and all. |
| --grep | No | String | Empty | The search keyword. |
| --encrypt | No | String | false | Specifies whether to encrypt the returned files. Valid values: true and false. |
| --store_dir | No | String | The directory in which the command is executed | The local path where the results are stored. |
| -c | No | String | ~/.obdiag/config.yml | The path of the configuration file. |

Here is an example:

obdiag gather log --scope observer --from "2022-06-30 16:25:00" --to "2022-06-30 18:30:00" --grep STORAGE --encrypt true
...
ZipFileInfo:
+-------------------+-----------+
| Node              | LogSize   |
+===================+===========+
| xxx.xxx.xxx.xxx   | 36.184M   |
+-------------------+-----------+

...
ZipFileInfo:
+-------------------+-----------+
| Node              | LogSize   |
+===================+===========+
| xxx.xxx.xxx.xxx   | 44.176M   |
+-------------------+-----------+
...

Summary:
+-------------------+-----------+----------+------------------+--------+------------------------------------------------------------------------+
| Node              | Status    | Size     | Password         | Time   | PackPath                                                               |
+===================+===========+==========+==================+========+========================================================================+
| xxx.xxx.xxx.xxx   | Completed | 36.762M  | HYmVourcUyRNP8Om | 19 s   | gather_pack_20220701183246/result_xxx.xxx.xxx.xxx_20220701183247.zip   |
+-------------------+-----------+----------+------------------+--------+------------------------------------------------------------------------+
| xxx.xxx.xxx.xxx   | Completed | 638.200M | 1RicMaiLUUNfemnj | 718 s  | gather_pack_20220701183246/result_xxx.xxx.xxx.xxx_20220701183918.zip   |
+-------------------+-----------+----------+------------------+--------+------------------------------------------------------------------------+

Here are examples of gathering logs of a recent period:

# Collect logs of the last hour.
obdiag gather log --since 1h


# Collect logs of the last 30 minutes.
obdiag gather log --since 30m

# Collect logs of the last 30 minutes and specify the configuration file.
obdiag gather log --since 30m -c /root/config.yml
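
To make the relationship between --since and --from/--to concrete, here is a minimal Python sketch that converts a value such as 30m into the equivalent start and end timestamps. It assumes the value is a number immediately followed by m, h, or d, as in the examples above. The helpers parse_since and since_window are hypothetical and do not reflect how obdiag itself is implemented.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Map each unit letter to the matching timedelta keyword argument.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same layout as the --from/--to values


def parse_since(value: str) -> timedelta:
    """Convert a string such as '30m', '1h', or '2d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match is None:
        raise ValueError(f"expected a number followed by m, h, or d, got {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def since_window(value: str, now: datetime) -> Tuple[str, str]:
    """Return the (from, to) strings that cover the period ending at `now`."""
    start = now - parse_since(value)
    return start.strftime(_TIME_FORMAT), now.strftime(_TIME_FORMAT)


# Example: the window that "--since 30m" would describe at 18:30:00.
print(since_window("30m", datetime(2022, 6, 30, 18, 30, 0)))
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the calculation easy to check by hand.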

Filter and gather logs based on keywords:

# Collect logs from the last 30 minutes and filter by the keyword "TRACE_ID":
obdiag gather log --grep "TRACE_ID"

# Collect logs from the last 30 minutes and filter by multiple keywords, for example "AAAAA" and "BBBBB":
obdiag gather log --grep "AAAAA" --grep "BBBBB"

# Collect logs from a specified time range and filter by multiple keywords, for example "AAAAA" and "BBBBB":
obdiag gather log --from "2022-06-30 16:25:00" --to "2022-06-30 18:30:00" --grep "AAAAA" --grep "BBBBB"


One-Click Log Analysis

You can run the obdiag analyze log [options] command to analyze the logs of an OceanBase cluster online, or specify the --files option to analyze logs offline.

Note:

  • In online analysis mode, obdiag analyzes a running OceanBase cluster, whose logs are distributed across all OBServer nodes of the cluster.
  • In offline analysis mode, which is enabled by specifying the --files option, obdiag analyzes OBServer logs that have already been collected to the server where obdiag is deployed.
  • Before you use this command, make sure that you have configured the login information of the target nodes in the config.yml configuration file of obdiag.

obdiag analyze log [options]

The following table describes the options:

| Option | Required? | Data type | Default value | Description |
|---|---|---|---|---|
| --from | No | String | Empty | The start time of log collection, in the yyyy-mm-dd hh:mm:ss format. For example, 1970-01-01 12:00:00. |
| --to | No | String | Empty | The end time of log collection, in the yyyy-mm-dd hh:mm:ss format. For example, 1970-01-01 13:00:00. |
| --since | No | String | Empty | The most recent period for which logs are collected, written as a number n followed by a unit: m for minutes, h for hours, or d for days. For example, 30m collects logs of the last 30 minutes. |
| --scope | No | String | all | The type of logs of the OceanBase cluster to be analyzed. Valid values: observer, election, rootservice, and all. |
| --grep | No | String | Empty | The search keyword. |
| --store_dir | No | String | The directory in which the command is executed | The local path where the results are stored. |
| --log_level | No | String | WARN | The level of logs of the OceanBase cluster to be analyzed. Valid values in ascending order: DEBUG, TRACE, INFO, WDIAG, WARN, EDIAG, and ERROR. Logs of the specified level and higher are analyzed. |
| --files | No | String | Empty | If you specify the --files option, offline log analysis mode is enabled. In this mode, you must specify the name or path of the log file of the OceanBase cluster, but you do not need to specify the --from, --to, --since, or --ob_install_dir options. |
| -c | No | String | ~/.obdiag/config.yml | The path of the configuration file. |
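
The --log_level rule, "logs of the specified level and higher are analyzed", can be illustrated with a short Python sketch. The level order is taken from the option description above. The function levels_at_or_above is a hypothetical helper for illustration, not an obdiag function.

```python
# Level order as listed for --log_level, from lowest to highest.
LEVELS = ["DEBUG", "TRACE", "INFO", "WDIAG", "WARN", "EDIAG", "ERROR"]


def levels_at_or_above(threshold: str) -> list:
    """Return the levels that a given --log_level threshold would include."""
    index = LEVELS.index(threshold.upper())  # raises ValueError for unknown levels
    return LEVELS[index:]


# With the default threshold WARN, three levels are included.
print(levels_at_or_above("WARN"))  # ['WARN', 'EDIAG', 'ERROR']
```

Lowering the threshold to INFO would therefore also bring INFO and WDIAG entries into the analysis.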

Example of online log analysis:

obdiag analyze log --from "2023-10-08 10:25:00" --to "2023-10-08 11:30:00"

...
FileListInfo:
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node           | LogList                                                                                                                                                                                                               |
+================+=======================================================================================================================================================================================================================+
| xx.xx.xx.xx    | ['observer.log.20231008104204260', 'observer.log.20231008111305072', 'observer.log.20231008114410668', 'observer.log.wf.20231008104204260', 'observer.log.wf.20231008111305072', 'observer.log.wf.20231008114410668'] |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
...


Analyze OceanBase Online Log Summary:
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| Node           | Status    | FileName                                                                     |   ErrorCode | Message                                                                                                                       |   Count |
+================+===========+==============================================================================+=============+===============================================================================================================================+=========+
| xx.xx.xx.xx    | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008104204260       |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |       2 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx    | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008111305072       |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |       8 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx    | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008114410668       |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |      10 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| xx.xx.xx.xx    | Completed | analyze_pack_20231008171201/xx_xx_xx_xx/observer.log.20231008114410668       |       -4009 | IO error                                                                                                                      |      20 |
+----------------+-----------+------------------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
For more details, please run cmd 'cat analyze_pack_20231008171201/result_details.txt'
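
The LogList in the output above hints at how files are matched to a time range: each rotated log name ends in a timestamp, and the listed files are the ones that can contain entries between 10:25 and 11:30. The Python sketch below reproduces that selection under two assumptions that the article does not confirm: the suffix marks the moment the file was rotated, so a file holds entries written up to that time, and all names belong to a single log family. The helpers suffix_time and files_for_range are hypothetical, and obdiag's actual selection logic may differ.

```python
from datetime import datetime


def suffix_time(name: str) -> datetime:
    """Parse the leading yyyymmddHHMMSS digits of a rotated log file suffix."""
    digits = name.rsplit(".", 1)[-1]
    return datetime.strptime(digits[:14], "%Y%m%d%H%M%S")


def files_for_range(names, start: datetime, end: datetime) -> list:
    """Pick rotated files that may hold entries between start and end."""
    picked = []
    for name in sorted(names, key=suffix_time):
        stamp = suffix_time(name)
        if stamp < start:
            continue      # rotated before the range began, so it cannot overlap
        picked.append(name)
        if stamp >= end:
            break         # this file already reaches past the end of the range
    return picked


names = [
    "observer.log.20231008104204260",
    "observer.log.20231008111305072",
    "observer.log.20231008114410668",
]
print(files_for_range(names, datetime(2023, 10, 8, 10, 25), datetime(2023, 10, 8, 11, 30)))
```

Under these assumptions, the file rotated at 11:44 is still selected for a range that ends at 11:30, because it holds the entries written after the 11:13 rotation.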

Here are examples of analyzing logs of a recent period:

# Analyze the logs of the last hour online. When you run this command, obdiag pulls the logs of the last hour from the remote host for analysis to diagnose the errors that have occurred.
obdiag analyze log --since 1h


# Analyze the logs of the last 30 minutes online. When you run this command, obdiag pulls the logs of the last 30 minutes from the remote host for analysis to diagnose the errors that have occurred.
obdiag analyze log --since 30m

Example of offline log analysis:

ls -lh test/
-rw-r--r--  1 admin  staff   256M Oct  8 17:24 observer.log.20231008104204260
-rw-r--r--  1 admin  staff   256M Oct  8 17:24 observer.log.20231008111305072
-rw-r--r--  1 admin  staff   256M Oct  8 17:24 observer.log.20231008114410668
-rw-r--r--  1 admin  staff    18K Oct  8 17:24 observer.log.wf.20231008104204260
-rw-r--r--  1 admin  staff    19K Oct  8 17:24 observer.log.wf.20231008111305072
-rw-r--r--  1 admin  staff    18K Oct  8 17:24 observer.log.wf.20231008114410668

obdiag analyze log --files test/
Analyze OceanBase Offline Log Summary:
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| Node      | Status    | FileName                                                              |   ErrorCode | Message                                                                                                                       |   Count |
+===========+===========+=======================================================================+=============+===============================================================================================================================+=========+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008104204260 |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |       2 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008111305072 |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |       8 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008114410668 |       -5006 | You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use |      10 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
| 127.0.0.1 | Completed | analyze_pack_20231008172144/127_0_0_1_/observer.log.20231008114410668 |       -4009 | IO error                                                                                                                      |      20 |
+-----------+-----------+-----------------------------------------------------------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------+---------+
For more details, please run cmd 'cat analyze_pack_20231008172144/result_details.txt'

Analyze the specified OBServer log file offline:

# Analyze the specified log file offline.
obdiag analyze log --files observer.log.20230831142211247


How does obdiag work?

obdiag v1.2 supports one-click gathering of diagnostic information. However, a diagnostic tool needs not only the ability to gather information but also the ability to analyze it. Therefore, obdiag v1.3.0 introduced a log analysis function that lets users check for abnormal conditions with a single command.

The main architecture of obdiag log gathering and analysis relies on the centralized collection mode of obdiag. When a user initiates an obdiag analysis, data needs to be gathered from various nodes and then processed centrally.



The process of obdiag online log analysis:

  1. The user prepares the configuration file, which is located at config/config.yml in the obdiag installation directory. Its main purpose is to provide the SSH login information for the OceanBase cluster being analyzed, because obdiag needs SSH access to pull cluster logs to the obdiag node for analysis.
  2. The user runs the obdiag analyze log <option> command.
  3. After receiving the analyze command, obdiag parses the parameters in <option>.
  4. After parsing the parameters, obdiag starts to gather the logs. The nodes from which logs are gathered are those configured in step 1. The time range, filter conditions, and other settings are those parsed from <option> in step 3.
  5. obdiag sends remote commands over SSH to each host to gather the logs.
  6. On the remote host, logs that meet the specified criteria are written to temporary files for subsequent transfer.
  7. obdiag downloads the filtered logs from the remote host. After the download is complete, it sends a command to clean up the temporary files on the remote host.
  8. obdiag analyzes the log files gathered from the remote host, focusing primarily on the retcode values in the logs: how often each retcode occurs, when it first and last appears, and the corresponding trace_ids.
  9. After the analysis, obdiag prints an overview on the console and writes the detailed results to a file. Users can view the detailed report at the file path that obdiag prints.
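
Step 8 can be sketched in Python as follows. The sketch counts each retcode, tracks its earliest and latest timestamp, and collects the trace IDs seen with it. The log line layout is an assumption made for illustration: a bracketed timestamp at the start of the line, a bracketed trace ID beginning with Y, and a ret=<code> field. The sample lines, including their file names and trace IDs, are made up, and summarize is a hypothetical helper rather than obdiag's own code.

```python
import re

# Assumed line layout; real OBServer log lines may differ.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")
TRACE = re.compile(r"\[(Y[0-9A-Fa-f]+-[0-9A-Fa-f-]+)\]")
RET = re.compile(r"\bret=(-\d+)")


def summarize(lines):
    """Aggregate count, first/last timestamp, and trace IDs per retcode."""
    stats = {}
    for line in lines:
        ts, ret = TS.search(line), RET.search(line)
        if ts is None or ret is None:
            continue  # skip lines without a timestamp or a retcode
        when = ts.group(1)
        entry = stats.setdefault(
            int(ret.group(1)),
            {"count": 0, "first": when, "last": when, "trace_ids": set()},
        )
        entry["count"] += 1
        # Fixed-width timestamps compare correctly as plain strings.
        entry["first"] = min(entry["first"], when)
        entry["last"] = max(entry["last"], when)
        trace = TRACE.search(line)
        if trace is not None:
            entry["trace_ids"].add(trace.group(1))
    return stats


sample = [
    "[2023-10-08 10:40:01.000001] WARN  [SQL] parse (ob_parser.cpp:100) [1234][T1][YB42AC1E0001-0001] parse failed(ret=-5006)",
    "[2023-10-08 10:41:30.500000] WARN  [SQL] parse (ob_parser.cpp:100) [1234][T1][YB42AC1E0001-0002] parse failed(ret=-5006)",
    "[2023-10-08 10:42:00.000000] ERROR [STORAGE] write (ob_io.cpp:42) [1235][T1][YB42AC1E0001-0003] io failed(ret=-4009)",
    "[2023-10-08 10:43:00.000000] INFO  [SERVER] tick (ob_srv.cpp:1) [1236][T0][Y0-0] heartbeat ok",
]
for code, entry in sorted(summarize(sample).items()):
    print(code, entry["count"], entry["first"], entry["last"], sorted(entry["trace_ids"]))
```

The per-retcode counts correspond to the ErrorCode and Count columns of the summary tables shown earlier, while the timestamps and trace IDs are the kind of detail that step 9 describes as going into the detailed report.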



Additional Resources

Download obdiag

Download the latest version of obdiag for free from the OceanBase Software Center.


obdiag Documentation

Explore comprehensive usage guides and configuration details in the obdiag Documentation.


GitHub Repository

Review the source code, report issues, and contribute to the project on GitHub.


