OceanBase database is a native distributed database system. Failure root cause analysis usually involves many factors such as machine environment, configuration parameters, and operating load. When troubleshooting problems, experts need to obtain a large amount of information scattered across various nodes. OceanBase Agile Diagnostic Tool (OceanBase Diagnostic Tool, referred to as obdiag) is dedicated to efficiently collecting and analyzing diagnostic information in fault scenarios. The obdiag code is completely open source, please see GitHub Code Repository for details.
What is obdiag
obdiag is a black screen diagnostic tool specially built for OceanBase. It supports scanning, collection and analysis of cluster logs, SQL audits, process stacks and other information. It can be executed with one click in different deployment modes such as OCP, OBD or user manual deployment to complete display, collection, analysis, inspection, and root cause analysis of diagnostic information.
obdiag features
- Simple deployment: Provides RPM/DEB packages and obd integrated deployment, which can be deployed on any machine that can connect to all nodes in the cluster, not limited to OBServer nodes.
- Centralized collection: Single-point deployment, just execute collection, inspection or analysis commands on the deployment machine.
- Easy to use: One command to complete the installation; one-click cluster inspection, one-click information collection, one-click diagnostic analysis, one-click root cause analysis, and one-click cluster insight can all be completed through commands.
- Completely open source: Developed based on Python, the source code is 100% open source, see GitHub code repository for details.
- Highly scalable: One-click inspection, scenario-based collection, root cause analysis, and cluster insights are all plug-in designs, which can be expanded to customized scenarios at low cost.
Obdiag Core Function Overview
Function module |
Description |
|---|---|
| obdiag config | Quickly generate/edit cluster configuration files (default ~/.obdiag/config.yml). |
| obdiag gather | One-click collection of diagnostic information: logs, parameters/variables, host information, stack, perf, clog/slog, AWR, ASH, OBProxy/OMS logs, scenario collection, table export (tabledump), DBMS_XPLAN, core, etc. |
| obdiag display | One-click cluster insight: Display cluster information, merger status (compaction), etc. according to scenarios, and the black screen output does not drop to the disk. |
| obdiag check | One-click cluster inspection: perform inspection tasks on OceanBase database/OBProxy, support YAML and Python scripts, and output table/JSON/XML/YAML/HTML reports. |
| obdiag analyze | One-click diagnostic analysis: log analysis, FLT trace full link, parameter/variable comparison, queue backlog, index space, memory analysis, etc. |
| obdiag rca | One-click root cause analysis: perform root cause analysis according to scenarios (disconnection, lock conflict, disk full, transaction timeout, OMS/OBCDC, etc.). |
| obdiag update | Hot update inspection and scene collection task files (supports online/local files). |
| obdiag tool | Toolbox: configuration file encryption (crypto_config), AI diagnostic assistant (ai_assistant), disk IO performance detection (io_performance), configuration verification (config_check). |
| obdiag display-trace | View the detailed log of the current execution based on the Trace ID. |
obdiag current version support capabilities
Information collection (gather)
- One-click collection of OceanBase cluster logs (observer/election/rootservice), OBProxy logs, and OMS logs; supports time range, keyword filtering, file number limit, and log desensitization.
- Collect cluster parameters/variables, host information (sysstat), stack (stack), perf (sample/flame/pstack), clog/slog, AWR (depends on enterprise version OCP), ASH, parallel SQL execution details (plan_monitor), table export (tabledump), DBMS_XPLAN, core diagnostic information with one click.
- Scenario-based collection: supports observer/obproxy/other multiple scenarios (such as high CPU, SQL exception, cluster downtime, merge, backup and restore, OBProxy restart, etc.), supports
obdiag gather scene listandobdiag gather scene run --scene=xxx.
Cluster Insights (display)
- Display cluster information by scene with one click (cluster/node/Zone/RS/tenant/event/lock holding/TopSQL/slow SQL/table information/process list/execution plan/data volume/merger status, etc.), supporting
obdiag display scene listandobdiag display scene run --scene=xxx. - Starting from V4.0.0, the merge status display scene is supported:
obdiag display scene run --scene=observer.compaction. - Starting from V4.1.0, clog log volume/capacity statistics scenario is supported:
obdiag display scene run --scene=observer.clog_volume_statistics.
- Display cluster information by scene with one click (cluster/node/Zone/RS/tenant/event/lock holding/TopSQL/slow SQL/table information/process list/execution plan/data volume/merger status, etc.), supporting
Cluster inspection (check) Perform inspections on the OceanBase cluster with one click, support YAML/Python tasks of OBServer nodes and OBProxy, and output table/json/xml/yaml/html reports; support
obdiag check listandobdiag check run.Diagnostic Analysis (analyze) One-click analysis of OceanBase cluster logs (online/offline), FLT trace full link, parameter/variable diff and default values, queue backlog, index (including pre-created) space, memory (online/offline).
Root Cause Analysis (rca) One-click root cause analysis by scenario (disconnection, major stuck, lock conflict, DDL/clog disk full, log error, transaction disconnection/timeout/not ended/rollback, OMS full/OBCDC, pending transaction, unit GC, playback stuck, memory full, server deletion exception, GC/weak read/split scheduling, etc.), supports
obdiag rca listandobdiag rca run --scene=xxx.Toolbox (tool)
- crypto_config: Encrypt/decrypt config files.
- ai_assistant (BETA): AI intelligent diagnostic assistant.
- io_performance: Disk IO performance detection (based on tsar, supports automatic identification of specified disks or clog/data).
- config_check: Verify whether the
--configparameter is legal.
Other Supports cluster diagnosis deployed by Docker and Kubernetes (ob-operator); supports IPv6; gather stack supports ARM; inspection and scenario-based collection support hot update; supports out-of-box use without configuration file through
--config; supports encrypted configuration and--config_password.
