OceanBase logo

OceanBase

A unified distributed database ready for your transactional, analytical, and AI workloads.

DEPLOY YOUR WAY

OceanBase Cloud

The best way to deploy and scale OceanBase

OceanBase Enterprise

Run and manage OceanBase on your infra

TRY OPEN SOURCE

OceanBase Community Edition

The free, open-source distributed database

OceanBase seekdb

Open source AI native search database

Customer Stories

Real-world success stories from enterprises across diverse industries.

View All
BY USE CASES

Mission-Critical Transactions

Global & Multicloud Application

Elastic Scaling for Peak Traffic

Real-time Analytics

Active Geo-redundancy

Database Consolidation

Resources

Comprehensive knowledge hub for OceanBase.

Blog

Live Demos

Training & Certification

Documentation

Official technical guides, tutorials, API references, and manuals for all OceanBase products.

View All
PRODUCTS

OceanBase Cloud

OceanBase Database

Tools

Connectors and Middleware

QUICK START

OceanBase Cloud

OceanBase Database

BEST PRACTICES

Practical guides for utilizing OceanBase more effectively and conveniently

Company

Learn more about OceanBase – our company, partnerships, and trust and security initiatives.

About OceanBase

Partner

Trust Center

Contact Us



Best practices for using obdiag to collect performance information

Last Updated: 2025-01-08 08:32:43
What is on this page
Applicable version
Procedure
Step 1: Install and deploy obdiag
Step 2: Configure information about the specified cluster
Step 3: Collect performance information for diagnosis
Step 4: Visualize collected performance data
Working mechanism
How a flame graph is generated
How a pprof graph is generated
References


During the operation and maintenance (O&M) of OceanBase Database, it is essential to quickly and accurately identify performance bottlenecks when your cluster experiences high memory or CPU load. OceanBase Diagnostic Tool (obdiag) is a CLI diagnostic tool specifically designed for OceanBase Database to help O&M teams efficiently analyze performance issues. This topic explains how to use obdiag to quickly collect performance data for visualized diagnostics.

Applicable version

This topic applies to all OceanBase Database versions and obdiag V2.0.0 and later.

Procedure

Step 1: Install and deploy obdiag

You can deploy obdiag in two ways: independently or through OceanBase Deployer (obd). The cluster discussed in this topic was not deployed using obd, so obdiag needs to be deployed independently. The deployment commands are as follows:

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install -y oceanbase-diagnostic-tool
source /usr/local/oceanbase-diagnostic-tool/init.sh

Note

  • obdiag is easy to deploy. You can deploy obdiag on an OBServer node or any server that can connect to nodes in the OceanBase cluster.
  • obdiag features centralized collection. You need to deploy obdiag only on a single server rather than all servers. Then, you can execute collection, inspection, or analysis commands on the server where obdiag is deployed.

For more information about how to install and deploy obdiag, see Install obdiag.

Step 2: Configure information about the specified cluster

obdiag config -hxx.xx.xx.xx -uroot@sys -Pxxxx -p*****

After running the obdiag config command, you will enter interactive mode, where you can provide the information based on your actual setup. For more information about how to configure obdiag, see Configure obdiag.

Step 3: Collect performance information for diagnosis

You can use the obdiag gather command to collect performance data from the specified OceanBase cluster for diagnostic purposes. Since obdiag relies on the perf tool on remote OBServer nodes to gather information, make sure to install the perf tool on the OBServer nodes in advance.

The information collection command is as follows:

obdiag gather perf

The collection process is as follows:

$ obdiag gather perf
gather_perf start ...
Downloading [====================] 100.0% [866.00 B ]
Downloading [====================] 100.0% [858.00 B ]

Gather Perf Summary:
+----------------+-----------+----------+--------+----------------------------------------------------------------------------+
| Node           | Status    | Size     | Time   | PackPath                                                                   |
+================+===========+==========+========+============================================================================+
| 11.xxx.xxx.xxx | Completed | 866.000B | 2 s    | ./obdiag_gather_pack_20240807162248/perf_11.xxx.xxx.xxx_20240807162248.zip |
+----------------+-----------+----------+--------+----------------------------------------------------------------------------+
| 11.xxx.xxx.xxx | Completed | 858.000B | 1 s    | ./obdiag_gather_pack_20240807162248/perf_11.xxx.xxx.xxx_20240807162251.zip |
+----------------+-----------+----------+--------+----------------------------------------------------------------------------+
| 11.xxx.xxx.xxx | Completed | 858.000B | 2 s    | ./obdiag_gather_pack_20240807162248/perf_11.xxx.xxx.xxx_20240807162252.zip |
+----------------+-----------+----------+--------+----------------------------------------------------------------------------+
Trace ID: 3b159e3a-5496-11ef-8ce1-00163e06beb9
If you want to view detailed obdiag logs, please run: obdiag display-trace 3b159e3a-5496-11ef-8ce1-00163e06beb9

The collected diagnostic information is saved in ./obdiag_gather_pack_20240807162248/ as one compressed file per node, named in the perf_11.xxx.xxx.xxx_<timestamp>.zip format. After you decompress a file, the following content is displayed:

$ tree
.
├── flame.data
├── flame.viz # Data for the flame graph, which will be used later
├── sample.data
├── sample.viz # Data for the pprof graph, which will be used later
└── top.txt

Step 4: Visualize collected performance data

Visualize data for the flame graph

git clone https://github.com/brendangregg/FlameGraph.git

./FlameGraph/stackcollapse-perf.pl flame.viz | ./FlameGraph/flamegraph.pl - > perf.svg

This pipeline processes the data collected by obdiag in the flame.viz file in two passes: stackcollapse-perf.pl merges identical call stacks into single counted lines, and flamegraph.pl renders the merged stacks as an SVG flame graph.

Note

The generated flame graph is in the SVG format and mainly presents the call stack information and number of sampling times. You can use the scp command to download the SVG file to your local computer and view the flame graph.

(Example flame graph)

The flame graph is described as follows:

  • The Y axis indicates the call stack, where each layer corresponds to a function. The deeper the call stack, the higher the flame. The function currently being executed is at the top, while the lower layers represent its parent functions.

  • The X axis indicates the number of sampling times. A function with a larger width in the X axis indicates more sampling times and longer execution time.

    Notice

    The X axis does not represent time. Instead, it shows a series of functions arranged alphabetically after merging all call stacks.

  • Pay attention to top-layer functions with large widths. A function whose box forms a wide, flat plateau is on-CPU in a large share of samples and may have performance issues.

  • Colors in the flame graph carry no special meaning. Warm tones are conventionally used because the graph depicts how busy the CPU is.

  • The flame graph is in the SVG format and supports interaction with users.

    • Hover the pointer: The function name is provided in each layer of a flame. You can hover the pointer over a layer to show the complete function name, number of sampling times, and sampling ratio.
    • Zoom in: You can click a layer to horizontally zoom in on the flame graph. Then, the layer occupies the whole width, with the details displayed.
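The merging of call stacks described above is what stackcollapse-perf.pl performs before flamegraph.pl draws the boxes. A minimal Python sketch of that folding step (a simplification for illustration, not the actual Perl script):

```python
from collections import Counter

def collapse_stacks(samples):
    """Fold identical call stacks into counted lines.

    Each sample is a list of frames ordered root -> leaf, e.g.
    ["main", "do_work", "hot_loop"]. The folded format
    ("main;do_work;hot_loop 2") is what flamegraph.pl consumes:
    one box per frame, with box width proportional to the count.
    """
    counts = Counter(";".join(stack) for stack in samples)
    # Sorted output mirrors the X axis: not a timeline, but merged
    # stacks arranged alphabetically.
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]

samples = [
    ["main", "do_work", "hot_loop"],
    ["main", "do_work", "hot_loop"],
    ["main", "do_io"],
]
print(collapse_stacks(samples))
# ['main;do_io 1', 'main;do_work;hot_loop 2']
```

A wide box in the rendered graph then corresponds to a folded line with a large count.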

Visualize data for the pprof graph

Run the following command to process the data in the sample.viz file collected by obdiag to generate a pprof graph:

cat sample.viz | ./perfdata2graph.py svg > sample.svg

In this command, perfdata2graph.py is a helper script that you need to create manually and make executable. It also requires Graphviz, because the SVG is rendered by piping DOT output to the dot command. The script content is as follows:

#!/usr/bin/env python3

import sys
import subprocess
import datetime

class Edge:
  def __init__(self):
    self.count = 0
    self.to = None
    self.label = None
    self.penwidth = 1
    self.weight = 1.
    self.color = "#000000"

class Node:
  def __init__(self):
    self.identify = ""
    self.name = ""
    self.count = 0
    self.self_count = 0
    self.id = None
    self.label = None
    self.color = "#F8F8F8"
    self.edges = {}

  def __str__(self):
    return "id: %s, name: %s, count %s, edges %s" % (self.id, self.name, self.count, len(self.edges))


class PerfToGraph:
  def __init__(self, fmt = "svg", node_drop_pct = 1., edge_drop_pct = None):
    self.fmt = fmt
    self.all_nodes = {}
    self.samples = 1
    self.s100 = 100.
    self.node_drop_pct = node_drop_pct
    self.edge_drop_pct = edge_drop_pct
    self.next_edge_color = 0
    if edge_drop_pct is None:
      self.edge_drop_pct = node_drop_pct / 5.
    self.node_drop_cnt = 0
    self.edge_drop_cnt = 0
    self.colors = [
        (0.02, "#FAFAF0"),
        (0.2, "#FAFAD2"),
        (1.0, "#F9EBB6"),
        (2.0, "#F9DB9B"),
        (3.0, "#F8CC7F"),
        (5.0, "#F7BC63"),
        (7.0, "#FF8B01"),
        (9.0, "#FA6F01"),
        (12.0, "#F55301"),
        (15.0, "#F03801"),
        (19.0, "#EB1C01"),
        (23.0, "#E60001")
        ]
    self.edge_colors = [
        "#FF8B01",
        "#EB1C01",
        "#DC92EF",
        "#9653B8",
        "#66B031",
        "#D9CA0C",
        "#BDBDBD",
        "#696969",
        "#113866",
        "#5CBFAC",
        "#1120A8",
        "#960144",
        "#EA52B2"
        ]

  def convert(self):
    self.read_stdin()
    self.formalize()
    self.output()

  def set_pen_width(self, e):
    pct = e.count * 100. / self.samples
    if pct > 10:
      e.penwidth = 3 + min(pct, 100) * 2. / 100
    elif pct > 1:
      e.penwidth = 1 + pct * 2. / 10
    else:
      e.penwidth = 1

  def set_edge_weight(self, e):
    e.weight = e.count * 100. / self.samples
    if e.weight > 100:
      e.weight = 100
    elif e.weight > 10:
      e.weight = 10 + e.weight / 10.

  def set_edge_color(self, e):
    i = self.next_edge_color
    self.next_edge_color += 1
    e.color = self.edge_colors[i % len(self.edge_colors)]

  def set_node_color(self, n):
    v = n.self_count / self.s100
    for p in self.colors:
      if v >= p[0]:
        n.color = p[1]

  def get_node(self, identify, name):
    if identify in self.all_nodes:
      return self.all_nodes[identify]
    n = Node()
    n.identify = identify
    n.name = name
    self.all_nodes[identify] = n
    return n

  def add_edge(self, f, t):
    if t.identify in f.edges:
      e = f.edges[t.identify]
      e.count += 1
    else:
      e = Edge()
      e.to = t
      e.count = 1
      f.edges[t.identify] = e

  def read_stdin(self):
    # Strip templates, offsets, and module names from perf script output,
    # keeping only the indented call-stack lines.
    cmd = "sed -e 's/<.*>//g' -e 's/ (.*$//' -e 's/+0x.*//g' -e '/^[^\t]/d' -e 's/^[[:space:]]*//'"
    sub = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True, text=True)
    prev = None
    self.samples = 1
    for l in sub.stdout:
      l = l.strip()
      if (not l) and (not prev):
        # avoiding continuous empty lines
        continue
      tmp = l.split(' ')
      addr = tmp[0]
      name = (" ".join(tmp[1:])).strip()
      if '[unknown]' == name:
        name = addr
      if not l:
        addr = 'fake_addr'
        name = '::ALL::'
      # we use name to identify nodes
      n = self.get_node(name, name)
      if prev == n:
        continue
      n.count += 1
      if prev:
        self.add_edge(n, prev)
      prev = n

      if not l:
        self.samples += 1
        prev = None

  def formalize(self):
    self.s100 = self.samples / 100.
    self.node_drop_cnt = self.samples * self.node_drop_pct / 100
    self.edge_drop_cnt = self.samples * self.edge_drop_pct / 100

    i = 0
    for n in self.all_nodes.values():
      n.id = "n%s" % (i)
      i += 1
      n.self_count = n.count - sum([x.count for x in n.edges.values()])
      n.label = "%s\\nTotal: %.2f%% | Call: %.2f%%\\nSelf: %.2f%%(%s)" % (n.name.replace("::", "\\n"), n.count/self.s100, (n.count - n.self_count)/self.s100, n.self_count/self.s100, n.self_count)
      self.set_node_color(n)

      for e in n.edges.values():
        e.label = "%.2f%%" % (e.count/self.s100)
        self.set_pen_width(e)
        self.set_edge_weight(e)
        self.set_edge_color(e)

  def to_dot(self):
    out = []
    out.append("""
    digraph call_graph_for_perf_data {
    style = "perf.css";
    node [shape = box, style=filled ];
    """)

    out.append('note [ label = "%s\\nTotal samples: %d\\nDrop nodes with <= %.2f%%(%d)\\nDrop edges with <= %.2f%%(%d)", fillcolor="#00AFFF" ];' % (datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'), self.samples, self.node_drop_pct, int(self.node_drop_cnt), self.edge_drop_pct, int(self.edge_drop_cnt)))

    for n in self.all_nodes.values():
      if n.count <= self.node_drop_cnt:
        continue
      out.append('%s [ label = "%s", tooltip = "%s", fillcolor="%s"];' % (n.id, n.label, n.name, n.color))

    for n in self.all_nodes.values():
      if n.count <= self.node_drop_cnt:
        continue
      for e in n.edges.values():
        if e.count <= self.edge_drop_cnt or e.to.count <= self.node_drop_cnt:
          continue
        tip = 'edgetooltip = "%s ==> %s", labeltooltip = "%s ==> %s"' % (n.name, e.to.name, n.name, e.to.name)
        out.append('%s -> %s [ penwidth = %.2f, weight = %f, color = "%s", label = "%s", fontcolor = "%s", %s ];' % (n.id, e.to.id, e.penwidth, e.weight, e.color, e.label, e.color, tip))

    out.append("}")
    return "\n".join(out)

  def output(self):
    if "dot" == self.fmt:
      print(self.to_dot())
    elif "svg" == self.fmt:
      # Render the DOT text to SVG on stdout via Graphviz.
      cmd = "dot -T svg"
      sub = subprocess.Popen(cmd, stdin=subprocess.PIPE, shell=True, text=True)
      sub.communicate(input=self.to_dot())
    elif "top" == self.fmt:
      try:
        for n in sorted(self.all_nodes.values(), key=lambda n: n.self_count, reverse=True):
          print("%s %.2f%%" % (n.name, n.self_count/self.s100))
      except (BrokenPipeError, IOError):
        pass

if __name__ == "__main__":
  support_fmt = ("svg", "dot", "top")
  if len(sys.argv) < 2 or sys.argv[1] not in support_fmt:
    print("%s dot/svg/top [node_drop_percent] [edge_drop_percent]" % (sys.argv[0]))
    sys.exit(1)
  fmt = sys.argv[1]
  nd_pct = float(sys.argv[2]) if len(sys.argv) > 2 else 1.0
  ed_pct = float(sys.argv[3]) if len(sys.argv) > 3 else 0.2
  c = PerfToGraph(fmt, nd_pct, ed_pct)
  c.convert()

Note

The generated pprof graph is in the SVG format. You can use the scp command to download the SVG file to your local computer and view the graph.

(Example pprof graph)

The pprof graph is more intuitive: a larger block indicates a function that occupies more resources.
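The Total/Call/Self percentages in each node label of the pprof graph follow a simple bookkeeping rule: a node's self time is its sample count minus the samples attributed to its callee edges. A hedged sketch of that calculation (simplified from the perfdata2graph.py script above):

```python
def node_percentages(total_samples, node_count, callee_edge_counts):
    """Compute the Total/Call/Self percentages shown on a pprof node.

    node_count         : samples in which this function appears
    callee_edge_counts : samples attributed to calls into each callee
    """
    self_count = node_count - sum(callee_edge_counts)
    pct = lambda c: 100.0 * c / total_samples
    return {
        "total": pct(node_count),              # function plus its callees
        "call": pct(node_count - self_count),  # time spent inside callees
        "self": pct(self_count),               # time in the function body itself
    }

# 1000 samples overall; a function seen in 300 of them, 250 of which
# are attributed to its two callees:
print(node_percentages(1000, 300, [150, 100]))
# {'total': 30.0, 'call': 25.0, 'self': 5.0}
```

A large "self" share is what makes a block both large and warm-colored in the rendered graph.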

Working mechanism

How a flame graph is generated

The perf command is a native performance analysis tool provided by Linux. It captures the function name currently being executed by the CPU, along with the call stack. Typically, the sampling frequency of the command is 99 Hz, meaning it collects data 99 times per second. If the same function name is returned in all 99 samples within a second, it indicates that the CPU spent the entire second executing the same function, which could point to potential performance issues.

## Generate a flame graph
sudo perf record -F 99 -p 87741 -g -- sleep 20
sudo perf script > flame.viz

The two commands are described as follows:

  • sudo perf record -F 99 -p 87741 -g -- sleep 20

    • sudo perf record: records performance data by using the perf command.
    • -F 99: sets the sampling frequency to 99 Hz, that is, 99 samples per second for the selected performance event.
    • -p 87741: records the performance data of only the process with the ID 87741.
    • -g: enables call-graph (stack trace) recording, so each sample includes the full function call chain.
    • --: treats everything that follows as a command to run rather than as options.
    • sleep 20: runs sleep as a dummy workload so that perf keeps recording for 20 seconds and then exits.
  • sudo perf script > flame.viz

    • sudo perf script: extracts original event records from the data file named perf.data by default, which was recorded earlier.
    • > flame.viz: redirects the output to the flame.viz file.

    To sum up, this command extracts the original data records from the perf.data file recorded earlier and outputs the records to the flame.viz file. Generally, this file is used for further processing. For example, it is used to generate a flame graph to visualize the performance data.
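As a back-of-the-envelope check on these settings (illustrative arithmetic only, not part of obdiag): sampling at 99 Hz for 20 seconds yields about 1,980 samples, and the share of samples in which a function appears approximates its share of CPU time.

```python
def expected_samples(freq_hz, duration_s):
    # perf record -F 99 -- sleep 20 collects roughly 99 samples per
    # second for 20 seconds on each busy CPU.
    return freq_hz * duration_s

def cpu_share(samples_in_func, total_samples):
    # The fraction of samples in which a function was on-CPU
    # approximates the fraction of CPU time it consumed.
    return samples_in_func / total_samples

total = expected_samples(99, 20)
print(total)                   # 1980
print(cpu_share(990, total))   # 0.5, i.e. the function used ~50% of the CPU
```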

How a pprof graph is generated

When obdiag generates a pprof graph, the following commands are executed on the nodes:

## Generate a call graph (pprof graph)
sudo perf record -e cycles -c 100000000 -p 87741 -g -- sleep 20
sudo perf script -F ip,sym -f > sample.viz

Note

-p is followed by a process ID. Replace it with the ID of the process you want to profile.

The two commands are described as follows:

  • sudo perf record -e cycles -c 100000000 -p 87741 -g -- sleep 20

    • sudo: You need to execute this command as the root user.
    • perf record: records performance data by using the perf command.
    • -e cycles: uses CPU cycles as the sampled performance event.
    • -c 100000000: sets the sampling period to 100 million events, that is, one sample is taken every 100,000,000 CPU cycles.
    • -p 87741: records the performance data of the process with the ID 87741.
    • -g: enables call-graph (stack trace) recording, so each sample includes the full function call chain.
    • --: treats everything that follows as a command to run rather than as options.
    • sleep 20: runs sleep as a dummy workload so that perf keeps recording for 20 seconds and then exits.

    To sum up, this command samples the process with ID 87741 once every 100 million CPU cycles over a 20-second window, recording the call stack at each sample. The data is saved in a file named perf.data by default.

  • sudo perf script -F ip,sym -f > sample.viz

    • sudo: You need to execute this command as the root user.
    • perf script: extracts original event records from the data file named perf.data by default, which was recorded earlier.
    • -F ip,sym: limits the output fields to the instruction pointer and the symbol name.
    • -f: forces perf script to run without validating the ownership of the perf.data file.
    • > sample.viz: redirects the output to the sample.viz file.

To learn about the execution process of obdiag, you can run the obdiag gather perf -v command to view the detailed obdiag logs.
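Note that -c behaves differently from -F: it fixes a sampling period (one sample per 100,000,000 cycle events) rather than a frequency, so the effective sample rate depends on clock speed and CPU load. A rough illustration (the 3 GHz figure is an assumption for the example, not from obdiag):

```python
def samples_per_second(cpu_hz, period_events):
    """With `perf record -e cycles -c N`, one sample is taken every
    N cycle events, so a faster or busier CPU produces samples
    more often."""
    return cpu_hz / period_events

# Assuming a fully busy 3 GHz core and a period of 100 million cycles:
print(samples_per_second(3_000_000_000, 100_000_000))  # 30.0 samples/second
```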

References

For more information about obdiag, see obdiag documentation.
