If OceanBase Diagnostic Tool (obdiag) is independently deployed, you can run this command to analyze the trace logs of OceanBase Database and return an end-to-end diagnostics report.
If your OceanBase cluster is deployed by using OceanBase Deployer (obd), you can run obdiag commands on obd to collect diagnostic information of the cluster. For more information about the commands, see obdiag commands.
What is end-to-end diagnostics? OceanBase Database is a distributed database in which the call links are complex. When a timeout issue occurs, O&M engineers cannot quickly determine whether the issue is caused by OceanBase Database components or the network. They can only analyze the issue based on experience and OceanBase Database logs. OceanBase Database V4.0 provides the trace log feature for end-to-end diagnostics. Two paths are involved in end-to-end tracing. In one path, the application sends a request to ODP by using a client, such as Java Database Connectivity (JDBC) or Oracle Call Interface (OCI), to access an OBServer node. The access result is returned to the application. In the other path, the application directly accesses an OBServer node by using a client, and the access result is returned to the application. In end-to-end diagnostics, all components are diagnosed.
Working mechanism
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐
│ Server 1 │------>│ Search and filter the logs related to the specified flt_trace_id. │------>│ Return filtered logs to the node where obdiag is deployed. │---┐
└────────┘ └─────────────────────────────┘ └────────────────────────────┘ │
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐ │ ┌────────────────────────────────┐
│ Server 2 │------>│ Search and filter the logs related to the specified flt_trace_id. │------>│ Return filtered logs to the node where obdiag is deployed. │---┼--->│ Aggregate logs obtained from parent and child nodes and generate a trace tree based on the hierarchical relationship between nodes.
└────────┘ └─────────────────────────────┘ └────────────────────────────┘ │ └────────────────────────────────┘
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐ │
│ Server N │------>│ Search and filter the logs related to the specified flt_trace_id. │------>│ Return filtered logs to the node where obdiag is deployed. │---┘
└────────┘ └─────────────────────────────┘ └────────────────────────────┘
Syntax
obdiag analyze flt_trace [options]
The following table describes the options.
| Option | Required? | Data type | Default value | Description |
|---|---|---|---|---|
| --flt_trace_id | Yes | string | flt_trace_id | The trace ID. You can obtain the value by querying the flt_trace_id field in the gv$ob_sql_audit view or searching the trace.log file. |
| --store_dir | No | string | The current path where the command is executed | The local path where the results are stored. |
| --temp_dir | No | string | /tmp |
The directory where temporary files generated by the remote node during log collection are stored. |
| --files | No | string | Empty | The files to be analyzed. If you specify the --files option, the offline log analysis mode is enabled. |
| --top | No | string | 5 | The top N leaf nodes that take the longest time in the result of the end-to-end diagnostics. |
| --recursion | No | string | 8 | The maximum number of layers for recursive end-to-end diagnostics. |
| --output | No | string | 60 | The number of displayed rows of the result. The full result is stored in the result file. |
| -c | No | string | ~/.obdiag/config.yml |
The path of the configuration file. |
| --inner_config | No | string | Empty | The configurations of obdiag. |
| --config | No | string | Empty | The configurations of the cluster diagnosed by obdiag, in the format of --config key1=value1 --config key2=value2. |
Examples
Method 1: Use the command out-of-the-box without a configuration file
Step 1: Find the ID of a suspected slow SQL statement
If you suspect that an SQL statement is slow, you can query the gv$ob_sql_audit view to obtain its flt_trace_id value. Here is an example:
OceanBase(root@test)>select query_sql, flt_trace_id from oceanbase.gv$ob_sql_audit where query_sql like 'select @@version_comment limit 1';
+----------------------------------+--------------------------------------+
| query_sql | flt_trace_id |
+----------------------------------+--------------------------------------+
| select @@version_comment limit 1 | 00060aa3-d607-f5f2-328b-388e17f687cb |
+----------------------------------+--------------------------------------+
1 row in set (0.001 sec)
The result indicates that the flt_trace_id value of the suspected SQL statement is 00060aa3-d607-f5f2-328b-388e17f687cb.
You can also obtain the flt_trace_id value by searching the trace.log file of ODP or OceanBase Database. Here is an example:
head trace.log
[2023-12-07 22:20:07.242229] [489640][T1_L0_G0][T1][YF2A0BA2DA7E-00060BEC28627BEF-0-0] {"trace_id":" 00060bec-275e-9832-e730-7c129f2182ac","name":" close_das_task","id":" 00060bec-2a20-bf9e-56c9-724cb467f859","start_ts":1701958807240606,"end_ts":1701958807240607,"parent_id":" 00060bec-2a20-bb5f-e03a-5da01aa3308b","is_follow":false}
The result indicates that the flt_trace_id value of the suspected SQL statement is 00060bec-275e-9832-e730-7c129f2182ac.
Step 2: Run the end-to-end diagnostics command
Note
For more information about the parameters used in the command in this section, see Configure obdiag.
$ obdiag analyze flt_trace --flt_trace_id 000605b1-28bb-c15f-8ba0-1206bcc08aa3 \
--config obcluster.servers.nodes[0].ip=xx.xx.xx.1 \
--config obcluster.servers.nodes[1].ip=xx.xx.xx.xx.2 \
--config obcluster.servers.nodes[2].ip=xx.xx.xx.xx.3 \
--config obcluster.servers.global.home_path=/home/admin/oceanbase \
--config obproxy.servers.nodes[0].ip=xx.xx.xx.4 \
--config obproxy.servers.nodes[1].ip=xx.xx.xx.xx.5 \
--config obproxy.servers.global.ssh_username=test \
--config obproxy.servers.global.ssh_password=****** \
--config obproxy.servers.global.home_path=/home/admin/obproxy
root node id: 000605b1-28bb-c15f-8ba0-1206bcc08aa3
TOP time-consuming leaf span:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 18| px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 5 | pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 16| do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.1)|
| 10| do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 17| close_das_task | 14 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
Details:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 1 | TRACE | - | - |
| 2 | └─com_query_process | 5.351 ms | OBPROXY(xx.xx.xx.1) |
| 3 | └─mpquery_single_stmt | 5.333 ms | OBSERVER(xx.xx.xx.1)|
| 4 | ├─sql_compile | 107 μs | OBSERVER(xx.xx.xx.1)|
| 5 | │ └─pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 6 | └─sql_execute | 5.147 ms | OBSERVER(xx.xx.xx.1)|
| 7 | ├─open | 87 μs | OBSERVER(xx.xx.xx.1)|
| 8 | ├─response_result | 4.945 ms | OBSERVER(xx.xx.xx.1)|
| 9 | │ ├─px_schedule | 2.465 ms | OBSERVER(xx.xx.xx.1)|
| 10| │ │ ├─do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 11| │ │ ├─px_task | 2.339 ms | OBSERVER(xx.xx.xx.2)|
| 12| │ │ │ ├─do_local_das_task | 54 μs | OBSERVER(xx.xx.xx.2)|
| 13| │ │ │ └─close_das_task | 22 μs | OBSERVER(xx.xx.xx.2)|
| 14| │ │ ├─do_local_das_task | 11 μs | OBSERVER(xx.xx.xx.1)|
| 15| │ │ ├─px_task | 2.834 ms | OBSERVER(xx.xx.xx.3)|
| 16| │ │ │ ├─do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.3)|
| 17| │ │ │ └─close_das_task | 14 μs | OBSERVER(xx.xx.xx.3)|
| 18| │ │ └─px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 19| │ ├─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 20| │ └─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| ..| ...... | ... | ...... |
+---+----------------------------------+-------------+---------------------+
For more details, please run cmd ' cat analyze_flt_result/000605b1-28bb-c15f-8ba0-1206bcc08aa3.txt '
View the details:
$ cat analyze_flt_result/000605b1-28bb-c15f-8ba0-1206bcc08aa3.txt
root node id: 000605b1-28bb-c15f-8ba0-1206bcc08aa3
TOP time-consuming leaf span:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 18| px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 5 | pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 16| do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.1)|
| 10| do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 17| close_das_task | 14 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
Details:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 1 | TRACE | - | - |
| 2 | └─com_query_process | 5.351 ms | OBPROXY(xx.xx.xx.1) |
| 3 | └─mpquery_single_stmt | 5.333 ms | OBSERVER(xx.xx.xx.1)|
| 4 | ├─sql_compile | 107 μs | OBSERVER(xx.xx.xx.1)|
| 5 | │ └─pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 6 | └─sql_execute | 5.147 ms | OBSERVER(xx.xx.xx.1)|
| 7 | ├─open | 87 μs | OBSERVER(xx.xx.xx.1)|
| 8 | ├─response_result | 4.945 ms | OBSERVER(xx.xx.xx.1)|
| 9 | │ ├─px_schedule | 2.465 ms | OBSERVER(xx.xx.xx.1)|
| 10| │ │ ├─do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 11| │ │ ├─px_task | 2.339 ms | OBSERVER(xx.xx.xx.2)|
| 12| │ │ │ ├─do_local_das_task | 54 μs | OBSERVER(xx.xx.xx.2)|
| 13| │ │ │ └─close_das_task | 22 μs | OBSERVER(xx.xx.xx.2)|
| 14| │ │ ├─do_local_das_task | 11 μs | OBSERVER(xx.xx.xx.1)|
| 15| │ │ ├─px_task | 2.834 ms | OBSERVER(xx.xx.xx.3)|
| 16| │ │ │ ├─do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.3)|
| 17| │ │ │ └─close_das_task | 14 μs | OBSERVER(xx.xx.xx.3)|
| 18| │ │ └─px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 19| │ ├─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 20| │ └─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 21| └─close | 70 μs | OBSERVER(xx.xx.xx.1)|
| 22| └─end_transaction | 3 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
1 -
2 - com_query_process Elapsed: 5.351 ms
NODE:OBPROXY(xx.xx.xx.1)
tags: [{'sess_id': 3221487633}, {'action_name': ''}, {'module_name': ''}, {'client_info': ''}, {'receive_ts': 1695108311007659}, {'log_trace_id': 'YA9257F000001-000605B0441954BC-0-0'}]
3 - mpquery_single_stmt Elapsed: 5.333 ms
NODE:OBSERVER(xx.xx.xx.1)
4 - sql_compile Elapsed: 107 μs
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'sql_text': 'select /*+parallel(2)*/ count(1) from t1 tt1, t1 tt2'}, {'sql_id': '797B7202BA69D4C2C77C12BFADDC19DC'}, {'database_id': 201001}, {'plan_hash': 150629045171310866}, {'hit_plan': True}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
6 - sql_execute Elapsed: 5.147 ms
NODE:OBSERVER(xx.xx.xx.1)
7 - open Elapsed: 87 μs
NODE:OBSERVER(xx.xx.xx.1)
8 - response_result Elapsed: 4.945 ms
NODE:OBSERVER(xx.xx.xx.1)
9 - px_schedule Elapsed: 2.465 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'used_worker_cnt': 0}, {'qc_id': 1}, {'dfo_id': 2147483647}, {'used_worker_cnt': 0}, {'qc_id': 1}, {'dfo_id': 1}]
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
11 - px_task Elapsed: 2.339 ms
NODE:OBSERVER(xx.xx.xx.2)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 0}, {'task_id': 0}]
12 - do_local_das_task Elapsed: 54 μs
NODE:OBSERVER(xx.xx.xx.2)
13 - close_das_task Elapsed: 22 μs
NODE:OBSERVER(xx.xx.xx.2)
14 - do_local_das_task Elapsed: 11 μs
NODE:OBSERVER(xx.xx.xx.1)
15 - px_task Elapsed: 2.834 ms
NODE:OBSERVER(xx.xx.xx.3)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 0}]
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
19 - px_schedule Elapsed: 1 μs
NODE:OBSERVER(xx.xx.xx.1)
20 - px_schedule Elapsed: 1 μs
NODE:OBSERVER(xx.xx.xx.1)
21 - close Elapsed: 70 μs
NODE:OBSERVER(xx.xx.xx.1)
22 - end_transaction Elapsed: 3 μs
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'trans_id': 0}]
Method 2: Use the command with a configuration file
Step 1: Find the ID of a suspected slow SQL statement
If you suspect that an SQL statement is slow, you can query the gv$ob_sql_audit view to obtain its flt_trace_id value.
Step 2: Configure the information of the cluster to be diagnosed
Before you run the command, make sure that you have configured the logon information of the target nodes in the config.yml configuration file of obdiag. For more information, see Configure obdiag.
obdiag config -hxx.xx.xx.xx -uroot@sys -p***** -P2881
Step 3: Run the end-to-end diagnostics command
obdiag analyze flt_trace --flt_trace_id <FLT_TRACE_ID>