This article applies to scenarios where obdiag is deployed independently. Use the obdiag analyze flt_trace command to analyze OceanBase's full-link logs and provide a full-link diagnostic report.
If it is a cluster deployed by obd, you can directly collect information on the selected cluster through the obd side diagnostic command group.
What is full-link diagnosis: The OceanBase database is a distributed database, so the calling link is complex. When a timeout problem occurs, it is often impossible to quickly locate the problem with the internal components of OceanBase or the network. Operation and maintenance personnel can only analyze it based on experience and observer logs. The OB kernel has added trace.log log in 4.0, which can be used to analyze full-link diagnosis. There are two paths in the full link. One is from the application to send a request to the ODP (proxy server) through the client (JDBC or OCI, etc.) to access the OBServer, and the access result is returned to the application; the other is from the application to directly access the OBServer through the client (JDBC or OCI, etc.), and the access result is returned to the application. Full-link diagnosis is a diagnosis that locates problems in all components of the full link.
What does obdiag full-link diagnosis do:
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐
| Server1 |------>| Search/filter flt_trace_id logs |------>| Copy filtered logs to obdiag node |---┐
└────────┘ └─────────────────────────────┘ └────────────────────────────┘ │
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐ │ ┌────────────────────────────────┐
| Server2 |------>| Search/filter flt_trace_id logs |------>| Copy filtered logs to obdiag node |---|--> Aggregate logs by parent/child into trace tree |
└────────┘ └─────────────────────────────┘ └────────────────────────────┘ │ └────────────────────────────────┘
┌────────┐ ┌─────────────────────────────┐ ┌────────────────────────────┐ │
| ServerN |------>| Search/filter flt_trace_id logs |------>| Copy filtered logs to obdiag node |---┘
└────────┘ └─────────────────────────────┘ └────────────────────────────┘
```<main id="notice" type='explain'>
<h4>Description</h4>
<p>As of V4.1.0, <code>obdiag analyze flt_trace</code> supports automatic filtering of OBProxy nodes by client IP. During the analysis process, the client IP related to flt_trace_id is first obtained from the SQL audit view (<code>gv$ob_sql_audit</code> or <code>gv$sql_audit</code>), and then only logs are collected from matching OBProxy nodes to improve analysis efficiency and reduce unnecessary data collection. This feature is compatible with multi-version OceanBase databases. </p>
</main>
## Instructions for use
```bash
obdiag analyze flt_trace [options]
The options are explained below:
Option name |
Is it required |
Data type |
Default value |
Description |
|---|---|---|---|---|
| --flt_trace_id | Yes | string | flt_trace_id | flt_trace_id View flt_trace_id from gv$ob_sql_audit or directly through trace.log |
| --store_dir | No | string | Defaults to the current path where the command is executed | The local path where the results are stored. |
| --temp_dir | No | string | /tmp |
The temporary file storage directory generated by the remote node during the log collection process. |
| --files | No | string | Default is empty | --files is the offline analysis log file mode. Once --files is specified, it will enter the offline analysis log mode |
| --top | No | string | 5 | Display the Top N leaf nodes that take the longest time in the full-link diagnosis results |
| --recursion | No | string | 8 | The number of layers of maximum recursive full-link diagnostics |
| --output | No | string | 60 | The number of result lines printed to the screen. All results will be completely stored in the result file |
| -c | No | string | ~/.obdiag/config.yml |
Configuration file path |
| --inner_config | No | string | Default is empty | obdiag Self-use configuration |
| --config | No | string | Default is empty | Configuration of the cluster to be diagnosed by obdiag, fixed style: --config key1=value1 --config key2=value2 |
| --config_password | No | string | Default is empty | obdiag When using an encrypted configuration file, you need to pass in the corresponding password through this option.
ExplanationFor details, please see Configuration file encryption. |
Usage example
Method 1: Use without configuration file (out of the box)
Step 1: Find suspected slow SQL
In gv$ob_sql_audit, if there is a clear SQL statement, you can use query_sql to find the flt_trace_id of suspected slow SQL, for example:
OceanBase(root@test)>select query_sql, flt_trace_id from oceanbase.gv$ob_sql_audit where query_sql like 'select @@version_comment limit 1';
+----------------------------------+--------------------------------------+
| query_sql | flt_trace_id |
+----------------------------------+--------------------------------------+
| select @@version_comment limit 1 | 00060aa3-d607-f5f2-328b-388e17f687cb |
+----------------------------------+--------------------------------------+
1 row in set (0.001 sec)
Where flt_trace_id is 00060aa3-d607-f5f2-328b-388e17f687cb.
Or you can also find flt_trace_id from the trace.log log of OBProxy and OceanBase database:
head trace.log
[2023-12-07 22:20:07.242229] [489640][T1_L0_G0][T1][YF2A0BA2DA7E-00060BEC28627BEF-0-0] {"trace_id":"00060bec-275e-9832-e730-7c129f2182ac","name":"close_das_task","id":"00060bec-2a20-bf9e-56c9-724cb467f859","start_ts":1701958807240606,"end_ts":1701958807240607,"parent_id":"00060bec-2a20-bb5f-e03a-5da01aa3308b","is_follow":false}
Among them 00060bec-275e-9832-e730-7c129f2182ac is its flt_trace_id.
Step 2: Execute full-link diagnostic command
Description
For detailed configuration introduction used in the commands in this section, please refer to obdiag configuration.
$ obdiag analyze flt_trace --flt_trace_id 000605b1-28bb-c15f-8ba0-1206bcc08aa3 \
--config obcluster.servers.nodes[0].ip=xx.xx.xx.1 \
--config obcluster.servers.nodes[1].ip=xx.xx.xx.2 \
--config obcluster.servers.nodes[2].ip=xx.xx.xx.3 \
--config obcluster.servers.global.home_path=/home/admin/oceanbase \
--config obproxy.servers.nodes[0].ip=xx.xx.xx.4 \
--config obproxy.servers.nodes[1].ip=xx.xx.xx.5 \
--config obproxy.servers.global.ssh_username=test \
--config obproxy.servers.global.ssh_password=****** \
--config obproxy.servers.global.home_path=/home/admin/obproxy
root node id: 000605b1-28bb-c15f-8ba0-1206bcc08aa3
TOP time-consuming leaf span:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 18| px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 5 | pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 16| do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.1)|
| 10| do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 17| close_das_task | 14 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
Details:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 1 | TRACE | - | - |
| 2 | └─com_query_process | 5.351 ms | OBPROXY(xx.xx.xx.1) |
| 3 | └─mpquery_single_stmt | 5.333 ms | OBSERVER(xx.xx.xx.1)|
| 4 | ├─sql_compile | 107 μs | OBSERVER(xx.xx.xx.1)|
| 5 | │ └─pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 6 | └─sql_execute | 5.147 ms | OBSERVER(xx.xx.xx.1)|
| 7 | ├─open | 87 μs | OBSERVER(xx.xx.xx.1)|
| 8 | ├─response_result | 4.945 ms | OBSERVER(xx.xx.xx.1)|
| 9 | │ ├─px_schedule | 2.465 ms | OBSERVER(xx.xx.xx.1)|
| 10| │ │ ├─do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 11| │ │ ├─px_task | 2.339 ms | OBSERVER(xx.xx.xx.2)|
| 12| │ │ │ ├─do_local_das_task | 54 μs | OBSERVER(xx.xx.xx.2)|
| 13| │ │ │ └─close_das_task | 22 μs | OBSERVER(xx.xx.xx.2)|
| 14| │ │ ├─do_local_das_task | 11 μs | OBSERVER(xx.xx.xx.1)|
| 15| │ │ ├─px_task | 2.834 ms | OBSERVER(xx.xx.xx.3)|
| 16| │ │ │ ├─do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.3)|
| 17| │ │ │ └─close_das_task | 14 μs | OBSERVER(xx.xx.xx.3)|
| 18| │ │ └─px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 19| │ ├─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 20| │ └─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| ..| ...... | ... | ...... |
+---+----------------------------------+-------------+---------------------+
For more details, please run cmd ' cat analyze_flt_result/000605b1-28bb-c15f-8ba0-1206bcc08aa3.txt '
check the details:
$ cat analyze_flt_result/000605b1-28bb-c15f-8ba0-1206bcc08aa3.txt
root node id: 000605b1-28bb-c15f-8ba0-1206bcc08aa3
TOP time-consuming leaf span:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 18| px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 5 | pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 16| do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.1)|
| 10| do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 17| close_das_task | 14 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
Details:
+---+----------------------------------+-------------+---------------------+
| ID| Span Name | Elapsed Time| NODE |
+---+----------------------------------+-------------+---------------------+
| 1 | TRACE | - | - |
| 2 | └─com_query_process | 5.351 ms | OBPROXY(xx.xx.xx.1) |
| 3 | └─mpquery_single_stmt | 5.333 ms | OBSERVER(xx.xx.xx.1)|
| 4 | ├─sql_compile | 107 μs | OBSERVER(xx.xx.xx.1)|
| 5 | │ └─pc_get_plan | 52 μs | OBSERVER(xx.xx.xx.1)|
| 6 | └─sql_execute | 5.147 ms | OBSERVER(xx.xx.xx.1)|
| 7 | ├─open | 87 μs | OBSERVER(xx.xx.xx.1)|
| 8 | ├─response_result | 4.945 ms | OBSERVER(xx.xx.xx.1)|
| 9 | │ ├─px_schedule | 2.465 ms | OBSERVER(xx.xx.xx.1)|
| 10| │ │ ├─do_local_das_task | 17 μs | OBSERVER(xx.xx.xx.1)|
| 11| │ │ ├─px_task | 2.339 ms | OBSERVER(xx.xx.xx.2)|
| 12| │ │ │ ├─do_local_das_task | 54 μs | OBSERVER(xx.xx.xx.2)|
| 13| │ │ │ └─close_das_task | 22 μs | OBSERVER(xx.xx.xx.2)|
| 14| │ │ ├─do_local_das_task | 11 μs | OBSERVER(xx.xx.xx.1)|
| 15| │ │ ├─px_task | 2.834 ms | OBSERVER(xx.xx.xx.3)|
| 16| │ │ │ ├─do_local_das_task | 45 μs | OBSERVER(xx.xx.xx.3)|
| 17| │ │ │ └─close_das_task | 14 μs | OBSERVER(xx.xx.xx.3)|
| 18| │ │ └─px_task | 2.758 ms | OBSERVER(xx.xx.xx.1)|
| 19| │ ├─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 20| │ └─px_schedule | 1 μs | OBSERVER(xx.xx.xx.1)|
| 21| └─close | 70 μs | OBSERVER(xx.xx.xx.1)|
| 22| └─end_transaction | 3 μs | OBSERVER(xx.xx.xx.1)|
+---+----------------------------------+-------------+---------------------+
Tags & Logs:
-------------------------------------
1 -
2 - com_query_process Elapsed: 5.351 ms
NODE:OBPROXY(xx.xx.xx.1)
tags: [{'sess_id': 3221487633}, {'action_name': ''}, {'module_name': ''}, {'client_info': ''}, {'receive_ts': 1695108311007659}, {'log_trace_id': 'YA9257F000001-000605B0441954BC-0-0'}]
3 - mpquery_single_stmt Elapsed: 5.333 ms
NODE:OBSERVER(xx.xx.xx.1)
4 - sql_compile Elapsed: 107 μs
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'sql_text': 'select /*+parallel(2)*/ count(1) from t1 tt1, t1 tt2'}, {'sql_id': '797B7202BA69D4C2C77C12BFADDC19DC'}, {'database_id': 201001}, {'plan_hash': 150629045171310866}, {'hit_plan': True}]
5 - pc_get_plan Elapsed: 52 μs
NODE:OBSERVER(xx.xx.xx.1)
6 - sql_execute Elapsed: 5.147 ms
NODE:OBSERVER(xx.xx.xx.1)
7 - open Elapsed: 87 μs
NODE:OBSERVER(xx.xx.xx.1)
8 - response_result Elapsed: 4.945 ms
NODE:OBSERVER(xx.xx.xx.1)
9 - px_schedule Elapsed: 2.465 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'used_worker_cnt': 0}, {'qc_id': 1}, {'dfo_id': 2147483647}, {'used_worker_cnt': 0}, {'qc_id': 1}, {'dfo_id': 1}]
10 - do_local_das_task Elapsed: 17 μs
NODE:OBSERVER(xx.xx.xx.1)
11 - px_task Elapsed: 2.339 ms
NODE:OBSERVER(xx.xx.xx.2)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 0}, {'task_id': 0}]
12 - do_local_das_task Elapsed: 54 μs
NODE:OBSERVER(xx.xx.xx.2)
13 - close_das_task Elapsed: 22 μs
NODE:OBSERVER(xx.xx.xx.2)
14 - do_local_das_task Elapsed: 11 μs
NODE:OBSERVER(xx.xx.xx.1)
15 - px_task Elapsed: 2.834 ms
NODE:OBSERVER(xx.xx.xx.3)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 0}]
16 - do_local_das_task Elapsed: 45 μs
NODE:OBSERVER(xx.xx.xx.3)
17 - close_das_task Elapsed: 14 μs
NODE:OBSERVER(xx.xx.xx.3)
18 - px_task Elapsed: 2.758 ms
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'group_id': 0}, {'qc_id': 1}, {'sqc_id': 0}, {'dfo_id': 1}, {'task_id': 1}]
19 - px_schedule Elapsed: 1 μs
NODE:OBSERVER(xx.xx.xx.1)
20 - px_schedule Elapsed: 1 μs
NODE:OBSERVER(xx.xx.xx.1)
21 - close Elapsed: 70 μs
NODE:OBSERVER(xx.xx.xx.1)
22 - end_transaction Elapsed: 3 μs
NODE:OBSERVER(xx.xx.xx.1)
tags: [{'trans_id': 0}]
Method 2: Use with configuration file
Step 1: Find suspected slow sql
In gv$ob_sql_audit, if there is a clear SQL statement, you can use query_sql to find the flt_trace_id of suspected slow SQL.
Step 2: Set the diagnosed cluster configuration
You need to ensure that the login information of the node to be collected has been configured in the obdiag configuration file config.yml. For related detailed configuration introduction, see obdiag configuration.
obdiag config -hxx.xx.xx.xx -uroot@sys -p***** -P2881
Step 3: Execute full-link diagnostic command
obdiag analyze flt_trace --flt_trace_id <FLT_TRACE_ID>
