In a transaction processing (TP) workload, SQL statements are typically short, executed at high frequency, and have execution times measured in milliseconds. For such queries, execution time alone rarely identifies the "slow" ones. The key metric for frequent short queries is the number of rows each query reads.
Example scenario
Consider the following table and query:

CREATE TABLE T1 (C1 INT, C2 INT);
CREATE INDEX IDX_C1 ON T1 (C1);
-- insert 10K rows into T1; C1 is almost unique
SELECT * FROM T1 WHERE C1 = 1;

If the IDX_C1 index is used, each execution reads only a few rows and completes in microseconds to milliseconds. If a full table scan is performed instead, each execution reads a large amount of data and takes tens to hundreds of milliseconds.
A query that takes only a few milliseconds per execution is not necessarily a slow query. However, if such a query runs at a very high frequency, it consumes a large amount of computing resources in aggregate and can, for example, overload the CPU.
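To make the aggregate cost concrete, here is a back-of-the-envelope calculation. The figures (2 ms of CPU per execution, 50,000 executions per second) are illustrative assumptions, not measurements from this scenario:

```sql
-- Hypothetical figures: 2,000 us of CPU per execution at 50,000 executions/s.
-- Total CPU demand: 2000 * 50000 / 1000000 = 100 CPU-seconds per second,
-- i.e. roughly 100 cores kept busy by a query that looks "fast" in isolation.
SELECT 2000 * 50000 / 1000000 AS cpu_seconds_per_second;
```

This is why a plan regression on a short, hot query (index scan degrading to a full table scan) can exhaust the CPU of a cluster even though no single execution looks slow.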
Locate frequent short queries
To locate such slow queries, you need to perform some aggregate statistics on the resource consumption of similar queries. In OceanBase Database, you can use the GV$OB_SQL_AUDIT view for this purpose. A sample diagnostic SQL statement is as follows:
The following SQL statement returns the top 10 requests sorted by the number of rows read. If the top few requests read significantly more rows than the others, optimizing them is highly valuable: improving the execution plans of these requests can yield substantial overall gains.
In actual diagnosis, a high row read count is not the only possible cause of high CPU usage. You can also sort the queries by other fields, such as the total CPU time, the number of retries (retry_cnt), and the number of RPCs (rpc_count), for further analysis. Analyzing the queries from multiple dimensions helps you identify potential slow queries.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
tenant_id, tenant_name, user_name, db_name, plan_id,
count(*) executions,
max(event) event,
max(table_scan) table_scan,
sum(case when ret_code = 0 then 0 else 1 end) fail_times,
sum(rpc_count) rpc_count,
sum(retry_cnt) retry_cnt,
sum(case when plan_type = 2 then 1 else 0 end) remote_plans,
sum(case when is_hit_plan = 1 then 0 else 1 end) miss_plans,
round(avg(elapsed_time)) elapsed_time,
round(max(elapsed_time)) max_elapsed_time,
round(avg(execute_time)) execute_time,
round(avg(( execute_time - total_wait_time_micro + get_plan_time ))) cpu_time,
round(max(( execute_time - total_wait_time_micro + get_plan_time ))) max_cpu_time,
round(avg(queue_time)) queue_time,
round(avg(net_wait_time)) netwait_time,
round(avg(user_io_wait_time)) iowait_time,
round(avg(get_plan_time)) getplan_time,
round(avg(decode_time)) decode_time,
round(avg(total_wait_time_micro)) total_wait_time,
round(avg(application_wait_time)) app_wait_time,
round(avg(concurrency_wait_time)) concurrency_wait_time,
round(avg(schedule_time)) schedule_time,
round(avg(return_rows)) return_rows,
round(avg(affected_rows)) affected_rows,
round(avg(row_cache_hit * 2 + bloom_filter_cache_hit * 2 + block_cache_hit +
disk_reads)) logical_reads,
round(avg(row_cache_hit)) row_cache_hit,
round(avg(bloom_filter_cache_hit)) bloom_filter_cache_hit,
round(avg(block_cache_hit)) block_cache_hit,
round(avg(disk_reads)) disk_reads,
round(avg(memstore_read_row_count)) memstore_read_row_count,
round(avg(ssstore_read_row_count)) ssstore_read_row_count,
sum(memstore_read_row_count + ssstore_read_row_count) as total_row_count
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id order by total_row_count desc limit 10;
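As one multi-dimensional variation of the diagnosis above, the same statistics can be re-sorted by estimated CPU time instead of rows read. The sketch below assumes the same GV$OB_SQL_AUDIT columns used in the query above; only the aggregates kept and the ORDER BY clause change:

```sql
-- Sketch: rank requests by total estimated CPU time rather than rows read.
-- CPU time per request is estimated the same way as in the query above:
-- execute_time - total_wait_time_micro + get_plan_time.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
       count(*) executions,
       sum(retry_cnt) retry_cnt,
       sum(rpc_count) rpc_count,
       sum(execute_time - total_wait_time_micro + get_plan_time) total_cpu_time
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id
order by total_cpu_time desc limit 10;
```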
View information about frequent short queries
After you locate the short SQL statements, you can look up their execution plans in the plan cache.
Run the following statement to obtain the execution plan metadata.
In the following query, specify the tenant_id based on the business to be analyzed, and fill in the svr_ip and sql_id based on the results of locating slow SQL statements.

SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
       last_active_time, first_load_time, outline_data
FROM GV$OB_PLAN_CACHE_PLAN_STAT
WHERE TENANT_ID = 1002
  AND SQL_ID = '3310A1D1D81D4BA92CEEF42538136DD1'
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046;

If the preceding command succeeds, you will find the execution plan ID (plan_id), the time when the execution plan was first generated (first_load_time), and the time when the execution plan was last used (last_active_time). For batch processing, last_active_time is very likely the time when the slow query was initiated. outline_data is a hint that describes the execution plan.

*************************** 1. row ***************************
       tenant_id: 1002
          svr_ip: 11.xxx.x.xx
        svr_port: 35046
          sql_id: 3310A1D1D81D4BA92CEEF42538136DD1
         plan_id: 741
last_active_time: 2022-08-04 11:00:34.466037
 first_load_time: 2022-08-04 11:00:34.466037
    outline_data: /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test.t1"@"SEL$1") END_OUTLINE_DATA*/

Obtain the form of the execution plan.
Run the following command to obtain the tree-like form of the execution plan.
SELECT OPERATOR, NAME, ROWS, COST
FROM GV$PLAN_CACHE_PLAN_EXPLAIN
WHERE TENANT_ID = 1002
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046
  AND PLAN_ID = 741;

Below is the physical plan returned by the preceding query.

+----------------+------+------+------+
| OPERATOR       | NAME | ROWS | COST |
+----------------+------+------+------+
| PHY_TABLE_SCAN | t1   | 1    | 45   |
+----------------+------+------+------+

In addition to the physical plan, you can also view the logical plan. The method is simple: just run the EXPLAIN statement on the SQL query.

explain select sleep(1000), rand() from t1;
*************************** 1. row ***************************
Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |46  |
===================================
Outputs & filters:
-------------------------------------
  0 - output([sleep(?)], [rand()]), filter(nil),
      access([t1.__pk_increment]), partitions(p0)
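Once the plan is confirmed to be a full table scan, you can check whether forcing the index would produce the intended plan before changing the business SQL. The sketch below uses the standard OceanBase INDEX optimizer hint together with the table and index from the earlier example scenario:

```sql
-- Ask the optimizer to use IDX_C1, then compare the resulting plan
-- with the full-scan plan shown above. This only previews the plan;
-- EXPLAIN does not execute the query.
EXPLAIN SELECT /*+ INDEX(t1 idx_c1) */ * FROM t1 WHERE c1 = 1;
```

If the hinted plan shows an index scan with a far lower cost and row count, the fix is to make the optimizer choose IDX_C1, for example by adding the hint to the business SQL or by binding an outline based on the outline_data obtained above.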