In a transaction processing (TP) business scenario, SQL queries are often short, taking only milliseconds to execute, yet they are executed at an extremely high frequency. Whether such a short query is slow cannot be judged from its execution time alone. The key to identifying a slow short query is the number of rows it reads, rather than its absolute execution time.
Sample scenario
In the following query, if the IDX_C1 index is used, each execution reads only a very small number of rows and therefore takes only hundreds of microseconds to a few milliseconds. However, if the query scans the whole table, each execution reads a large amount of data and takes tens to hundreds of milliseconds.
Judged purely by its execution time, the query does not look slow. However, because it is executed at a very high frequency, it still consumes a lot of computing resources and may even exhaust the CPU.
CREATE TABLE T1 (C1 INT, C2 INT);
CREATE INDEX IDX_C1 ON T1 (C1);
-- Insert 10,000 rows into T1. The values of C1 are almost unique.
SELECT * FROM T1 WHERE C1 = 1;
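Before running the aggregate analysis described below, you can sanity-check this single statement in gv$sql_audit. The following is a minimal sketch: it assumes the T1 table above and that the statement text matches the LIKE pattern, so adjust the filter to your actual SQL text.

-- Sketch: per-statement rows read for the sample query on T1 (adjust the LIKE filter to your SQL text).
select count(*) executions,
       round(avg(memstore_read_row_count + ssstore_read_row_count)) avg_rows_read,
       round(avg(elapsed_time)) avg_elapsed_time
from gv$sql_audit
where is_inner_sql = 0
  and query_sql like 'SELECT * FROM T1 WHERE C1 =%';

If avg_rows_read is close to 1, the index path is being used; if it is close to the table size (10,000 here), the statement scans the whole table on every execution.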
Locate a slow short query
To locate a slow short query, you need to collect aggregate statistics that show the overall resource consumption of each type of query. In OceanBase Database, you can do this with the gv$sql_audit view. A typical diagnostic process for locating a slow short query is as follows:
The following statement returns the 10 query types that read the largest total number of rows. If the top queries read one to two orders of magnitude more rows than the rest, optimizing their execution plans is likely to bring a significant performance improvement, so they are well worth tuning.
In an actual diagnosis, the number of rows read is not the only metric that can drive CPU utilization up. You can also sort and analyze queries by other metrics, such as total CPU time, the number of retries (retry_cnt), and the number of RPCs (rpc_count), and locate possible slow queries through this multi-dimensional analysis (a CPU-time variant is sketched after the query below).
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
tenant_id, tenant_name, user_name, db_name, plan_id,
count(*) executions,
max(event) event,
max(table_scan) table_scan,
sum(case when ret_code = 0 then 0 else 1 end) fail_times,
sum(rpc_count) rpc_count,
sum(retry_cnt) retry_cnt,
sum(case when plan_type = 2 then 1 else 0 end) remote_plans,
sum(case when is_hit_plan = 1 then 0 else 1 end) miss_plans,
round(avg(elapsed_time)) elapsed_time,
round(max(elapsed_time)) max_elapsed_time,
round(avg(execute_time)) execute_time,
round(avg(( execute_time - total_wait_time_micro + get_plan_time ))) cpu_time,
round(max(( execute_time - total_wait_time_micro + get_plan_time ))) max_cpu_time,
round(avg(queue_time)) queue_time,
round(avg(net_wait_time)) netwait_time,
round(avg(user_io_wait_time)) iowait_time,
round(avg(get_plan_time)) getplan_time,
round(avg(decode_time)) decode_time,
round(avg(total_wait_time_micro)) total_wait_time,
round(avg(application_wait_time)) app_wait_time,
round(avg(concurrency_wait_time)) concurrency_wait_time,
round(avg(schedule_time)) schedule_time,
round(avg(return_rows)) return_rows,
round(avg(affected_rows)) affected_rows,
round(avg(row_cache_hit * 2 + bloom_filter_cache_hit * 2 + block_cache_hit +
disk_reads)) logical_reads,
round(avg(row_cache_hit)) row_cache_hit,
round(avg(bloom_filter_cache_hit)) bloom_filter_cache_hit,
round(avg(block_cache_hit)) block_cache_hit,
round(avg(disk_reads)) disk_reads,
round(avg(memstore_read_row_count)) memstore_read_row_count,
round(avg(ssstore_read_row_count)) ssstore_read_row_count,
sum(memstore_read_row_count + ssstore_read_row_count) as total_row_count
from gv$sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id order by total_row_count desc limit 10;
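As mentioned above, rows read is not the only useful ranking dimension. The following is a simplified variant of the preceding query, a sketch rather than a fixed recipe, that ranks statements by total CPU time using the same gv$sql_audit fields and the same CPU-time expression as above.

-- Sketch: rank statements by total CPU time instead of rows read.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id, tenant_name, db_name,
       count(*) executions,
       sum(retry_cnt) retry_cnt,
       sum(rpc_count) rpc_count,
       round(sum(execute_time - total_wait_time_micro + get_plan_time)) total_cpu_time
from gv$sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id, tenant_name, db_name
order by total_cpu_time desc limit 10;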
Obtain the execution plan of a slow short query
Based on the query information returned above, you can obtain the execution plan of the slow short SQL query.
Execute the following statement to obtain information about the execution plan from the plan cache.
Use the actual tenant ID for tenant_id, and specify svr_ip and sql_id based on the previously returned information about the slow short SQL query.

SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id, last_active_time, first_load_time, outline_data
FROM GV$PLAN_CACHE_PLAN_STAT
WHERE TENANT_ID = 1002
  AND SQL_ID = '3310A1D1D81D4BA92CEEF42538136DD1'
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046;

The following result is returned. It displays information about the execution plan for the query, including the plan ID (plan_id), the time when the plan was first generated (first_load_time), and the time when the plan was last used (last_active_time). For batch processing, last_active_time is very likely the time when the slow query was initiated. Here, outline_data is the hint description of the execution plan.

*************************** 1. row ***************************
       tenant_id: 1002
          svr_ip: 11.xxx.x.xx
        svr_port: 35046
          sql_id: 3310A1D1D81D4BA92CEEF42538136DD1
         plan_id: 741
last_active_time: 2022-08-04 11:00:34.466037
 first_load_time: 2022-08-04 11:00:34.466037
    outline_data: /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test.t1"@"SEL$1") END_OUTLINE_DATA*/
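In this example, the FULL hint in outline_data shows that the cached plan performs a full table scan on test.t1. If you expected an index to be used, you can verify what plan an explicit index hint would produce before binding an outline. This is a minimal sketch that assumes the T1 table and IDX_C1 index from the sample scenario.

-- Sketch: check whether forcing IDX_C1 changes the plan for the sample query.
EXPLAIN SELECT /*+ INDEX(T1 IDX_C1) */ * FROM T1 WHERE C1 = 1;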
Obtain the tree-shaped execution plan for the slow SQL query. Execute the following statement:
SELECT OPERATOR, NAME, ROWS, COST
FROM GV$PLAN_CACHE_PLAN_EXPLAIN
WHERE TENANT_ID = 1002
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046
  AND PLAN_ID = 741;

The physical plan for the query is as follows:

+----------------+------+------+------+
| OPERATOR       | NAME | ROWS | COST |
+----------------+------+------+------+
| PHY_TABLE_SCAN | t1   | 1    | 45   |
+----------------+------+------+------+

You can also obtain the logical plan by using the EXPLAIN statement:

explain select sleep(1000), rand() from t1;
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |46  |
===================================
Outputs & filters:
-------------------------------------
  0 - output([sleep(?)], [rand()]), filter(nil),
      access([t1.__pk_increment]), partitions(p0)
EXPLAINstatement.explain select sleep(1000), rand() from t1; *************************** 1. row *************************** Query Plan: =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t1 |1 |46 | =================================== Outputs & filters: ------------------------------------- 0 - output([sleep(?)], [rand()]), filter(nil), access([t1.__pk_increment]), partitions(p0)