In a transaction processing (TP) workload, individual SQL queries are short: a single query typically completes in microseconds or milliseconds, but queries are executed at a very high frequency. For such workloads, it is difficult to identify "slow" queries by execution time alone. For frequent short queries, the key metric is the number of rows each query reads.
Example scenario
If a query uses the IDX_C1 index, it reads only a small number of rows and completes in microseconds or milliseconds. However, if a full table scan is performed, the query reads a large amount of data and takes tens to hundreds of milliseconds.
Such a query may not be slow in absolute execution time. However, if it is executed frequently, it consumes a large amount of computing resources in aggregate and can even overload the CPU.
CREATE TABLE T1 (C1 INT, C2 INT);
CREATE INDEX IDX_C1 ON T1 (C1);
-- insert 10K rows into T1; C1 is almost unique
SELECT * FROM T1 WHERE C1 = 1;
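To see the difference between the two access paths, you can compare the plan that uses the index with one where a full table scan is forced. The following is a sketch that uses the OceanBase FULL hint (the same hint that appears in outline data later in this topic); exact plan output varies by version.

```sql
-- Plan that uses the IDX_C1 index: reads roughly one row.
EXPLAIN SELECT * FROM T1 WHERE C1 = 1;

-- Force a full table scan with the FULL hint for comparison:
-- the scan now reads all ~10K rows to produce the same one-row result.
EXPLAIN SELECT /*+ FULL(T1) */ * FROM T1 WHERE C1 = 1;
```

The per-execution difference is small in absolute terms, but at thousands of executions per second the full-scan variant reads orders of magnitude more rows in total.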
Locate short queries
To locate such queries, you need to aggregate the resource consumption of similar queries. In OceanBase Database, you can use the GV$OB_SQL_AUDIT view for this purpose. A typical diagnostic SQL statement is as follows:
This query returns the top 10 requests by total rows read, sorted in descending order. If the top few requests read one or two orders of magnitude more rows than the rest, optimizing them is highly valuable: improving their execution plans yields significant overall gains.
In actual diagnosis, high CPU usage can have causes other than row reads. Therefore, you can also sort the queries by other fields, such as total CPU time, number of retries (retry_cnt), or number of RPC calls (rpc_count), for further analysis. By analyzing queries across multiple dimensions, you can identify potential slow queries.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
tenant_id, tenant_name, user_name, db_name, plan_id,
count(*) executions,
max(event) event,
max(table_scan) table_scan,
sum(case when ret_code = 0 then 0 else 1 end) fail_times,
sum(rpc_count) rpc_count,
sum(retry_cnt) retry_cnt,
sum(case when plan_type = 2 then 1 else 0 end) remote_plans,
sum(case when is_hit_plan = 1 then 0 else 1 end) miss_plans,
round(avg(elapsed_time)) elapsed_time,
round(max(elapsed_time)) max_elapsed_time,
round(avg(execute_time)) execute_time,
round(avg(( execute_time - total_wait_time_micro + get_plan_time ))) cpu_time,
round(max(( execute_time - total_wait_time_micro + get_plan_time ))) max_cpu_time,
round(avg(queue_time)) queue_time,
round(avg(net_wait_time)) netwait_time,
round(avg(user_io_wait_time)) iowait_time,
round(avg(get_plan_time)) getplan_time,
round(avg(decode_time)) decode_time,
round(avg(total_wait_time_micro)) total_wait_time,
round(avg(application_wait_time)) app_wait_time,
round(avg(concurrency_wait_time)) concurrency_wait_time,
round(avg(schedule_time)) schedule_time,
round(avg(return_rows)) return_rows,
round(avg(affected_rows)) affected_rows,
round(avg(row_cache_hit * 2 + bloom_filter_cache_hit * 2 + block_cache_hit +
disk_reads)) logical_reads,
round(avg(row_cache_hit)) row_cache_hit,
round(avg(bloom_filter_cache_hit)) bloom_filter_cache_hit,
round(avg(block_cache_hit)) block_cache_hit,
round(avg(disk_reads)) disk_reads,
round(avg(memstore_read_row_count)) memstore_read_row_count,
round(avg(ssstore_read_row_count)) ssstore_read_row_count,
sum(memstore_read_row_count + ssstore_read_row_count) as total_row_count
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id order by total_row_count desc limit 10;
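As noted above, row reads are not the only useful ranking dimension. The following variant is a sketch of the same aggregation re-sorted by total CPU time; it reuses the cpu_time expression from the query above (execute_time - total_wait_time_micro + get_plan_time) and keeps only a few columns for brevity.

```sql
-- Same aggregation over GV$OB_SQL_AUDIT, but ranked by total CPU time.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
       count(*) executions,
       sum(retry_cnt) retry_cnt,
       sum(rpc_count) rpc_count,
       sum(execute_time - total_wait_time_micro + get_plan_time) total_cpu_time
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id
order by total_cpu_time desc limit 10;
```

Swapping the ORDER BY expression for sum(retry_cnt) or sum(rpc_count) gives the retry- and RPC-oriented views of the same data.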
Analyze short queries
After you find the short SQL queries, you can look up their execution plans.
Run the following query to retrieve the execution plans of the short SQL queries.
In the following query, tenant_id is specified based on the business to be analyzed, and svr_ip and sql_id are taken from the results of the diagnostic query that located the slow SQL queries.

SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id, last_active_time, first_load_time, outline_data
FROM GV$OB_PLAN_CACHE_PLAN_STAT
WHERE TENANT_ID = 1002
  AND SQL_ID = '3310A1D1D81D4BA92CEEF42538136DD1'
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046;

If the preceding query succeeds, the result contains the execution plan ID (plan_id), the time when the execution plan was first generated (first_load_time), and the time when the execution plan was last used (last_active_time). For batch processing, last_active_time is very likely the time when the slow query was initiated. outline_data is a hint description of the execution plan.

*************************** 1. row ***************************
       tenant_id: 1002
          svr_ip: 11.xxx.x.xx
        svr_port: 35046
          sql_id: 3310A1D1D81D4BA92CEEF42538136DD1
         plan_id: 741
last_active_time: 2022-08-04 11:00:34.466037
 first_load_time: 2022-08-04 11:00:34.466037
    outline_data: /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test.t1"@"SEL$1") END_OUTLINE_DATA*/

Obtain the plan shape of the slow SQL query.
Run the following query to retrieve the tree-like structure of the execution plan.
SELECT OPERATOR, NAME, ROWS, COST
FROM GV$PLAN_CACHE_PLAN_EXPLAIN
WHERE TENANT_ID = 1002
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046
  AND PLAN_ID = 741;

Below is the physical plan returned by the preceding query.

+----------------+------+------+------+
| OPERATOR       | NAME | ROWS | COST |
+----------------+------+------+------+
| PHY_TABLE_SCAN | t1   | 1    | 45   |
+----------------+------+------+------+

In addition to the physical plan, you can retrieve the logical plan. The method is simple: execute the EXPLAIN statement for the SQL query.

explain select sleep(1000), rand() from t1;
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |46  |
===================================
Outputs & filters:
-------------------------------------
  0 - output([sleep(?)], [rand()]), filter(nil), access([t1.__pk_increment]), partitions(p0)
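Once the plan confirms an unwanted full table scan, one common remedy (a hedged suggestion; verify the index name and behavior against your own schema and OceanBase version) is to guide the optimizer back to the index with an INDEX hint, then re-check the plan with EXPLAIN.

```sql
-- Hypothetical fix for the example table above:
-- force the IDX_C1 index with an INDEX hint and verify the new plan.
EXPLAIN SELECT /*+ INDEX(T1 IDX_C1) */ * FROM T1 WHERE C1 = 1;
```

If changing the application SQL is not feasible, the hint text shown in outline_data indicates the form such a plan binding would take.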