In a transaction processing (TP) business, SQL queries are short and frequent, with execution times measured in microseconds or milliseconds. Therefore, it is difficult to identify slow queries by execution time alone. For frequent short queries, the key is to examine the number of rows read by each query.
Scenarios
If the IDX_C1 index is used, each execution of the following query reads only a few rows and completes in microseconds or milliseconds. However, if a full table scan is performed, each execution reads a large amount of data and takes tens to hundreds of milliseconds.
A query that takes only a few microseconds or milliseconds to execute is generally not considered a slow query. However, if such a query is executed frequently, it consumes a large amount of computing resources in aggregate and can even exhaust the CPU.
CREATE TABLE T1 (C1 INT, C2 INT);
CREATE INDEX IDX_C1 ON T1 (C1);
-- Insert 10,000 rows into T1; the values of C1 are almost unique.
SELECT * FROM T1 WHERE C1 = 1;
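To check which access path the optimizer actually chooses, you can compare the two plans with EXPLAIN. The sketch below assumes the table and index created above; the FULL hint (the same hint that appears in the outline_data later in this article) forces a full table scan for comparison.

```sql
-- Plan that can use the IDX_C1 index: only a few rows are read per execution.
EXPLAIN SELECT * FROM T1 WHERE C1 = 1;

-- Force a full table scan with the FULL hint to compare row counts and cost.
EXPLAIN SELECT /*+ FULL(T1) */ * FROM T1 WHERE C1 = 1;
```

If the first plan shows an index scan on IDX_C1 while the hinted plan shows a much higher estimated row count and cost, the index is doing its job; if both plans show a full table scan, the query is a candidate for the diagnosis described below.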
Locate frequent short queries
To locate such slow queries, you need to aggregate statistics on the resource consumption of similar queries. In OceanBase Database, you can use the GV$OB_SQL_AUDIT view to identify them. A sample diagnostic SQL statement is as follows:
The following query ranks SQL statements by the total number of rows read and returns the top 10. If the top entries read one to two orders of magnitude more rows than the rest, optimizing them is highly valuable: improving their execution plans can yield significant overall benefits.
In actual diagnosis, the number of rows read is not the only factor that can cause high CPU usage. You can also sort the queries by other fields, such as the total CPU time, the number of retries (retry_cnt), and the number of RPCs (rpc_count), to identify potential slow queries across multiple dimensions.
select /*+READ_CONSISTENCY(WEAK), QUERY_TIMEOUT(100000000)*/ svr_ip, sql_id,
tenant_id, tenant_name, user_name, db_name, plan_id,
count(*) executions,
max(event) event,
max(table_scan) table_scan,
sum(case when ret_code = 0 then 0 else 1 end) fail_times,
sum(rpc_count) rpc_count,
sum(retry_cnt) retry_cnt,
sum(case when plan_type = 2 then 1 else 0 end) remote_plans,
sum(case when is_hit_plan = 1 then 0 else 1 end) miss_plans,
round(avg(elapsed_time)) elapsed_time,
round(max(elapsed_time)) max_elapsed_time,
round(avg(execute_time)) execute_time,
round(avg(( execute_time - total_wait_time_micro + get_plan_time ))) cpu_time,
round(max(( execute_time - total_wait_time_micro + get_plan_time ))) max_cpu_time,
round(avg(queue_time)) queue_time,
round(avg(net_wait_time)) netwait_time,
round(avg(user_io_wait_time)) iowait_time,
round(avg(get_plan_time)) getplan_time,
round(avg(decode_time)) decode_time,
round(avg(total_wait_time_micro)) total_wait_time,
round(avg(application_wait_time)) app_wait_time,
round(avg(concurrency_wait_time)) concurrency_wait_time,
round(avg(schedule_time)) schedule_time,
round(avg(return_rows)) return_rows,
round(avg(affected_rows)) affected_rows,
round(avg(row_cache_hit * 2 + bloom_filter_cache_hit * 2 + block_cache_hit +
disk_reads)) logical_reads,
round(avg(row_cache_hit)) row_cache_hit,
round(avg(bloom_filter_cache_hit)) bloom_filter_cache_hit,
round(avg(block_cache_hit)) block_cache_hit,
round(avg(disk_reads)) disk_reads,
round(avg(memstore_read_row_count)) memstore_read_row_count,
round(avg(ssstore_read_row_count)) ssstore_read_row_count,
sum(memstore_read_row_count + ssstore_read_row_count) as total_row_count
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id order by total_row_count desc limit 10;
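As mentioned above, rows read is only one ranking dimension. A minimal variant of the same aggregation, reordered by accumulated CPU time (reusing the CPU-time expression from the query above), might look like this:

```sql
select svr_ip, sql_id, count(*) executions,
       sum(retry_cnt) retry_cnt,
       sum(rpc_count) rpc_count,
       -- Same CPU-time formula as above, summed instead of averaged.
       round(sum(execute_time - total_wait_time_micro + get_plan_time)) total_cpu_time
from gv$ob_sql_audit
where is_inner_sql = 0
group by svr_ip, sql_id
order by total_cpu_time desc limit 10;
```

Sorting by total_cpu_time surfaces queries that are individually cheap but collectively expensive, which may not rank high by rows read alone.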
Information about frequent short queries
After you locate the slow SQL queries, you can look up their execution plans.
Execute the following query to obtain the execution plans of the slow SQL queries.
In the following query, specify the tenant_id based on the business to be analyzed, and fill in the svr_ip and sql_id based on the results of locating the slow SQL queries.

SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id, last_active_time, first_load_time, outline_data
FROM GV$OB_PLAN_CACHE_PLAN_STAT
WHERE TENANT_ID = 1002
  AND SQL_ID = '3310A1D1D81D4BA92CEEF42538136DD1'
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046;

If the preceding query succeeds, it returns the execution plan ID (plan_id) of the query, the time when the execution plan was first generated (first_load_time), and the time when it was last used (last_active_time). For batch processing, the time when the plan was last active is very likely the time when the slow query was initiated. outline_data is a set of hints that describes the execution plan of the query.

*************************** 1. row ***************************
       tenant_id: 1002
          svr_ip: 11.xxx.x.xx
        svr_port: 35046
          sql_id: 3310A1D1D81D4BA92CEEF42538136DD1
         plan_id: 741
last_active_time: 2022-08-04 11:00:34.466037
 first_load_time: 2022-08-04 11:00:34.466037
    outline_data: /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test.t1"@"SEL$1") END_OUTLINE_DATA*/

Obtain the plan shape of the slow SQL query.
Execute the following statement to obtain the tree-like structure of the execution plan.
SELECT OPERATOR, NAME, ROWS, COST
FROM GV$OB_PLAN_CACHE_PLAN_EXPLAIN
WHERE TENANT_ID = 1002
  AND SVR_IP = '11.xxx.x.xx'
  AND SVR_PORT = 35046
  AND PLAN_ID = 741;

Below is the physical plan returned by the preceding query.

+----------------+------+------+------+
| OPERATOR       | NAME | ROWS | COST |
+----------------+------+------+------+
| PHY_TABLE_SCAN | t1   |    1 |   45 |
+----------------+------+------+------+

In addition to the physical plan, you can obtain the logical plan by simply prefixing the SQL statement with EXPLAIN.

explain select sleep(1000), rand() from t1;
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |46  |
===================================
Outputs & filters:
-------------------------------------
  0 - output([sleep(?)], [rand()]), filter(nil), access([t1.__pk_increment]), partitions(p0)