At present, the execution of SQL statements is not monitored at the operator level in analytical processing (AP) scenarios. For example, the number of tasks executed in parallel, whether data skew occurs, whether hash conflicts are serious, and a specific operator that is stuck when the execution hangs are not monitored at the operator level. Diagnostics depends on logs. To solve these problems, OceanBase Database provides the Oracle-compatible GV$SQL_PLAN_MONITOR view. You can query this view for execution details of each operator.
Overview
The GV$SQL_PLAN_MONITOR view supports operator-level monitoring on SQL statements. Each row corresponds to one operator instance and records the monitoring data generated during running of the operator instance. The monitoring data includes general monitoring data and operator-specific monitoring data. The general monitoring data includes the number of rows of data processed by the operator, time when the operator was opened, time when the operator was closed, and the memory used. The operator-specific monitoring data includes a hash conflict rate, the number of blocks split by the GI operator, and the number of rows scanned by the TABLE SCAN operator.
The following table describes columns in the view.
| Column | Type | Description |
|---|---|---|
| CON_ID | NUMBER | The ID of the tenant. |
| SVR_IP | VARCHAR2(32) | The IP address of the OBServer node where the operator is executed. |
| SVR_PORT | NUMBER | The port number of the OBServer node where the operator is executed. |
| TRACE_ID | VARCHAR2(128) | The trace ID. |
| FIRST_REFRESH_TIME | DATE | The time when the monitoring of the operator started. |
| LAST_REFRESH_TIME | DATE | The time when the monitoring of the operator ended. |
| FIRST_CHANGE_TIME | DATE | The time when the operator output the first row of data. |
| LAST_CHANGE_TIME | DATE | The time when the operator output the last row of data. |
| PROCESS_NAME | VARCHAR2(6) | The ID of the thread that executed the operator. |
| SQL_ID | VARCHAR2(13) | The SQL ID. |
| PLAN_PARENT_ID | NUMBER | The ID of the parent operator of the plan. |
| PLAN_LINE_ID | NUMBER | The line ID of the operator. |
| PLAN_OPERATION | VARCHAR2(30) | The name of the operator. |
| PLAN_DEPTH | NUMBER | The depth of the operator in the plan tree. |
| STARTS | NUMBER | The number of times the operator has been rescanned. |
| OUTPUT_ROWS | NUMBER | The total number of rows returned from all the executions of the operator. |
| WORKAREA_MEM | NUMBER | The size of the workarea that the operator occupies in the memory. |
| WORKAREA_MAX_MEM | NUMBER | The maximum size of the workarea that the operator can occupy. |
| WORKAREA_TEMPSEG | NUMBER | The dump space that the operator occupies on the disk. |
| WORKAREA_MAX_TEMPSEG | NUMBER | The maximum size of the dump space that the operator can occupy on the disk. |
| OTHERSTAT_1_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_1_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_2_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_2_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_3_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_3_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_4_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_4_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_5_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_5_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_6_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_6_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_7_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_7_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_8_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_8_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_9_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_9_VALUE | NUMBER | Reserved for future use. |
| OTHERSTAT_10_ID | NUMBER | Reserved for future use. |
| OTHERSTAT_10_VALUE | NUMBER | Reserved for future use. |
| OUTPUT_BATCHES | NUMBER | The number of times that the operator calls the get_next_batch operation in vectorized mode. This field is specific to OceanBase Database. |
| SKIPPED_ROWS_COUNT | NUMBER | The total number of rows skipped during operator evaluation in vectorized mode. It is the total number of rows filtered out. This field is specific to OceanBase Database. |
| DB_TIME | NUMBER | The time consumed by execution of the current operator, in microseconds. It includes the CPU time and the necessary I/O wait time, for example, the time spent in reading/writing data from/to the disk and the time spent in reading/writing network packets. Note that the value does not include the time consumed in executing child operators. |
| USER_IO_WAIT_TIME | NUMBER | The necessary I/O wait time of the current operator, in microseconds. It includes the disk I/O time and network latency. Note that the value does not include the time consumed in executing child operators. |
The meaning and value of the OTHERSTAT_N_VALUE field can vary with operators.
You can query the V$SQL_MONITOR_STATNAME view based on the value of the OTHERSTAT_N_ID field.
Examples
You can analyze performance issues based on the GV$SQL_PLAN_MONITOR view.
Non-optimal index
obclient [SYS]> select output_rows, plan_operation, OTHERSTAT_3_VALUE scans from gv$sql_plan_monitor where trace_id = 'xxx' and plan_operation like '%TABLE_SCAN';
+-------------+--------------------+--------+
| OUTPUT_ROWS | PLAN_OPERATION | SCANS |
+-------------+--------------------+--------+
| 5 | PHY_VEC_TABLE_SCAN | 500072 |
You can execute the preceding SQL statement to query the number of rows scanned by the TABLE SCAN operator and the number of rows returned. When the two values differ greatly, as shown in the preceding example where 500,000 rows are scanned but only five rows are returned, the table has no proper indexes.
Non-optimal JOIN method
If the NESTED-LOOP JOIN operator is used in a scenario where the HASH JOIN is more suitable, the table on the right will be rescanned many times when the table on the left contains a large number of rows, which leads to poor performance. You can execute the following SQL statement to query the maximum number of rescans performed by the operator. If the value is large, the JOIN method is probably inappropriate.
select max(starts) rescans from gv$sql_plan_monitor where trace_id = 'xxx';
General analysis method
In most cases, you can use the following general method to identify the time-consuming part of a plan.
Execute the following SQL statement to query the
GV$SQL_PLAN_MONITORview for the corresponding records. The query result contains the time when each operator was opened, time when each operator was closed, time when each operator returned the first row, and time when each operator returned the last row.select plan_line_id, concat(lpad(' ', plan_depth, ' '), plan_operation) op, sum(output_rows) rowss, sum(STARTS) rescan, min(first_refresh_time) open_time, max(last_refresh_time) close_time, min(first_change_time) first_row_time, max(last_change_time) last_row_eof_time, count(1) threads from gv$sql_plan_monitor where trace_id = 'xxx' group by plan_line_id, plan_operation, plan_depth order by 1;Observe the query result from top to bottom. The
LAST_ROW_EOF_TIMEvalue of operator 0 at the top level must be approximately the same as itsCLOSE_TIMEvalue. Assume that operator 0 is aHASH JOINoperator. During execution, it first collects all data from the left child operator to build a hash table, and then collects data from the right child operator to probe the hash table row by row. In this case, check theLAST_ROW_EOF_TIMEvalue of operator 1, which is the left child operator of theHASH JOINoperator.- If the
LAST_ROW_EOF_TIMEvalue of operator 1 is large, the left branch of theHASH JOINoperator is executed most of the time. In this case, continue to observe subplans of operator 1. - If the
LAST_ROW_EOF_TIMEvalue of operator 1 is small, the right branch of theHASH JOINoperator is executed and the hash table is probed most of the time. In this case, continue to observe subplans of the right child operator of operator 0.
- If the
The following plan is used as an example for analysis.
First, observe the operator at the top level, which is a MERGE JOIN operator. During execution, it fetches one row from the left child operator and then one row from the right child operator. The following query result shows that the operator was opened at the 25.792777 second, the left child operator of the MERGE JOIN operator returned the first row at 16:29:25.793831, and the right child operator of the MERGE JOIN operator returned the first row at 16:29:27.572661. The right branch of the MERGE JOIN operator takes more than one second to return the first row of data.
Then observe the right child operator of the MERGE JOIN operator, which is a SORT operator. During execution, the SORT operator receives all data from the lower layer, sorts the data, and then sends the sorted data to the upper layer. The ROWSS field indicates that the SORT operator collected 1.6 million rows of data. It will take a lot of time to sort the data. You can create an index on the sorting key to cancel sorting and improve performance.
| ========================================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |MERGE JOIN | |1 |5 | |
| |1 |├─SORT | |1 |3 | |
| |2 |│ └─COLUMN TABLE FULL SCAN|T1 |1 |3 | |
| |3 |└─SORT | |1 |3 | |
| |4 | └─TABLE FULL SCAN |T0 |1 |3 | |
| ========================================================== |
select plan_line_id, concat(lpad(' ', plan_depth, ' '), plan_operation) op, sum(output_rows) rowss, sum(STARTS) rescan, min(first_refresh_time) open_time, max(last_refresh_time) close_time, min(first_change_time) first_row_time, max(last_change_time) last_row_eof_time, count(1) threads from gv$sql_plan_monitor where trace_id = 'YC3500BA2DAC4-0006198CC947196A-0-0' group by plan_line_id, plan_operation, plan_depth order by 1;
+--------------+-----------------------+---------+--------+----------------------------+----------------------------+----------------------------+----------------------------+---------+
| PLAN_LINE_ID | OP | ROWSS | RESCAN | OPEN_TIME | CLOSE_TIME | FIRST_ROW_TIME | LAST_ROW_EOF_TIME | THREADS |
+--------------+-----------------------+---------+--------+----------------------------+----------------------------+----------------------------+----------------------------+---------+
| 0 | PHY_MERGE_JOIN | 0 | 0 | 2024-05-29 16:29:25.792777 | 2024-05-29 16:29:27.664014 | NULL | 2024-05-29 16:29:27.664014 | 1 |
| 1 | PHY_SORT | 2 | 0 | 2024-05-29 16:29:25.792777 | 2024-05-29 16:29:27.664014 | 2024-05-29 16:29:25.793831 | NULL | 1 |
| 2 | PHY_VEC_TABLE_SCAN | 2 | 0 | 2024-05-29 16:29:25.792777 | 2024-05-29 16:29:27.664014 | 2024-05-29 16:29:25.793831 | 2024-05-29 16:29:25.793831 | 1 |
| 3 | PHY_SORT | 1599984 | 0 | 2024-05-29 16:29:25.792777 | 2024-05-29 16:29:27.664014 | 2024-05-29 16:29:27.572661 | 2024-05-29 16:29:27.664014 | 1 |
| 4 | PHY_VEC_TABLE_SCAN | 1599984 | 0 | 2024-05-29 16:29:25.792777 | 2024-05-29 16:29:27.664014 | 2024-05-29 16:29:25.793831 | 2024-05-29 16:29:26.642023 | 1 |
+--------------+-----------------------+---------+--------+----------------------------+----------------------------+----------------------------+----------------------------+---------+
To analyze performance issues based on the GV$SQL_PLAN_MONITOR view , you must familiarize yourself with execution modes of different operators. Execution modes of some common operators are described below.
Streaming mode: The operator sends data to the upper layer while collecting data from the lower layer.
limit, merge group by, merge distinct, subplan scan
Blocking mode: The operator sends data to the upper layer after collecting all data from the lower layer.
sort, hash group by, hash distinct, material
Other modes.
Operator Execution mode MERGE JOIN,UNION,INTERSECT, orEXCEPTThe operator simultaneously collects data from the two child operators by starting from the left child operator. NESTED-LOOP JOINorSUBPLAN FILTERThe operator collects data from the first child operator row by row and then rescans other child operators to collect data. HASH JOIN,UNION,INTERSECT, orEXCEPTThe operator collects all data from the left child operator and then collects data from the right child operator.
References
- For more information about the
GV$SQL_PLAN_MONITORview, see GV$SQL_PLAN_MONITOR (MySQL mode) and GV$SQL_PLAN_MONITOR (Oracle mode).