GV$OB_SQL_AUDIT is the most commonly used SQL monitoring view. It records the source, execution status, resource consumption, and waiting events of each SQL request, as well as the SQL text and execution plan. This view is a powerful tool for diagnosing SQL issues.
The view contains many fields. We will introduce them in sections. The main fields are as follows:
TENANT_ID: the ID of the tenant that made the request.SVR_IP: the IP address of the server node that received the request.CLIENT_IP: the IP address of the client that sent the request.REQUEST_TIME: the time when the request arrived.REQUEST_ID: the ID of the request, which identifies a single execution and increments over time. External monitoring systems can use this field as a cursor to pull audit information.IS_INNER_SQL: whether the request is an internal SQL request.IS_EXECUTOR_RPC: whether the request is an RPC request.SQL_ID: the ID of a specific SQL statement. Multiple executions of the same SQL statement have the sameSQL_IDand differentREQUEST_ID.QUERY_SQL: the complete text of the SQL statement. OceanBase Database supports binding an execution plan to an SQL statement by usingSQL_IDandSQL_TEXT.SID: the ID of a session. This field can be used to associate all SQL statements and waiting events on the connection with the session. This field corresponds to theIDfield in theGV$OB_PROCESSLISTview.TX_ID: the ID of a transaction. This field can be used to associate all SQL statements in the transaction with the transaction. If the start and end times of two consecutive SQL statements in a transaction are significantly different, the time spent on the link is large.TRACE_ID: the trace ID of the SQL request. This field can be used to associate the SQL request with other monitoring metrics or logs.IS_HIT_PLAN: whether the plan is cached. SQL optimization is time-consuming. To avoid repeatedly executing SQL optimization, the generated plan is cached. When the plan is cached, the plan is directly obtained from the Plan Cache. If the plan is not cached, the plan is hard-parsed. If the plan is cached, the plan is soft-parsed. Hard parsing affects SQL performance and increases the time spent. If the plan cache hit rate of the tenant is too low, check whether the plan cache space is too small, which causes plans to be frequently evicted.PLAN_ID: the ID of the execution plan. This field can be used to query the detailed information and statistics of the plan.PLAN_HASH: the hash value of the execution plan.PLAN_TYPE: the type of the execution plan. The value can be0,1,2, or3.1,2, and3correspond to local plans, remote plans, and distributed plans, respectively.0indicates that no execution plan exists, for example, for a commit statement.AFFECTED_ROWS: the number of rows affected.RETURN_ROWS: the number of rows returned.RET_CODE: the return code of the execution result.EVENT: the name of the longest waiting event.P1TEXT ~ P3TEXT: the parameters 1 to 3 of the waiting event.P1 ~ P3: the values of the parameters 1 to 3 of the waiting event.LEVEL: the level of the waiting event.WAIT_CLASS_ID: the ID of the category to which the waiting event belongs.WAIT_CLASS: the name of the category to which the waiting event belongs.STATE: the state of the waiting event.WAIT_TIME_MICRO: the time (in microseconds) spent waiting for the waiting event.TOTAL_WAIT_TIME_MICRO: the total time (in microseconds) spent waiting for all waiting events during the execution.TOTAL_WAITS: the total number of waiting events during the execution.ELAPSED_TIME: the total time (in microseconds) spent on the execution, from the request arrival to the execution end. This field consists of multiple sub-stages:NET_TIME: the time from when the RPC is sent to when the request is received.NET_WAIT_TIME: the time from when the request is received to when it enters the queue.QUEUE_TIME: the queue time, which reflects the request backlog of the current tenant.DECODE_TIME: the decode time after the request is dequeued.GET_PLAN_TIME: the time spent generating the execution plan, which reflects the health of the Plan Cache of the current tenant.EXECUTE_TIME: the execution time of the plan.
EXECUTE_TIME: the actual execution time of the current request, which is the sum of the CPU time andTOTAL_WAIT_TIME_MICRO.TOTAL_WAIT_TIME_MICROconsists of several parts, includingAPPLICATION_WAIT_TIME,CONCURRENCY_WAIT_TIME,USER_IO_WAIT_TIME, andSCHEDULE_TIME.EXECUTE_TIMEis the total time spent on the execution, including the CPU calculation time (CPU_TIME) and various waiting times (TOTAL_WAIT_TIME_MICRO).APPLICATION_WAIT_TIME: the total time of all application events.CONCURRENCY_WAIT_TIME: the total time of all concurrency events.USER_IO_WAIT_TIME: the total time of alluser_ioevents.SCHEDULE_TIME: the total time of allscheduleevents.
- Logical reads: During the execution of a request, data is first read from various levels of cache (corresponding to the
ROW_CACHE_HIT,BLOOM_FILTER_CACHE_HIT, andBLOCK_CACHE_HITfields). If none of the caches hit, a physical disk read is performed (corresponding to theDISK_READSfield). By counting the number of cache reads and physical disk reads, you can determine the number of rows scanned during the execution of the request (which is not equal to the actual physical reads, as the request first scans the caches). This helps determine whether the SQL statement needs optimization.ROW_CACHE_HIT: the number of row cache hits.BLOOM_FILTER_CACHE_HIT: the number of bloom filter cache hits.BLOCK_CACHE_HIT: the number of block cache hits.DISK_READS: the number of physical disk reads.
RETRY_CNT: the number of retries.TABLE_SCAN: whether the request contains a full table scan.CONSISTENCY_LEVEL: the consistency level.MEMSTORE_READ_ROW_COUNT: the number of rows read from the MemStore.SSSTORE_READ_ROW_COUNT: the number of rows read from the SSStore.REQUEST_MEMORY_USED: the memory consumed by the request.PLSQL_COMPILE_TIME: Time spent compiling PL code.INSERT_DUPLICATE_ROW_COUNT: the number of duplicate rows in theinsertuporreplace intooperation.
The data in the GV$OB_SQL_AUDIT view is stored in a configurable memory space. Each tenant has an independent cache on each node. When the memory usage or the number of records reaches the eviction threshold, the system automatically evicts the oldest data. Experienced DBAs often disable the SQL audit feature to preserve the scene when troubleshooting SQL issues, preventing the monitoring data from being evicted.
You can control the behavior of the SQL audit feature by using the following parameters:
enable_sql_audit: a cluster-level parameter that controls whether the SQL audit feature is enabled for all tenants. The parameter takes effect dynamically.ob_enable_sql_audit: a tenant-level system variable that controls whether the SQL audit feature is enabled for the current tenant. The parameter takes effect dynamically.ob_sql_audit_percentage: a tenant-level system variable that controls the percentage of tenant memory occupied by the SQL audit feature. The parameter takes effect dynamically. To prevent the SQL audit feature from consuming excessive memory, the system sets its memory limit to 1 GB.
You can query various dimensions of SQL execution information from the GV$OB_SQL_AUDIT view.
Query SQL statements that take more than 100 ms to execute.
obclient> select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10; +------------+----------------------------+--------------+------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ | request_id | usec_to_time(request_time) | ELAPSED_TIME | QUEUE_TIME | EXECUTE_TIME | query_sql | +------------+----------------------------+--------------+------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1538599798 | 2023-03-08 11:00:46.089711 | 335152 | 462 | 329196 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538601580 | 2023-03-08 11:00:47.411316 | 276913 | 1420 | 275345 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538603976 | 2023-03-08 11:00:49.258464 | 154873 | 461 | 154236 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538613501 | 2023-03-08 11:00:56.123111 | 188973 | 688 | 188144 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538712684 | 2023-03-08 11:02:07.504777 | 288516 | 1137 | 287180 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538743161 | 2023-03-08 11:02:29.135127 | 289585 | 26 | 289380 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538749786 | 2023-03-08 11:02:33.890317 | 294356 | 45 | 294180 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538792259 | 2023-03-08 11:03:04.626596 | 192843 | 128 | 192569 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538799117 | 2023-03-08 11:03:09.567622 | 201594 | 55 | 201388 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | | 1538804299 | 2023-03-08 11:03:13.274090 | 235720 | 241 | 235302 | select request_id,usec_to_time(request_time),ELAPSED_TIME,QUEUE_TIME,EXECUTE_TIME,query_sql from v$OB_SQL_AUDIT where ELAPSED_TIME > 100000 limit 10 | +------------+----------------------------+--------------+------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (0.28 sec)Query the average queue time of the last 1000 SQL statements.
obclient> select /*+ query_timeout(30000000) */ avg(queue_time) from v$OB_SQL_AUDIT where request_id > (select max(request_id) from v$OB_SQL_AUDIT) - 1000 ; +-----------------+ | avg(queue_time) | +-----------------+ | 350.8740 | +-----------------+ 1 row in set (0.26 sec)Query the SQL statements that consume the most tenant resources, sorted in descending order by
execution_time * execution_count. If the tenant is currently experiencing capacity issues (tenant CPU utilization is at maximum), you can use this statement to determine whether the issue is related to SQL statements or suspicious SQL statements.obclient> select SQL_ID, avg(ELAPSED_TIME), avg(QUEUE_TIME), avg(ROW_CACHE_HIT + BLOOM_FILTER_CACHE_HIT + BLOCK_CACHE_HIT + DISK_READS) avg_logical_read, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO ) avg_cpu_time, avg( TOTAL_WAIT_TIME_MICRO ) avg_wait_time, WAIT_CLASS, avg(retry_cnt) from v$OB_SQL_AUDIT group by 1 order by avg_exec_time * cnt desc limit 10; +----------------------------------+-------------------+-----------------+------------------+---------------+--------+--------------+---------------+------------+----------------+ | SQL_ID | avg(ELAPSED_TIME) | avg(QUEUE_TIME) | avg_logical_read | avg_exec_time | cnt | avg_cpu_time | avg_wait_time | WAIT_CLASS | avg(retry_cnt) | +----------------------------------+-------------------+-----------------+------------------+---------------+--------+--------------+---------------+------------+----------------+ | 2705182A6EAB699CEC8E59DA80710B64 | 54976.9269 | 43.8605 | 17664.2727 | 54821.5828 | 11759 | 54821.5828 | 0.0000 | OTHER | 0.0000 | | 32AB97A0126F566064F84DDDF4936F82 | 1520.9832 | 380.7903 | 63.7847 | 789.6781 | 63632 | 789.6781 | 0.0000 | OTHER | 0.0000 | | A5F514E873BE9D1F9A339D0DA7481D69 | 44032.5553 | 44.5149 | 8943.7834 | 43878.1405 | 1039 | 43878.1405 | 0.0000 | OTHER | 0.0000 | | 31FD78420DB07C11C8E3154F1658D237 | 7769857.0000 | 35.7500 | 399020.7500 | 7769682.7500 | 4 | 7769682.7500 | 0.0000 | NETWORK | 1.0000 | | C48AEE941D985D8DEB66892228D5E845 | 8528.6227 | 0.0000 | 0.0000 | 8450.4047 | 1601 | 8450.4047 | 0.0000 | OTHER | 0.0000 | | 101B7B79DFA9AE801BEE4F1A234AD294 | 158.2296 | 41.7211 | 0.0000 | 46.0345 | 286758 | 46.0345 | 0.0000 | OTHER | 0.0000 | | 1D0BA376E273B9D622641124D8C59264 | 1774.5924 | 0.0049 | 0.0000 | 1737.4885 | 5081 | 1737.4885 | 0.0000 | OTHER | 0.0000 | | 64CF75576816DB5614F3D5B1F35B1472 | 1801.8767 | 747.0343 | 0.0000 | 827.1674 | 10340 | 827.1674 | 0.0000 | OTHER | 0.0000 | | 23D1C653347BA469396896AD9B20DCA1 | 5564.9419 | 0.0000 | 0.0000 | 5478.2228 | 1257 | 5478.2228 | 0.0000 | OTHER | 0.0000 | | FA4F493FA5CE2DCC64F51CF3754F96C6 | 2478.3956 | 378.7557 | 3.1040 | 1731.1802 | 3357 | 1731.1802 | 0.0000 | OTHER | 0.0000 | +----------------------------------+-------------------+-----------------+------------------+---------------+--------+--------------+---------------+------------+----------------+ 10 rows in set (1.34 sec)Note
- When tenant jitter occurs, it typically manifests as "tenant CPU maxed out" and "all SQL RT (response time) surging." The first step is to determine whether the surge in SQL RT is the primary incident (for example, whether the SQL itself is causing the RT spike, or if other issues are leading to the SQL RT surge).
- The SQL query introduced above is a powerful tool. It aggregates by
SQL_IDand sorts in descending order by resource usage (resource usage can be considered asavg_exec_time * cnt). By observing the top SQL statements, you can check for obvious anomalies and determine whether this is the primary incident.