A customer experienced a performance failure in its database system, with a large number of transactions congested. The direct cause was that the CPU utilization on some OBServer nodes reached 100%, preventing transactions from being processed normally. By analyzing the execution audit information in the GV$OB_SQL_AUDIT view, it was identified that a problematic SQL query had an average execution time of 2 seconds, with its CPU usage accounting for more than 70% of the total CPU resources on the OBServer node. After considering the execution plan (Explain) and the DDL definitions of related tables and views, the performance bottleneck was resolved by reducing the CPU time consumed by the SQL query to 4 ms through a series of DDL optimization measures.
Logic of troubleshooting
The troubleshooting logic in this case is as follows:
Locate the top SQL queries by querying the
GV$OB_SQL_AUDITview.Execute the following SQL statement to identify the SQL queries that consume the most CPU resources in a specific period (30 minutes) on a specific OBServer node (
svr_ip), specifically for SQL queries that are grouped bysql_id, and compare their CPU utilization with the total CPU utilization in the problematic period. The formula for calculating the CPU utilization of the SQL queries isSUM(execute_time)/(30*60*1000*1000)AS cpu_cnt:SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time, AVG(execute_time) AS avg_cpu_time, SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id= 'mysql001' AND svr_ip='xxx.xxx.xxx.xxx' AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) AND is_executor_rpc =0 GROUP BY sql_id HAVING COUNT(*)>1 ORDER BY cpu_cnt DESC LIMIT 10;Result analysis:
- The execution frequency and CPU utilization of the
SELECT ... FROM V_TT01 WHERE (COL001 IN (...))SQL queries are high, reaching about 70% (80 CPU cores/108 CPU cores) on the same OBServer node.
+----------------------------------+------------+--------------+--------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------+ | sql_id | executions | tot_cpu_time | avg_cpu_time | cpu_cnt | query_sql | +----------------------------------+------------+--------------+--------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------+ | 2705182A6EAB699CEC8E59DA80710B64 | 23,425 | 33,818,182,526 | 1,443,679 | 56 | SELECT ... FROM V_TT01 WHERE (COL001 IN (20017476)) | | 32AB97A0126F566064F84DDDF4936F82 | 7,112 | 7,926.295,919 | 1,114,496 | 13 | SELECT ... FROM V_TT01 WHERE (COL001 IN (20035891,20035892)) | | A5F514E873BE9D1F9A339D0DA7481D69 | 4,046 | 6,053,872,739 | 1,496,261 | 10 | SELECT ... FROM V_TT01 WHERE (COL001 IN (20017892,20007654,20025634)) | | 31FD78420DB07C11C8E3154F1658D237 | 186 | 646,926,207 | 3,478,098 | 1 | SELECT ... FROM V_TT01 WHERE (COL001 IN (20027596,20035891)) | | C48AEE941D985D8DEB66892228D5E845 | 28 | 569,004,505 | 20,321,589 | 1 | SELECT ... | | 101B7B79DFA9AE801BEE4F1A234AD294 | 155 | 440,190,675 | 2,839,940 | 1 | SELECT ... | | 1D0BA376E273B9D622641124D8C59264 | 423 | 429,140,148 | 1,014,516 | 1 | SELECT ... | | 64CF75576816DB5614F3D5B1F35B1472 | 113 | 326,338,159 | 2,887,948 | 1 | SELECT ... | | 23D1C653347BA469396896AD9B20DCA1 | 381 | 309.367.170 | 811,987 | 1 | SELECT ... | | FA4F493FA5CE2DCC64F51CF3754F96C6 | 7 | 289,191,515 | 41,313,074 | 0 | SELECT ... | +------------+----------------------------+--------------+------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set- The execution frequency and CPU utilization of the
Use the EXPLAIN statement to analyze an execution plan and identify performance bottlenecks.
Run the
EXPLAINcommand to obtain the execution plan of the problematic SQL statement:================================================================================= |ID|OPERATOR |NAME |EST. ROWS|COST | --------------------------------------------------------------------------------- |0 |EXCHANGE IN REMOTE | |177 |432961| |1 | EXCHANGE OUT REMOTE | |177 |397632| |2 | SUBPLAN SCAN |V_TT01 |177 |397632| |3 | UNION ALL | |177 |397629| |4 | NESTED-LOOP JOIN CARTESIAN | |0 |151990| |5 | TABLE SCAN |TBL1(IDX_TBL1_COL001)|1 |92 | |6 | MATERIAL | |0 |151899| |7 | NESTED-LOOP ANTI JOIN CARTESIAN| |0 |151899| |8 | TABLE SCAN |TT01(IDX_TT01_COL001)|1 |92 | |9 | MATERIAL | |177 |151781| |10| SUBPLAN SCAN |VIEW1 |177 |151780| |11| TABLE SCAN |TBL3(IDX_TBL3_COL170)|177 |151777| |12| NESTED-LOOP JOIN CARTESIAN | |177 |236327| |13| TABLE GET |TBL2 |1 |46 | |14| TABLE SCAN |TBL3(IDX_TBL3_COL170)|177 |235510| ================================================================================= Outputs & filters: ------------------------------------- 5 - output(......), filter(nil), access(......), partitions(p0), is_index_back=true, range_key([TBL1.COL001(0x7f510ed97860)], [TBL1.COL002(0x7f48eec9baa0)]), range(20017476,MIN ; 20017476,MAX), range_cond([TBL1.COL001(0x7f510ed97860) = 20017476(0x7f510edbd750)]) 8 - output(......), filter(nil), access(......), partitions(p0), is_index_back=true, range_key([TT01.COL001(0x7f510ed97570)], [TT01.COL003(0x7f510ed97b50)]), range(20017476,MIN ; 20017476,MAX), range_cond([TT01.COL001(0x7f510ed97570) = 20017476(0x7f510edbcc50)]) 11 - output(......), filter([20017476 = cast(cast(TBL3.COL170(0x7f3f5b51c090), VARCHAR2(20 BYTE))(0x7f3f5b51d6b0), NUMBER(-1, -85))(0x7f3f5b51ce80)(0x7f3f5b51c380)]), access(......), partitions(p0), is_index_back=false, filter_before_indexback[false], range_key([TBL3.COL170(0x7f3f5b51c090)], [TBL3.COL002(0x7f3f5b528e80)]), range(MIN,MIN ; MAX,MAX)always true 13 - output(......), filter(nil), access(......), partitions(p0), is_index_back=false, range_key([TBL2.COL004(0x7f0cb0ff9f20)]), range[20017476 ; 20017476], range_cond([TBL2.COL004(0x7f0cb0ff9f20) = 20017476(0x7f0cb1001170)]) 14 - output(......), filter([20017476 = cast(cast(TBL3.COL170(0x7f0cb0ffa210), VARCHAR2(20 BYTE))(0x7f0cb1002fa0), NUMBER(-1, -85))(0x7f0cb1002770)(0x7f0cb1001c70)]), access(......), partitions(p0), is_index_back=true, filter_before_indexback[true], range_key([TBL3.COL170(0x7f0cb0ffa210)], [TBL3.COL002(0x7f3f5b582c10)]), range(MIN,MIN ; MAX,MAX)always trueResult analysis:
According to the preceding
EXPALINplan, the total execution cost of the SQL query is estimated to be 432961, with 397629 (92%) of it consumed by Operator 3UNION ALL. The two branches of theUNION ALLoperator both contain theNESTED-LOOP JOINoperator, with the estimated costs of the two branches being 151990 and 236327 respectively. In the first branch (operators 4 to 11) of theUNION ALLoperator, the cost is mainly consumed by Operator 11, which is aTABLE SCANoperation based on theTBL3(IDX_TBL3_COL170)index;In the second branch (operators 12 to 14) of the
UNION ALLoperator, the cost is mainly consumed by Operator 14, which is also aTABLE SCANoperation based on theTBL3(IDX_TBL3_COL170)index. Therefore, the high costs of operators 11 and 14 may be caused by theTBL3index. This identifies the performance bottleneck in this case.Observe the preceding execution plan and analyze the
Outputs & filtersinformation of operators 11 and 14.range(MIN,MIN ; MAX,MAX)always trueindicates that index matching was not performed, whilefilter([20017476 = cast(cast(TBL3.COL170(0x7f0cb0ffa210), VARCHAR2(20 BYTE))(0x7f0cb1002fa0), NUMBER(-1, -85))(0x7f0cb1002770)(0x7f0cb1001c70)])indicates that data type conversion occurred during table access. We can infer that the data types of associated fields may be inconsistent and data type conversion has been performed, resulting in an index matching failure onTBL3.COL170. Make further analysis based on this conclusion and DDL definitions.
11 - output(......), filter([20017476 = cast(cast(TBL3.COL170(0x7f3f5b51c090), VARCHAR2(20 BYTE))(0x7f3f5b51d6b0), NUMBER(-1, -85))(0x7f3f5b51ce80)(0x7f3f5b51c380)]), access(......), partitions(p0), is_index_back=false, filter_before_indexback[false], range_key([TBL3.COL170(0x7f3f5b51c090)], [TBL3.COL002(0x7f3f5b528e80)]), range(MIN,MIN ; MAX,MAX)always true 14 - output(......), filter([20017476 = cast(cast(TBL3.COL170(0x7f0cb0ffa210), VARCHAR2(20 BYTE))(0x7f0cb1002fa0), NUMBER(-1, -85))(0x7f0cb1002770)(0x7f0cb1001c70)]), access(......), partitions(p0), is_index_back=true, filter_before_indexback[true], range_key([TBL3.COL170(0x7f0cb0ffa210)], [TBL3.COL002(0x7f3f5b582c10)]), range(MIN,MIN ; MAX,MAX)always trueKey findings:
- Total cost: The cost of Operator 3
UNION ALLaccounts for 92% (397,629 / 432,961) of the total execution plan cost. - Index failure: The IDX_TBL3_COL170 index on the TBL3 table fails to take effect, leading to a full table scan (TABLE SCAN).
- Type conversion issue: The COL170 field is of the VARCHAR2 data type, which does not match the NUMBER data type of the associated field. This results in a performance bottleneck due to forced type conversion.
Determine the cause of the performance issue by analyzing the DDL definitions of related tables and views.
First, the definition of the
V_TT01view shows thattbl3in the first branch ofUNION ALLexists in theNOT EXISTSsubquery. According to theEXPLAINplan, the relevant subquery is rewritten asNESTED-LOOP ANTI JOIN. In addition, theV_TT01view undergoesTO_NUMBERconversion, wherecol001andcol004are passed to the(IDX_TBL3_COL170)index ofTBL3throughcol170.Next, analyze the
TBL3table definition. TheTBL3(IDX_TBL3_COL170)index is defined on theCOL170field with a VARCHAR2(20) data type. This field is used for associating theV_TT01view with other tables. According to theV_TT01view definition, in the first branch ofUNION ALL, the association corresponding to Operator 11 is"tbl3"."col170" = "tt01"."col001" = 20017476, and the association corresponding to Operator 14 in the second branch is"tbl3"."col170" = "tbl2"."col004" = 20017476. Due to the inconsistent data types of the associated fields,TO_NUMBERconversion occurs, which means that the VARCHAR2 data type of theCOL170field is forcibly converted to NUMBER. This is consistent with thefilterinformation of operators 11 and 14 inOutputs & filters.
Based on the DDL definition and
EXPALINplan, we can see that during the execution of this type of SQL queries, the unreasonable forcibleTO_NUMBERconversion in theV_TT01view causes a decrease in index performance, resulting in a high cost estimate for operators 11 and 14. TheTO_NUMBERconversion must be removed./* The definition of the problematic SQL query. */ SELECT ...... FROM v_tt01 WHERE COL001 IN (20017476); /* The definition of the V_TT01 view. */ CREATE VIEW "V_TT01_OLD" AS (SELECT To_number("LOCAL_CXZX"."tt01"."col001") AS "COL001", ... ... FROM "LOCAL_CXZX"."tt01", "LOCAL_CXZX"."tbl1" WHERE ( "LOCAL_CXZX"."tt01"."col001" = "LOCAL_CXZX"."tbl1"."col001" ) AND NOT EXISTS((SELECT 1 FROM "LOCAL_CXZX"."tbl3" WHERE ( "LOCAL_CXZX"."tt01"."col001" = "LOCAL_CXZX"."tbl3"."col170" )))) UNION ALL (SELECT To_number("LOCAL_CXZX"."tbl2"."col004") AS "COL001", ... ... FROM "LOCAL_CXZX"."tbl2", "LOCAL_CXZX"."tbl3" "TBL3" WHERE ( "LOCAL_CXZX"."tbl2"."col004" = "tbl3"."col170" )) ; /* The definition of the TBL1 table. */ CREATE TABLE "LOCAL_BDZGBGYY"."TBL1" ( "COL001" NUMBER(16) NOT NULL , // 5: To_number("tt01"."col001") = "tbl1"."col001" = 20017476, To_number(NUMBER(16)) = NUMBER(20) ... ... PRIMARY KEY ("COL002") ) ; CREATE INDEX "LOCAL_BDZGBGYY"."IDX_TBL1_COL001" on "LOCAL_BDZGBGYY"."TBL1" ( "COL001" ) GLOBAL ; /* The definition of the TT01 table. */ CREATE TABLE "EINP_BASICINFO"."TT01" ( ... ... "COL001" NUMBER(20) NOT NULL, // 8: To_number("tt01"."col001") = "tbl1"."col001" = 20017476, To_number(NUMBER(16)) = NUMBER(20) ... ... PRIMARY KEY ("COL003") ) ; CREATE INDEX "EINP_BASICINFO"."IDX_TT01_COL001" on "EINP_BASICINFO"."TT01" ( "COL001" ) GLOBAL ; /* The definition of the TBL3 table.*/ CREATE TABLE "LOCAL_BDZGBGYY"."TBL3" ( "COL170" VARCHAR2(20) NOT NULL, // 11: "tbl3"."col170" = "tt01"."col001" = 20017476. VARCHAR2(20) is changed to NUMBER (20). ... ... // 14: "zb10"."col170" = "tbl2"."col004" = 20017476. VARCHAR2(20) is changed to NUMBER (20). PRIMARY KEY ("COL002") ) ; CREATE INDEX "LOCAL_BDZGBGYY"."IDX_TBL3_COL170" on "LOCAL_BDZGBGYY"."TBL3" ( "COL170" ) GLOBAL ; /* The definition of the TBL2 table. */ CREATE TABLE "SICP4_BASICINFOCENTER_EINP"."TBL2" ( "COL004" NUMBER(20) NOT NULL, // 13: COL004 = 20017476 ... .... PRIMARY KEY ("COL004") ) ;Key table field type comparison:
Table/FieldData TypeAssociated ConditionDescriptionTT01.COL001 NUMBER(20) tt01.COL001 = tbl3.COL170 In the view, tt01.COL001 is converted to a number using To_Number, which results in a type mismatch. TBL3.COL170 VARCHAR2(20) tbl3.COL170 = tt01.COL001 Cannot directly match the String type with the NUMBER field. TBL2.COL004 NUMBER(20) tbl2.COL004 = tbl3.COL170 Type mismatch between TBL2.COL004 and TBL3.COL170
Logic of optimization
The DDL optimization logic for tables and views in this case is as follows:
Remove unnecessary type conversion.
- Bug: The
TO_NUMBERconversion in theV_TT01view causes a type mismatch between the associated fields, leading to the failure of the index. - Optimization: Use native field types directly and remove
TO_NUMBER:
-- Optimized view definition (removed type conversion) CREATE VIEW "V_TT01_NEW" AS ( SELECT "tt01"."col001" AS "COL001", ... FROM "tt01", "tbl1" WHERE "tt01"."col001" = "tbl1"."col001" AND NOT EXISTS ( SELECT 1 FROM "tbl3" WHERE "tt01"."col001" = "tbl3"."col170" ) UNION ALL ( SELECT "tbl2"."col004" AS "COL001", ... FROM "tbl2", "tbl3" WHERE "tbl2"."col004" = "tbl3"."col170" ) );- Effect: The SQL execution time is shortened from 2s to 150 ms, and the optimizer selects a better JOIN order.
- Bug: The
Standardize the data types of fields.
- Problem:
TBL3.COL170is defined asVARCHAR2(20), while the associated fields COL001 and COL004 are defined asNUMBER. This results in type conversion and index failure. - Optimization:
-- Change the type of the TBL3 field to NUMBER(20). ALTER TABLE "TBL3" MODIFY ("COL170" NUMBER(20)); -- Rebuild the index. CREATE INDEX "IDX_TBL3_COL170" ON "TBL3" ("COL170");- Effect: The data types of the associated fields in the table and view are defined as
NUMBER(20). This avoids type conversion, allowsTBL3(IDX_TBL3_COL170)to be used for index matching, and enables the index to take effect after type unification, further reducing the SQL execution time to 20 ms.
- Problem:
Rewrite partitions and table groups.
- Problem: The CPU load on a single node is excessively high due to the hotspot view
V_TT01. - Optimization suggestion:
- Hash partitioning: The associated tables are partitioned by using the
COL001/COL004/COL170columns. - Table group binding: Add associated tables to the same table group to avoid cross-node associated queries.
- Hash partitioning: The associated tables are partitioned by using the
-- Example: Hash partition TBL3. ALTER TABLE "TBL3" PARTITION BY HASH("COL170") PARTITIONS 8; -- Add a table to a table group. ALTER TABLE "TBL3" SET TABLEGROUP = 'group1';- Effect: The query pressure is distributed to multiple nodes, and the final SQL execution time is reduced to 4 ms.
- Problem: The CPU load on a single node is excessively high due to the hotspot view
Optimized execution plan:
====================================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------------------------
|0 |PX COORDINATOR | |3 |477 |
|1 | EXCHANGE OUT DISTR |:EX10000 |3 |405 |
|2 | SUBPLAN SCAN |V_TT01 |3 |405 |
|3 | PX PARTITION ITERATOR| |3 |405 |
|4 | UNION ALL | |3 |405 |
|5 | NESTED-LOOP JOIN | |2 |273 |
|6 | MERGE ANTI JOIN | |2 |183 |
|7 | TABLE SCAN |TT01(INDX_TT01_COL001) |2 |137 |
|8 | SUBPLAN SCAN |VIEW1 |1 |46 |
|9 | TABLE SCAN |TBL3(INDEX_TBL3_COL170)|1 |46 |
|10| TABLE SCAN |TBL1(INDEX_TBL1_COL001)|1 |44 |
|11| NESTED-LOOP JOIN | |1 |132 |
|12| TABLE SCAN |TBL3(INDEX_TBL3_COL170)|1 |92 |
|13| TABLE GET |TBL2 |1 |40 |
====================================================================
Key improvements:
PX_PARTITION_ITERATORindicates thatUNION ALLis executed only within partitions.- The index
TBL3.INDEX_TBL3_COL170is effectively matched, avoiding full table scan. The cost of indexes onTBL3is extremely low, significantly improving performance.
References
For more information about the views mentioned in this topic, see the following view documents:
