Slow queries can compromise system performance and user experience. To optimize a slow query effectively, you need to carefully analyze the execution plan, evaluate index usage, rewrite the SQL statement if necessary, or even adjust the application logic.
This article provides best practices for handling slow queries.
Applicable versions
This topic applies to OceanBase Database V3.x and V4.x.
Key terms
- Slow query: an SQL query that takes a long time to execute.
- Execution plan: an execution path generated by the database for an SQL statement.
- Index: a type of database object used to accelerate data retrieval.
- Partitioning: a technique that divides a database table into multiple partitions to enhance query efficiency.
- Parallel execution: a process that runs multiple subtasks simultaneously across multiple CPU cores to improve query performance.
- Sharding: a method of distributing table data across multiple database instances to boost performance.
- Join order optimization: a technique that adjusts the table join order in SQL queries to improve performance.
Handle slow queries in OceanBase Database
OceanBase Database is a distributed hybrid transactional and analytical processing (HTAP) database that is well suited to optimizing large queries. This section outlines several techniques for effectively handling slow queries.
Method 1: Use parallel execution
OceanBase Database supports parallel execution for regular queries, DDL operations, and DML operations. By setting an appropriate degree of parallelism (DOP), you can significantly improve SQL performance with minimal CPU overhead.
You can either manually configure a DOP or enable the auto DOP feature:
Manually configure a DOP
You can use the `/*+ parallel(degree) */` hint to set a unified DOP for an SQL statement. You can also set a table-level DOP during table creation:

```sql
CREATE TABLE big_table (c1 INT) PARALLEL = 32;
```
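For example, assuming the `big_table` created above, a statement-level hint might look like the following sketch (the DOP value 8 is illustrative):

```sql
-- Execute this query with a DOP of 8, overriding the table-level setting
SELECT /*+ PARALLEL(8) */ COUNT(*) FROM big_table;
```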
Enable the auto DOP feature
In practical business scenarios, determining the optimal level of parallel resources depends on the specific requirements and execution characteristics of your queries. Decisions regarding whether to enable parallel execution and what degree of parallelism to use should be made based on empirical experience, taking into account both system performance and business needs.
Manually specifying the DOP can introduce unnecessary parallel execution overhead for queries that do not benefit from parallelism or do not require a high DOP, potentially resulting in degraded performance. Alternatively, using hints to set the DOP for individual queries requires case-by-case consideration, which becomes unmanageable in environments with a large volume of business queries.
To address the limitations and inconvenience of manual DOP configuration, the query optimizer provides an Auto DOP feature. When generating the execution plan, Auto DOP evaluates the estimated execution time of each query and automatically determines whether to enable parallel execution, as well as the optimal DOP. This approach helps prevent performance issues that may arise from inappropriate manual DOP settings.
Auto DOP can be enabled via the hint `/*+ parallel(auto) */` or by configuring the relevant system variables. Once enabled, the database automatically calculates an appropriate DOP based on the tenant configuration and table data volume, eliminating the need for manual tuning and avoiding the pitfalls of fixed DOP values.

You can further fine-tune parallel execution by adjusting the parameters `parallel_degree_policy` and `parallel_servers_target` according to your workload requirements:

- `parallel_degree_policy`: Controls whether Auto DOP is enabled.
- `parallel_servers_target`: Specifies the maximum number of parallel execution threads that a tenant can request per node.
Additionally, you can enable and configure automatic parallelism using the following SQL commands:
```sql
-- Enable automatic DOP selection
SET GLOBAL parallel_degree_policy = AUTO;

-- Set the maximum scan time of the base table, in ms. The default value is 1000 ms.
-- In this example, the value is set to 100 ms, which means that parallel execution
-- is enabled when the scan time of the base table exceeds 100 ms.
SET GLOBAL parallel_min_scan_time_threshold = 100;
```
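The per-node thread cap `parallel_servers_target` can be adjusted in the same way. A sketch, with an illustrative value of 64:

```sql
-- Allow the tenant to request at most 64 parallel execution threads per node
SET GLOBAL parallel_servers_target = 64;
```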
Method 2: Leverage OceanBase Database's natively distributed partitioning feature
As a natively distributed database, OceanBase treats each partition as an independent unit for storage, high availability, and transactions. Different partitions of a table can be distributed across multiple servers, leveraging the computational power of multiple machines to accelerate large table queries. At the same time, OceanBase’s native distributed architecture allows applications to interact with it just like a single-node database, minimizing the cost of business application modifications.
Partitioning divides a large table into multiple smaller, independently managed segments (partitions). While MySQL implements partitioning at both the physical file and query logic levels, its single-node architecture means that all partitions ultimately reside on the same machine. As a result, although SQL queries with partition predicates can scan only the relevant partitions instead of the entire table, the query resources are still limited to a single MySQL node.
OceanBase’s underlying distributed architecture allows the leaders of different partitions to be located on different replicas and supports automatic load balancing of some partitions to new nodes. This distributed capability enables OceanBase to spread the load of different partitions across multiple nodes, fully utilizing multi-node performance, eliminating the need for complex sharding, and significantly improving the performance of large queries.
The following example illustrates how partitioning can enhance the performance of large queries.
Example
Business issue: In an e-commerce system, the coupon table (with 500 million rows) becomes a performance bottleneck when using MySQL, as query performance by `user_id` degrades and approaches the limits of single-node capacity.

Solution: By leveraging OceanBase’s partitioning and distributed capabilities, query performance can be optimized and elastic scalability supported. The coupon table is hash-partitioned on the `user_id` column into 16 partitions, so each partition contains roughly 30 million rows on average, greatly improving coupon query performance.
Example steps:
Create a partitioned table:

- Partition key selection: `user_id` is the core field in query conditions; hash partitioning ensures even data distribution.
- Number of partitions: 16 partitions keep each partition’s data volume around 30 million rows (500 million ÷ 16 ≈ 31.25 million), reducing the scan range for individual queries.

```sql
-- Create a coupon table hash-partitioned by user_id into 16 partitions
CREATE TABLE coupon (
    coupon_id BIGINT,
    user_id BIGINT,
    coupon_code VARCHAR(50),
    expire_time DATETIME,
    PRIMARY KEY (coupon_id, user_id)  -- The primary key includes the partition key user_id
) PARTITION BY HASH(user_id) PARTITIONS 16;
```
Query optimization example:

- Partition pruning: The system automatically locates the relevant partition based on the condition `user_id = 123` (that is, the partition where user_id % 16 = n), scanning only that partition and avoiding a full-table scan.
- Distributed query execution: The query is routed to the leader node of the relevant partition, which may reside on a different physical node, thereby reducing the load on any single node.

```sql
-- Query coupons for user_id = 123 (with a partition key condition)
SELECT * FROM coupon WHERE user_id = 123;
```
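To confirm that partition pruning takes effect, you can inspect the execution plan. A sketch based on the coupon table above:

```sql
-- The partition information in the plan should show that only one of the
-- 16 partitions is scanned for this predicate
EXPLAIN SELECT * FROM coupon WHERE user_id = 123;
```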
In addition, for common large-scale promotions and events in retail systems, OceanBase’s distributed scalability allows you to add nodes for automatic load balancing as needed.
Method 3: Bind an execution plan
For SQL statements with poor filtering efficiency, binding an execution plan can temporarily enhance performance. However, for SQL statements with persistent performance issues, it is advisable to optimize the query logic to ensure sustainable and long-term performance improvements.
Bind an execution plan
For an SQL query that is not well-optimized, you can bind an optimal execution plan to the query. This ensures that the database directly uses the specified execution plan, reducing optimization overhead and enhancing query performance.
However, binding an execution plan does not always work. Changes in data distribution or table schema can make the bound execution plan inefficient or even detrimental to query performance. Therefore, it is advisable to regularly review the bound execution plans and update or unbind them as necessary to maintain optimal performance.
Before business launch: You can add hints to the SQL statement and enable the optimizer to generate execution plans based on the specified hints.
After business launch: If the execution plan selected by the optimizer performs poorly, you can update the bound execution plan without modifying the SQL statement. You can add a set of hints to the SQL statement by using DDL operations so that the optimizer generates better plans. This set of hints is called an outline.
You can use the following syntax to create an outline based on `SQL_TEXT`:

```sql
CREATE [OR REPLACE] OUTLINE <outline_name> ON <stmt> [ TO <target_stmt> ];
```

You can use the following syntax to create an outline based on `SQL_ID`:

```sql
CREATE OUTLINE outline_name ON sql_id USING HINT hint_text;
```
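As an illustrative sketch of the `SQL_TEXT`-based syntax (the outline name and the index name `idx_user_id` are hypothetical), the following outline makes the optimizer apply an index hint to a matching statement without changing the application SQL:

```sql
-- Queries matching the TO clause are executed with the INDEX hint applied
CREATE OUTLINE coupon_by_user_ol
ON SELECT /*+ INDEX(coupon idx_user_id) */ * FROM coupon WHERE user_id = 123
TO SELECT * FROM coupon WHERE user_id = 123;
```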
Method 4: Transparent SQL logic optimization in OceanBase
OceanBase implements a range of internal optimization mechanisms that significantly improve query performance, all while remaining completely transparent to end users.
Multi-table join: OceanBase Database implements a full set of join enumeration algorithms to flexibly adjust the join order based on the cost. It can adjust the join order for inner joins, outer joins, anti-joins, and semi-joins. It also supports the conversion of join types.
Subquery: The query rewrite module of OceanBase Database provides various subquery optimization strategies to quickly convert subqueries with a small nesting depth into joins and use different join algorithms for optimization. This remarkably improves the execution efficiency of subqueries, thereby optimizing the overall query performance.
Large table aggregation: Large table aggregation is a typical performance optimization scenario. OceanBase Database can split the data in a large table into multiple groups for aggregation. Pre-aggregation in combination with parallel execution can drastically increase the execution efficiency and generally improve the performance by several times.
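As a hedged sketch of this pattern (the `sales_detail` table and its columns are hypothetical), pre-aggregation combined with parallel execution can be expressed as a parallel GROUP BY:

```sql
-- Scan and pre-aggregate the large table in parallel with a DOP of 16;
-- partial aggregates from each worker are merged into the final result
SELECT /*+ PARALLEL(16) */ region, SUM(amount) AS total_amount
FROM sales_detail
GROUP BY region;
```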
Method 5: Use columnar storage to accelerate large queries and optimize real-time analysis
In business scenarios, especially complex query scenarios involving massive amounts of data, ad hoc queries are a typical performance challenge. Assume that the back-office operations personnel of an e-commerce company filter data in the data platform by criteria such as customer name, order placement time, or commodity name, generating dynamic SQL queries that are delivered to the database. Conventional indexes cannot be used efficiently for these queries because their filter conditions are unpredictable. As a result, a full-table scan is required.
In conventional databases such as MySQL, a full-table scan is acceptable when the data volume is small. However, as the data volume increases, the execution of such queries that require a full-table scan usually takes tens of seconds or even several minutes. This significantly compromises the response time and user experience. Generally, to address this issue, the extract, transform, and load (ETL) mechanism is used to synchronize data to a real-time data warehouse to accommodate complex queries.
OceanBase Database provides an alternative solution: columnar storage. By using columnar storage to accelerate queries, OceanBase Database can directly process large-scale analytical queries efficiently, simplifying the architecture and lowering costs. The characteristics and benefits of columnar storage are described in the following sections.
Characteristics of columnar storage
- Data stored by column: Unlike conventional row-based storage, columnar storage physically arranges data of the same column together.
- Data read by column: A query scans only the columns it involves, instead of the entire table. This significantly reduces I/O and memory consumption.
- High compression ratio: Generally, columnar storage efficiently compresses data of the same type, further lowering the storage and transmission overheads.
Benefits of columnar storage
Improved query efficiency: In an analytical scenario, a query scans only involved columns without the need to load the entire row. For example, an ad hoc query needs to read only part of the columns. With columnar storage, an ad hoc query can quickly filter and aggregate data, greatly shortening the response time.
Simplified architecture: OceanBase Database can process both transaction processing (TP) and analytical processing (AP) loads without the need for the ETL mechanism or an external data warehouse. Unlike the conventional architecture that consists of a database and a real-time data warehouse, OceanBase Database consolidates all features, simplifying system deployment and O&M.
Reduced costs: By eliminating intermediate phases such as ETL and external storage, OceanBase Database reduces the hardware and software costs. Generally, the overall costs can be reduced by about 30%.
Enable columnar storage
In OceanBase Database, you can perform the following steps to enable columnar storage:
Create a columnstore table: Specify columnar storage as the storage format during table creation.

```sql
CREATE TABLE orders (
    order_id BIGINT,
    customer_name VARCHAR(100),
    product_name VARCHAR(100),
    order_time TIMESTAMP,
    amount DECIMAL(10,2)
) WITH COLUMN GROUP (each column);
```

Query optimization: The optimizer of OceanBase Database automatically selects an appropriate storage and execution method, gaining performance benefits without manual adjustment.
Monitoring and adjustment: Use the monitoring tools of OceanBase Database to analyze the query performance and adjust the table design or storage strategy as needed.
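For example, an ad hoc query against the `orders` table above reads only the referenced columns rather than entire rows (the date filter is illustrative):

```sql
-- Only customer_name, order_time, and amount are scanned under columnar storage
SELECT customer_name, SUM(amount) AS total_spent
FROM orders
WHERE order_time >= '2024-01-01'
GROUP BY customer_name
ORDER BY total_spent DESC;
```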
With columnar storage, OceanBase Database can efficiently filter and aggregate data and process complex ad hoc queries without the need for an external data warehouse. This remarkably simplifies the data architecture and improves the query performance. Columnar storage applies to big data queries that require quick responses and provides an economic and efficient solution for enterprises.
Summary
OceanBase Database can effectively resolve the issue of slow queries in MySQL databases through its parallel execution and partitioning features, execution plan binding, transparent optimizations such as join order optimization for multi-table joins, subquery optimization, and large table aggregation optimization, as well as columnar storage that serves both transaction processing (TP) and analytical processing (AP) workloads.
References
- For more information about the technical architecture for performance tuning in OceanBase Database, see Overview.
- For more information about how to perform performance tuning in OceanBase Database, see Quick start.