Best practices for resource throttling-OceanBase

During database operation, abnormal system resource usage can significantly reduce queries per second (QPS), and in severe cases, QPS can drop to zero. This issue may arise from various factors, such as unexpected SQL statements consuming excessive resources, sudden surges in business traffic, or changes in workload patterns caused by a large number of requests. These scenarios can lead to the rapid consumption—and even exhaustion—of critical resources like CPU, memory, and I/O. To address this challenge, OceanBase Database offers a built-in throttling capability. By adjusting resource allocation strategies for different scenarios, it effectively mitigates service performance degradation caused by resource contention. This not only enhances system stability and response speed but also ensures a more reliable service experience under high-load conditions.

Scenario 1: Handling a few SQL requests with performance exceptions in an OLTP scenario

Scenario description

In online transaction processing (OLTP) scenarios, small transaction requests with high concurrency are common. However, abnormal SQL requests can occasionally occur due to suboptimal execution plans or the need to access large amounts of data. These abnormal requests can monopolize cluster resources for extended periods, blocking normal transaction requests. In such cases, an emergency plan is needed to manage or release the resources consumed by these abnormal SQL requests.

Solutions

OceanBase Database provides two solutions to address this issue:

Solution 1: Configure a cluster-level threshold to identify large queries, limit the CPU usage of these queries, and prioritize CPU resources for smaller requests. This prevents burst abnormal SQL requests from blocking normal transactions.

The solution details are as follows:

Throttling strategy

Use the following parameters to configure the identification threshold for large queries and the maximum percentage of CPU worker threads reserved for them:
- large_query_threshold: the execution time threshold for queries. Any query that exceeds this threshold is identified as a large query. The default value is 5s. This is a cluster-level parameter.
  
  You need to specify this parameter in the sys tenant. Here is an example:
```
obclient> ALTER SYSTEM SET large_query_threshold = '5s';
```
- large_query_worker_percentage: the percentage of worker threads reserved for handling large queries. The default value is 30. This is a cluster-level parameter.
  
  You need to specify this parameter in the sys tenant. Here is an example:
```
obclient> ALTER SYSTEM SET large_query_worker_percentage = 30;
```
Large query processing strategy

When a thread executing an SQL request is identified as a large query, it is marked as a large query thread. The system allows only a certain percentage of threads (30% by default) to continue processing large queries, while the remaining large query threads are temporarily suspended. The total number of active threads for a tenant is fixed and calculated using the formula: cpu_count × cpu_quota_concurrency. When a large query thread is suspended, the system assigns a new thread to handle incoming requests from the queue, releasing the CPU resources previously occupied by the suspended thread for smaller queries. After a period of time, once the smaller queries have been processed, the large query thread resumes execution. This approach ensures that smaller queries are processed efficiently without terminating the large query thread, thereby maintaining a balance between system responsiveness for small queries and the completion of large queries. As a result, overall user experience and system performance are significantly improved.
Early large query identification

If a cluster contains many large queries that are only identified during execution, the tenant worker threads in the cluster may become fully occupied, leaving no resources available for smaller queries. To address this, OceanBase Database introduces a feature for pre-identifying large queries during the SQL compilation phase. Before an SQL request is executed, OceanBase retrieves its execution plan from the plan cache and determines whether the request qualifies as a large query based on whether its average execution time exceeds the specified threshold. If an SQL request is pre-identified as a large query, it is moved to a dedicated large query queue for retry, and the tenant worker thread assigned to it is released. This ensures that the system can continue processing subsequent requests without being blocked by large queries, thereby maintaining overall system responsiveness.
Throttling records

You can query the gv$ob_plan_cache_plan_stat view for the throttling records of large queries. Pay attention to the LARGE_QUERYS column which indicates the number of times that the current request was identified as a large query, and the DELAYED_LARGE_QUERYS column which indicates the number of times that the current request was dropped to the large query queue after being identified as a large query.

Solution 2: Specify an SQL-level concurrency threshold (outline) for complex SQL requests that are temporarily initiated or consume many resources.

The details of the solution are described as follows:

Throttling strategy

Bind an outline and specify a /*+max_concurrent(N)*/ hint. In the hint, N specifies the number of requests of a specific query type that can be concurrently executed, which is actually the number of requests that can be concurrently executed by an execution plan.

When the number of requests of the throttled query type reaches the specified threshold, the system returns an SQL reach max concurrent num error for new requests.
Outline binding modes

Three outline binding modes are supported: SQL text-based, SQL ID-based, and fuzzy.
Notice
- When an SQL statement matches multiple throttling rules, the one with the smallest concurrency applies.
- All outline-related features take effect in the current tenant. You cannot perform outline-related operations on other tenants from the `sys` tenant.
- SQL text-based outline binding
  
  The syntax is as follows:
```
CREATE [OR REPLACE] OUTLINE outline_name ON stmt [ TO target_stmt ];
```
  The parameters are described as follows:
  - outline_name: the name of the outline.
  - stmt: the SQL statement to which the outline is to be bound. It is generally a DML statement with hints and original parameters.
  - TO target_stmt: If TO target_stmt is not specified, when the parameterized SQL statement accepted by the database is the same as the parameterized text of stmt without a hint, the database binds the SQL statement to the hint in stmt to generate an execution plan. If you want to bind plans to statements that have hints, use TO target_stmt to specify the original SQL statement.
  Create an outline in the SQL text format and specify a hint to enable throttling. Here is an example:
```
obclient> CREATE OUTLINE ol_1 ON SELECT /*+max_concurrent(0)*/ * FROM t1 WHERE c1 = 1 AND c2 = ?;
```
  In this example, the following SQL statements actually take effect:
```
SELECT * FROM t1 WHERE c1 = 1 AND c2 = 1;
```
  and
```
SELECT * FROM t1 WHERE c1 = 1 AND c2 = 2;
```
  For more information about the CREATE OUTLINE statement, see CREATE OUTLINE.
- SQL ID-based outline binding
  
  The syntax is as follows:
```
CREATE [OR REPLACE] OUTLINE outline_name ON sql_id USING HINT hint;
```
  The parameters are described as follows:
  - outline_name: the name of the outline.
  - sql_id: the ID of the SQL statement to which the outline is to be bound. You can query the V$OB_PLAN_CACHE_PLAN_STAT or GV$OB_SQL_AUDIT view for the SQL ID, or use MD5 to generate an SQL ID based on the parameterized original SQL statement. For more information about how to obtain the SQL ID, see Plan binding.
  - hint: the hint of the SQL statement to which the outline is to be bound. A hint is specified in the format of /*+ xxx */.
  Assume that the obtained SQL ID is A1887AEC05DF723958F85E2AA89C8085. Here is an example of binding an outline based on the SQL ID:
```
obclient> CREATE OUTLINE otl ON "A1887AEC05DF723958F85E2AA89C8085" USING HINT /*+max_concurrent(0)*/; 
```
  For more information about the CREATE OUTLINE statement, see CREATE OUTLINE.
- Fuzzy outline binding
  
  Notice
  
  For OceanBase Database V4.2.x, only V4.2.2 and later support fuzzy outline binding. OceanBase Database V4.3.x does not support fuzzy outline binding.
  
  The syntax is as follows:
```
CREATE [OR REPLACE] FORMAT OUTLINE outline_name ON format_stmt [ TO format_target_stmt ]
```
  or
```
CREATE [OR REPLACE] FORMAT OUTLINE outline_name ON format_sql_id USING HINT hint;
```
  The parameters are described as follows:
  - outline_name: the name of the outline.
  - format_stmt: the SQL statement rewritten based on normalization rules. Usually, you can obtain format_stmt by using an expression or a database management system (DBMS) package.
    
    The normalization rules are described as follows:
    - Constant parameters are normalized.
    - Statements are normalized in uppercase.
    - Differences of non-syntactic definition symbols, such as spaces and line breaks, are ignored.
    - IN expressions are normalized.
  - format_target_stmt: You can obtain format_target_stmt in the same way as format_stmt. However, if you want to fix the execution plan for a statement with a hint, use format_target_stmt to specify the original SQL statement. When you use the TO format_target_stmt clause, format_stmt of the original SQL statement must exactly match format_target_stmt without the hint.
  - format_sql_id: the SQL ID calculated based on format_stmt by using MD5. format_stmt is obtained by rewriting the SQL statement based on normalization rules.
  For more information about the CREATE FORMAT OUTLINE statement, see CREATE FORMAT OUTLINE.
View, modify, and drop throttling rules

After you bind an outline, you can view, modify, and drop throttling rules corresponding to the outline as follows:
- You can query the DBA_OB_CONCURRENT_LIMIT_SQL view for the throttling records of an outline. For more information about the DBA_OB_CONCURRENT_LIMIT_SQL view, see oceanbase.DBA_OB_CONCURRENT_LIMIT_SQL.
- You can use the ALTER OUTLINE statement to modify a throttling rule. For more information about the ALTER OUTLINE statement, see ALTER OUTLINE.
- You can use the DROP OUTLINE statement to drop all throttling rules corresponding to the outline specified by outline_name. For more information about the DROP OUTLINE statement, see DROP OUTLINE.

Scenario 2: Handling mixed loads with different types of requests in the database

Scenario description

This scenario can be further divided into two types: hybrid transaction and analytical processing (HTAP) loads, and mixed loads involving production business requests and temporary tasks.

In a typical HTAP load scenario, business activities can be broadly categorized into online transaction processing (OLTP) and online analytical processing (OLAP). The transaction system primarily handles simple SQL statements characterized by high concurrency and strict real-time performance requirements. These SQL statements are executed quickly and do not impose prolonged pressure on the server, allowing them to complete in a short time without causing a backlog of business operations. On the other hand, the reporting system is dominated by complex SQL statements. These queries involve intricate execution logic, consume significant resources, and are generally assigned lower priorities. To ensure the smooth operation of the database system, it is essential to manage resources efficiently at the database layer, balancing the needs of both the transaction and reporting systems.

In scenarios with mixed loads of production business requests and temporary tasks, the workload includes not only regular production SQL requests but also system monitoring and temporary O&M (operations and maintenance) tasks that run intermittently. To ensure stable business operations, system resources must be properly isolated so that production SQL requests are protected from the impact of temporary or abnormal tasks. This resource isolation ensures that critical business processes remain unaffected, maintaining overall system stability and reliability.

Solutions

You can create different resource groups for different business types, place the loads into corresponding resource groups, use Resource Manager to manage CPU and I/O resources in a unified manner, thereby isolating and allocating database resources.

The details of the solution are described as follows:

Throttling strategy

You can implement resource isolation as follows:
- User-level resource isolation: This resource isolation type specifies the mappings between users and resource groups. All SQL statements initiated by the specified user can be executed by using only the resources in the resource group mapped to the user.
- SQL-level resource isolation: You can bind SQL statements with resource groups. An SQL statement can be executed by using only the resources in the resource group to which it is bound.
- Function-level resource isolation: This resource isolation type specifies the mappings between background tasks and resource groups. A background task can be executed by using only the resources in the mapped resource group.
Throttling modes

For more information about throttling, see Overview.

Best practices for resource throttling

Scenario 1: Handling a few SQL requests with performance exceptions in an OLTP scenario

Scenario description

Solutions

Notice

Notice

Scenario 2: Handling mixed loads with different types of requests in the database

Scenario description

Solutions