Applicability
This topic applies only to OceanBase Database in MySQL-compatible mode. For information about pagination in Oracle-compatible mode, see SIMPLE SELECT.
Optimize pagination scenarios
For scenarios where the query result set in the database is excessively large, it is recommended to implement pagination.
Pagination based on unique keys
Example: When performing pagination, you can use unique keys for pagination.
obclient> select * from tb where c1 = 'xxx' and c2 = 'xxx' and id > 'xxx' limit 100;
Keep the last id of every 100 records as the condition value for the next 100 records (create a composite index on c1, c2, and id).
Notice
- This method is suitable for data that is evenly distributed. If the query cost for 100 records per page is scanning tens of thousands or even hundreds of thousands of records, the performance will not meet the expected standards.
- This method is suitable for scenarios where you only need to retrieve all records based on the id, such as
select * from tb where id > '' limit 100;.
Use of delayed joins or subqueries for pagination
OB does not skip the offset rows, but instead retrieves offset+N rows and then discards the first offset rows before returning N rows. Therefore, when the offset is particularly large, the efficiency is very low. When the business encounters such scenarios, you can consider controlling the total number of pages returned or rewriting the SQL query for pages exceeding a specific threshold.
Control the total number of pages returned.
When implementing pagination logic in the code, if the count is 0, directly return the result without executing the subsequent pagination statements.
Example: In the "Service Market" transaction, when the pagination exceeds 1000 pages, clicking the last page causes the database to be almost paralyzed.
Rewrite the SQL query for pages exceeding a specific threshold.
Example: First, quickly locate the required id range, and then perform a join:
obclient> select a.name from a,(select id from a where id LIMIT 100000,20) b where a.id=b.id;
Examples
The following statement creates a table:
obclient> CREATE TABLE `business_case_analysis_effect_receive_catalog_details` (
`id` bigint(10) NOT NULL AUTO_INCREMENT,
`province_code` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '330000' COMMENT 'Province code',
`province_name` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT 'Zhejiang Province' COMMENT 'Province name',
`area_code` varchar(20) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City code',
`area_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City name',
`region_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District code',
`region_name` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District name',
`dept_code` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department code',
`dept_name` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department name',
`catalog_id` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog code',
`catalog_name` varchar(500) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog name',
`catalog_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Basic catalog code',
`business_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Line code',
`business_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Line name',
`received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received items',
`app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via APP',
`pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via PC',
`hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received at the service hall',
`window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received at the dedicated window',
`two_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received under version 2.0',
`two_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via APP under version 2.0',
`two_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via PC under version 2.0',
`two_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received at the service hall under version 2.0',
`two_window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received at the dedicated window under version 2.0',
`one_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received under version 1.0',
`one_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via APP under version 1.0',
`one_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received via PC under version 1.0',
`one_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of items received at the service hall under version 1.0',
`item_flag` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Whether it is a version 2.0 item (0-No, 1-Yes)',
`stat_date` datetime NOT NULL COMMENT 'Statistics date (the date when the data was generated, usually the previous day)',
`delete_flag` int(1) NOT NULL DEFAULT '0' COMMENT 'Whether it is deleted (0-Not deleted, 1-Deleted)',
`gmt_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`gmt_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
`not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of items not assigned a code',
`item_type_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'Power item type ID',
`item_dock_type` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Item docking method (0-All, 1-Overall docking, 2-Form docking for version 2.0)',
`apply_person_type` varchar(20) CHARACTER SET utf8 NOT NULL COMMENT 'Applicant type (0-All, 1-Individual, 2-Organization)',
`two_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of items not assigned a code under version 2.0',
`one_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of items not assigned a code under version 1.0',
`self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service items',
`two_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service items under version 2.0',
`one_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service items under version 1.0',
PRIMARY KEY (`id`) USING BTREE,
KEY `query_catalog_index` (`catalog_id`,`catalog_code`,`stat_date`,`area_code`,`region_code`,`dept_code`,`business_code`,`business_name`) USING BTREE GLOBAL,
KEY `idx_area_region` (`area_code`,`region_code`,`item_flag`,`item_type_code`) USING BTREE GLOBAL,
KEY `idx_statDate` (`stat_date`,`area_code`,`region_code`) USING BTREE GLOBAL,
KEY `idx_catalog_code` (`catalog_code`,`catalog_id`) USING BTREE GLOBAL,
KEY `idx_item_type_code` (`item_type_code`) USING BTREE GLOBAL,
KEY `idx_catalog_business` (`catalog_code`,`catalog_id`,`business_code`) USING BTREE GLOBAL
) AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT='Business dashboard - Items - Item effectiveness - Received items - Catalog distribution';
The original query is as follows:
SELECT
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The table contains 5 million records.
By modifying the offset offset, the query execution times are as follows:
| offset | 0 | 100 | 300 | 500 | 1000 | 2000 |
|---|---|---|---|---|---|---|
| Execution time (s) | 0.07 | 3.96 | 6.83 | 8.67 | 14.44 | 18.04 |
Since the table does not have any indexes, the execution time did not change significantly after adding an index to the filter condition. In this case, parallel execution was considered. The SQL statement was optimized as follows:
SELECT
/*+ PARALLEL(5),USE_HASH_AGGREGATION*/
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
After this optimization, the query execution time for the same offset values was around 5 seconds.
Notice
After analyzing this case, the following conclusions were drawn:
- In an OceanBase cluster, as the offset increases, the query response time also increases.
- When setting the parallelism, the optimal number of parallel threads is equal to the number of partitions in the table.
Pagination order preservation
Pagination order preservation is used to ensure that the results of a query are stable. In other words, pagination order preservation is used to sort the query results and then perform pagination processing to maintain the consistency of the query results.
Enable or disable pagination order preservation
OceanBase Database provides two methods for you to enable or disable the pagination order preservation feature for a query in MySQL-compatible mode: the hidden parameter _preserve_order_for_pagination and the query hint PRESERVE_ORDER_FOR_PAGINATION.
OceanBase Database prioritizes the query hint. For example, if the hidden parameter is set to ON for a tenant, but the query hint PRESERVE_ORDER_FOR_PAGINATION is set to OFF, the pagination order preservation feature is not enabled for the query.
Use the hidden parameter to enable or disable the pagination order preservation feature
The hidden parameter _preserve_order_for_pagination is at the tenant level and is used to enable or disable the pagination order preservation feature for the tenant. The default value is OFF. If you set this parameter to ON, the optimizer adds an ORDER BY clause to a paginated query to ensure that the query outputs data in an ordered manner.
| Attribute | Description |
|---|---|
| Parameter type | Boolean |
| Default value | OFF, which indicates that the pagination order preservation feature is disabled by default. |
| Value range | OFF or ON.
|
| Effective mode | Immediately effective |
| Scope | Tenant-level |
Here is an example:
To enable the pagination order preservation feature for a tenant, run the following command:
ALTER SYSTEM SET _preserve_order_for_pagination = true;
Use the query hint to enable or disable the pagination order preservation feature
To enable or disable the pagination order preservation feature for a query, you can use the query hint PRESERVE_ORDER_FOR_PAGINATION.
Here is an example:
To enable the pagination order preservation feature for a specific query, run the following command:
SELECT /*+OPT_PARAM('PRESERVE_ORDER_FOR_PAGINATION', 'TRUE')*/ * FROM test_tbl1
LIMIT 10;
Notice
This hint cannot be used to enable or disable the pagination order preservation feature for a specific query block. It can only be used to enable or disable the feature for the entire query.
Enable or disable the pagination order preservation feature
After you enable the pagination order preservation feature, OceanBase Database rewrites the query and adds an ORDER BY clause to ensure the pagination order preservation.
For example,
SELECT t1.c1, t1.c2 FROM t1, t2 WHERE t1.c1 = t2.c1 LIMIT 10,10;
For the preceding query, OceanBase Database sorts the query results and then performs pagination processing on the sorted results. The rewritten query is as follows:
SELECT t1.c1, t1.c2 FROM t1, t2 WHERE t1.c1 = t2.c1
ORDER BY t1.c1, t2.c2 LIMIT 10,10;
Notice
After you enable the pagination order preservation feature, some queries may experience performance degradation. This is because the feature introduces additional sorting calculations, which can reduce query execution performance.
Considerations
After you enable the pagination order preservation feature, the query output results may be different from those before the feature is enabled, because the feature adds sorting fields.
The expected result order may change in other scenarios. For example:
SELECT t2.c1, t2.c2
FROM (SELECT * from t1 ORDER BY t1.c1, t1.c2) V
LEFT JOIN t2 ON V.c1 = t2.c1
LIMIT 10,10;
The user may expect that the query results are sorted by the t1 table and then paginated. However, any SQL semantics do not guarantee that the query data must be sorted by the t1 table and then paginated. OceanBase Database adds sorting fields t2.c1 and t2.c2 at the outermost layer and sorts the execution results by the output fields (i.e., the fields of the t2 table) and then performs pagination. The equivalent query is as follows:
SELECT t2.c1, t2.c2
FROM t1
LEFT JOIN t2 ON t1.c1 = t2.c1
ORDER BY t2.c1, t2.c2
LIMIT 10,10;