Applicability
This topic applies only to OceanBase Database in MySQL mode. For information about pagination queries in Oracle mode, see SIMPLE SELECT.
Optimize pagination scenarios
In scenarios where a large number of results are expected for a database query, we recommend that you perform a pagination query.
Pagination based on the unique key
The following example shows how to perform a pagination query based on the unique key.
obclient> select * from tb where c1 = 'xxx' and c2 = 'xxx' and id > 'xxx' limit 100;
This statement reserves the ID of the last row of every 100 rows as the condition value for the next 100 rows. The c1, c2, and id columns are used to create a composite index.
Notice
- This method is suitable for querying tables with even data distribution. It does not show the expected performance if the query generates 100 results per page only by scanning tens or hundreds of thousands of rows.
- The
select * from tb where id > '' limit 100;statement applies to table scans where the query results are paged only by the id column.
Pagination based on subqueries and delayed association
Instead of skipping the offset rows, OceanBase Database retrieves offset+N rows, drops the offset rows, and then returns the N rows. Therefore, when the number of offset rows is large, the query efficiency is low. In this case, you can control the total number of returned pages, or rewrite the SQL statements if the page number exceeds a specific threshold.
Control the total number of returned pages.
You need to code the logic of a pagination query in a way that the query directly returns N rows without executing the pagination statement if the total number of offset rows is 0.
For example, if a transaction returns more than 1,000 pages of results, when the user clicks the last page, the database is almost paralyzed.
Rewrite the SQL statements when the page number exceeds a specific threshold.
In the following example, the statement is rewritten to perform a subquery, which quickly locates the ID range to be retrieved, and then associates it with the query condition.
obclient> select a.name from a,(select id from a where id LIMIT 100000,20) b where a.id=b.id;
Case analysis
Create a table by executing the following sample statement:
obclient> CREATE TABLE `business_case_analysis_effect_receive_catalog_details` (
`id` bigint(10) NOT NULL AUTO_INCREMENT,
`province_code` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '330000' COMMENT 'Province code',
`province_name` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT 'Zhejiang' COMMENT 'Province name',
`area_code` varchar(20) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City code',
`area_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City name',
`region_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District/county code',
`region_name` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District/county name',
`dept_code` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department code',
`dept_name` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department name',
`catalog_id` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog ID',
`catalog_name` varchar(500) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog name',
`catalog_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog code',
`business_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Business code',
`business_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Business name',
`received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels',
`app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app',
`pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC',
`hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall',
`window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by window',
`two_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels 2.0',
`two_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app 2.0',
`two_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC 2.0',
`two_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall 2.0',
`two_window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by window 2.0',
`one_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels 1.0',
`one_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app 1.0',
`one_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC 1.0',
`one_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall 1.0',
`item_flag` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Item 2.0 (0-No, 1-Yes)',
`stat_date` datetime NOT NULL COMMENT 'Statistical time, which is the time when the data was generated, typically the previous day.',
`delete_flag` int(1) NOT NULL DEFAULT '0' COMMENT 'Deleted (0-No, 1-Yes)',
`gmt_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`gmt_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
`not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned',
`item_type_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'Item type ID',
`item_dock_type` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Item docking method (0-All, 1-Overall docking, 2-Docking form 2.0)',
`apply_person_type` varchar(20) CHARACTER SET utf8 NOT NULL COMMENT 'Applicant type (0-All, 1-Individual, 2-Legal person)',
`two_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned 2.0',
`one_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned 1.0',
`self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers',
`two_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers 2.0',
`one_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers 1.0',
PRIMARY KEY (`id`) USING BTREE,
KEY `query_catalog_index` (`catalog_id`,`catalog_code`,`stat_date`,`area_code`,`region_code`,`dept_code`,`business_code`,`business_name`) USING BTREE GLOBAL,
KEY `idx_area_region` (`area_code`,`region_code`,`item_flag`,`item_type_code`) USING BTREE GLOBAL,
KEY `idx_statDate` (`stat_date`,`area_code`,`region_code`) USING BTREE GLOBAL,
KEY `idx_catalog_code` (`catalog_code`,`catalog_id`) USING BTREE GLOBAL,
KEY `idx_item_type_code` (`item_type_code`) USING BTREE GLOBAL,
KEY `idx_catalog_business` (`catalog_code`,`catalog_id`,`business_code`) USING BTREE GLOBAL
) AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT='Business dashboard - processing - processing results - parcels received - catalog distribution';
The original SQL query is as follows:
SELECT
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The table contains 5 million data records.
The following table shows the execution time in seconds after the modification of offset.
| Offset | 0 | 100 | 300 | 500 | 1000 | 2000 |
|---|---|---|---|---|---|---|
| Execution duration (s) | 0.07 | 3.96 | 6.83 | 8.67 | 14.44 | 18.04 |
No index is created for the table. Therefore, the execution time does not change much after an index is specified in the WHERE clause. In this case, parallel execution is considered. The following part shows the optimized SQL statements:
SELECT
/*+ PARALLEL(5),USE_HASH_AGGREGATION*/
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The optimized SQL statements are re-executed based on the offset described in the preceding table, and the query takes about 5 seconds to return results for each offset.
Notice
The following conclusions can be drawn from case analysis:
- A query in an OceanBase cluster takes more time to return the results as the number of offset rows increases.
- The optimal performance is achieved when you set the degree of parallelism to the number of partitions of the table.
Ordered pagination
Ordered pagination ensures the stability of pagination query results. Specifically, it sorts the query results before applying pagination to maintain the consistency of the query result order.
Feature switch
The MySQL mode of OceanBase Database provides the hidden parameter _preserve_order_for_pagination and the query hint PRESERVE_ORDER_FOR_PAGINATION for you to enable and disable the feature of ordered pagination.
The PRESERVE_ORDER_FOR_PAGINATION hint takes effect preferentially in OceanBase Database. For example, if the _preserve_order_for_pagination parameter is specified to enable the feature for the tenant, but the /*+OPT_PARAM('preserve_order_for_pagination', 'FALSE')*/ hint is used in a query, the feature will not be enabled for the query.
Enable or disable the feature by using the hidden parameter
_preserve_order_for_pagination is a tenant-level hidden parameter used to enable or disable ordered pagination. The feature is disabled by default. After you enable this feature, the optimizer will actively add ORDER BY to pagination queries for ordered output.
| Attribute | Description |
|---|---|
| Parameter type | Boolean |
| Default value | False, indicating to disable ordered pagination. |
| Value range | False and True.
|
| Effective mode | Immediate |
| Level | Tenant level |
Here is an example:
Enable ordered pagination in the tenant.
ALTER SYSTEM SET _preserve_order_for_pagination = true;
Enable or disable the feature by using the hint
OceanBase Database also provides the query-level hint PRESERVE_ORDER_FOR_PAGINATION for you to enable or disable ordered pagination in a more fine-grained manner.
Here is an example:
Enable ordered pagination in a specific query.
SELECT /*+OPT_PARAM('PRESERVE_ORDER_FOR_PAGINATION', 'TRUE')*/ * FROM test_tbl1
LIMIT 10;
Notice
This hint cannot determine whether to enable ordered pagination in a specific query block; it can only control whether to enable this feature for the entire query.
Feature description
After you enable ordered pagination, OceanBase Database will rewrite a query by adding the ORDER BY clause to keep the sorting order.
Here is an example:
SELECT t1.c1, t1.c2 FROM t1, t2 WHERE t1.c1 = t2.c1 LIMIT 10,10;
For the above query, OceanBase Database first sorts the query results, and then paginates the sorted results. The rewritten query is as follows:
SELECT t1.c1, t1.c2 FROM t1, t2 WHERE t1.c1 = t2.c1
ORDER BY t1.c1, t2.c2 LIMIT 10,10;
Notice
When ordered pagination is enabled, the performance of some queries may deteriorate due to the additional sorting calculation.
Feature usage considerations
When ordered pagination is enabled, the query output may differ from the original due to the introduction of sorting fields.
The expected result order will be changed in some scenarios. Here is an example:
SELECT t2.c1, t2.c2
FROM (SELECT * from t1 ORDER BY t1.c1, t1.c2) V
LEFT JOIN t2 ON V.c1 = t2.c1
LIMIT 10,10;
For the above query, it is expected that the database sorts data in the t1 table and then paginates the query results. However, such expectation is not guaranteed by any SQL semantics. In the following equivalent query, OceanBase Database adds sorting fields t2.c1, t2.c2 at the outermost layer. This ensures that the query results are sorted by the specified fields in the t2 table before pagination.
SELECT t2.c1, t2.c2
FROM t1
LEFT JOIN t2 ON t1.c1 = t2.c1
ORDER BY t2.c1, t2.c2
LIMIT 10,10;