Applicability
This topic applies only to OceanBase Database in MySQL mode. For information about pagination in Oracle mode, see SIMPLE SELECT.
Optimize pagination scenarios
When the query result set contains a large number of records, you are advised to use pagination.
Pagination based on unique keys
For example, you can paginate on a unique key:
obclient> select * from tb where c1 = 'xxx' and c2 = 'xxx' and id > 'xxx' limit 100;
After each batch of 100 records, retain the last id as the condition value for fetching the next 100 records, and create a composite index on (c1, c2, id) so that each page starts with an index seek instead of a scan.
Notice
- This method works best when the data is uniformly distributed. If fetching 100 records requires scanning tens of thousands of rows, query performance will fall short of expectations.
- This method also suits scenarios where you page through all records in a table by using only the id field. Example:
select * from tb where id > '' limit 100;
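The keyset approach above can be sketched in application code. The following is a minimal illustration using SQLite with a hypothetical table `tb(id, c1, c2)`; the `id > ?` predicate plays the role of the retained last id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT)")
conn.executemany("INSERT INTO tb (id, c1, c2) VALUES (?, 'a', 'b')",
                 [(i,) for i in range(1, 251)])

PAGE_SIZE = 100
last_id = 0          # condition value carried over from the previous page
pages = []
while True:
    # Keyset pagination: seek past the last seen id instead of using OFFSET.
    rows = conn.execute(
        "SELECT id FROM tb WHERE c1 = 'a' AND c2 = 'b' AND id > ? "
        "ORDER BY id LIMIT ?", (last_id, PAGE_SIZE)).fetchall()
    if not rows:
        break
    pages.append(len(rows))
    last_id = rows[-1][0]  # retain the last id for the next page

print(pages)  # [100, 100, 50]
```

Each iteration seeks directly to the start of the next page, so the cost per page stays constant no matter how deep the pagination goes.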
Use deferred joins or subqueries for pagination
OceanBase Database does not skip the first offset rows. Instead, it reads offset + N rows and discards all but the last N. Therefore, when the offset is large, query performance degrades sharply. If your business encounters such scenarios, you can either cap the total number of pages returned, or rewrite the SQL statement for pages beyond a specific threshold.
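This read-then-discard cost can be observed with a SQLite sketch that counts how many rows the engine evaluates. The `touch` user-defined function and the table `t` are introduced here purely for counting and are not part of the original example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 10001)])

scanned = 0
def touch(x):
    """Counting helper: called once per row the engine evaluates."""
    global scanned
    scanned += 1
    return x

conn.create_function("touch", 1, touch)

def rows_touched(query):
    global scanned
    scanned = 0
    conn.execute(query).fetchall()
    return scanned

# OFFSET pagination: the engine produces offset + N rows, then discards
# the first offset of them.
offset_cost = rows_touched(
    "SELECT id FROM t WHERE touch(id) >= 0 ORDER BY id LIMIT 20 OFFSET 9000")
# Keyset pagination: seek past the last seen id and read only N rows.
keyset_cost = rows_touched(
    "SELECT id FROM t WHERE id > 9000 AND touch(id) >= 0 ORDER BY id LIMIT 20")

print(offset_cost, keyset_cost)  # 9020 20
```

The offset query touches 9020 rows to return 20, while the keyset query touches only the 20 rows it returns; OceanBase Database behaves the same way for deep offsets.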
Control the total number of pages returned.
When you implement the pagination logic in application code, first obtain the total count; if the count is 0, return an empty result directly instead of executing the pagination statement.
Example: In one "Service Market" case, a transaction list exceeded 1,000 pages, and the database was nearly brought down when a user jumped to the last page.
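A guarded pagination routine along these lines might look like the following sketch (SQLite, with a hypothetical table `t`; `max_pages` is an assumed business threshold, not a parameter from the original example):

```python
import sqlite3

def fetch_page(conn, page, page_size=15, max_pages=1000):
    """Cap the page range and skip the query whenever the result
    is known to be empty."""
    if page < 1 or page > max_pages:   # refuse pathological deep pages
        return []
    total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    if total == 0:                     # count is 0: return without paginating
        return []
    offset = (page - 1) * page_size
    if offset >= total:                # past the last page: nothing to fetch
        return []
    return conn.execute(
        "SELECT id FROM t ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
print(fetch_page(conn, 1))             # empty table: pagination query skipped
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 31)])
print(len(fetch_page(conn, 1)))        # 15
```

The cheap count and threshold checks run before the expensive pagination query, so requests for empty or absurdly deep pages never reach the database's LIMIT/OFFSET path.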
Rewrite SQL statements for pages exceeding a specific threshold.
Example: First quickly locate the id range to fetch, and then join back to retrieve the full rows.
obclient> select a.name from a, (select id from a order by id limit 100000, 20) b where a.id = b.id;
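The rewrite can be checked for equivalence with a small SQLite sketch; the table `a(id, name)` mirrors the example above, with a shallower offset for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, f"name-{i}") for i in range(1, 1001)])

# Naive deep page: wide rows are carried through all offset + N positions.
naive = conn.execute(
    "SELECT name FROM a ORDER BY id LIMIT 20 OFFSET 900").fetchall()

# Deferred join: the subquery pages over the narrow primary key only,
# then joins back so that just 20 wide rows are ever assembled.
deferred = conn.execute(
    "SELECT a.name FROM a,"
    " (SELECT id FROM a ORDER BY id LIMIT 20 OFFSET 900) b"
    " WHERE a.id = b.id ORDER BY a.id").fetchall()

assert naive == deferred   # same page, cheaper plan
print(deferred[0][0])      # name-901
```

The benefit grows with the row width: the subquery only scans and discards primary-key values, while the join materializes the wide columns for the final 20 rows alone.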
Examples
The following is the statement for creating a table:
obclient> CREATE TABLE `business_case_analysis_effect_receive_catalog_details` (
`id` bigint(10) NOT NULL AUTO_INCREMENT,
`province_code` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '330000' COMMENT 'Province code',
`province_name` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT 'Zhejiang' COMMENT 'Province name',
`area_code` varchar(20) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City code',
`area_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City name',
`region_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District code',
`region_name` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District name',
`dept_code` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department code',
`dept_name` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department name',
`catalog_id` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog code',
`catalog_name` varchar(500) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog name',
`catalog_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Basic catalog code',
`business_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Line code',
`business_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Line name',
`received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases',
`app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through APP',
`pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through PC',
`hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases at the service hall',
`window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases at the dedicated window',
`two_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases under version 2.0',
`two_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through APP under version 2.0',
`two_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through PC under version 2.0',
`two_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases at the service hall under version 2.0',
`two_window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases at the dedicated window under version 2.0',
`one_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases under version 1.0',
`one_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through APP under version 1.0',
`one_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases through PC under version 1.0',
`one_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of received cases at the service hall under version 1.0',
`item_flag` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Whether it is a version 2.0 matter (0-No, 1-Yes)',
`stat_date` datetime NOT NULL COMMENT 'Statistics date (the date when the data was generated, usually the previous day)',
`delete_flag` int(1) NOT NULL DEFAULT '0' COMMENT 'Whether it is deleted (0-Not deleted, 1-Deleted)',
`gmt_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation date',
`gmt_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update date',
`not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of unassigned cases',
`item_type_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'Power matter type ID',
`item_dock_type` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Matter docking method (0-All, 1-Overall docking, 2-Form docking for version 2.0)',
`apply_person_type` varchar(20) CHARACTER SET utf8 NOT NULL COMMENT 'Applicant type (0-All, 1-Individual, 2-Organization)',
`two_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of unassigned cases under version 2.0',
`one_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of unassigned cases under version 1.0',
`self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service machine cases',
`two_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service machine cases under version 2.0',
`one_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Number of self-service machine cases under version 1.0',
PRIMARY KEY (`id`) USING BTREE,
KEY `query_catalog_index` (`catalog_id`,`catalog_code`,`stat_date`,`area_code`,`region_code`,`dept_code`,`business_code`,`business_name`) USING BTREE GLOBAL,
KEY `idx_area_region` (`area_code`,`region_code`,`item_flag`,`item_type_code`) USING BTREE GLOBAL,
KEY `idx_statDate` (`stat_date`,`area_code`,`region_code`) USING BTREE GLOBAL,
KEY `idx_catalog_code` (`catalog_code`,`catalog_id`) USING BTREE GLOBAL,
KEY `idx_item_type_code` (`item_type_code`) USING BTREE GLOBAL,
KEY `idx_catalog_business` (`catalog_code`,`catalog_id`,`business_code`) USING BTREE GLOBAL
) AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT='Business dashboard - Cases - Effectiveness - Received Cases - Catalog Distribution';
The original query is as follows:
SELECT
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The table contains 5 million rows.
The query execution time varies with the value of the offset parameter, as shown in the following table.
| offset | 0 | 100 | 300 | 500 | 1000 | 2000 |
|---|---|---|---|---|---|---|
| Execution time (s) | 0.07 | 3.96 | 6.83 | 8.67 | 14.44 | 18.04 |
The table had no index suited to this query's filter condition, and adding one did not significantly change the execution time. Therefore, parallel execution was added to optimize the query. The optimized SQL statement is as follows:
SELECT
/*+ PARALLEL(5) USE_HASH_AGGREGATION */
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
After parallel execution was added, the execution time was about 5 seconds for all tested offset values, which shows that the optimization is effective.
Notice
After analyzing this example, we can draw the following conclusions:
- In an OceanBase cluster, the query response time increases as the offset increases.
- When you set the degree of parallelism, the optimal value is generally the number of partitions in the table.