Optimize query by page
If a large number of results are expected for a database query, we recommend that you perform the query by page.
Pagination based on the unique key
Example: perform a pagination query based on the unique key.
obclient> select * from tb where c1 = 'xxx' and c2 = 'xxx' and id > 'xxx' limit 100;
This statement reserves the ID of the last row of every 100 rows as the condition value for the next 100 rows. The c1, c2, and id columns are used to create a composite index.
Notice
- This method is suitable for querying tables with even data distribution. It does not show the expected performance if the query generates 100 results per page only by scanning tens or hundreds of thousands of rows.
- The
select * from tb where id > '' limit 100;statement applies to table scans where the query results are paged only by theidcolumn.
Pagination based on subqueries and delayed association
Instead of skipping the offset rows, OceanBase Database retrieves offset+N rows, drops the offset rows, and then returns the N rows. Therefore, when the number of offset rows is large, the query efficiency is low. In this case, you can control the total number of returned pages, or rewrite the SQL statements if the page number exceeds a specific threshold.
Control the total number of returned pages.
You need to code the logic of a pagination query in a way that the query directly returns N rows without executing the pagination statement if the total number of offset rows is 0.
For example, if a transaction returns more than 1,000 pages of results, when the user clicks the last page, the database is almost paralyzed.
Rewrite the SQL statements when the page number exceeds a specific threshold.
In the following example, the statement is rewritten to perform a subquery, which quickly locates the ID range to be retrieved, and then associates it with the query condition.
obclient> select a.name from a,(select id from a where id LIMIT 100000,20) b where a.id=b.id;
Case analysis
Create a table by executing the following sample statements:
obclient> CREATE TABLE `business_case_analysis_effect_receive_catalog_details` (
`id` bigint(10) NOT NULL AUTO_INCREMENT,
`province_code` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '330000' COMMENT 'Province code',
`province_name` varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT 'Zhejiang' COMMENT 'Province name',
`area_code` varchar(20) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City code',
`area_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'City name',
`region_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District/county code',
`region_name` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'District/county name',
`dept_code` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department code',
`dept_name` varchar(100) CHARACTER SET utf8 NOT NULL COMMENT 'Department name',
`catalog_id` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog ID',
`catalog_name` varchar(500) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog name',
`catalog_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Catalog code',
`business_code` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Business code',
`business_name` varchar(50) CHARACTER SET utf8 DEFAULT NULL COMMENT 'Business name',
`received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels',
`app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app',
`pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC',
`hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall',
`window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by window',
`two_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels 2.0',
`two_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app 2.0',
`two_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC 2.0',
`two_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall 2.0',
`two_window_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by window 2.0',
`one_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received parcels 1.0',
`one_app_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by app 1.0',
`one_pc_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received by PC 1.0',
`one_hall_received_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Received in hall 1.0',
`item_flag` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Item 2.0 (0-No, 1-Yes)',
`stat_date` datetime NOT NULL COMMENT 'Statistical time, which is the time when the data was generated, typically the previous day.',
`delete_flag` int(1) NOT NULL DEFAULT '0' COMMENT 'Deleted (0-NO, 1-Yes)',
`gmt_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`gmt_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
`not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned',
`item_type_code` varchar(50) CHARACTER SET utf8 NOT NULL COMMENT 'Item type ID',
`item_dock_type` varchar(10) CHARACTER SET utf8 NOT NULL COMMENT 'Item docking method (0-All, 1-Overall docking, 2-Docking form 2.0)',
`apply_person_type` varchar(20) CHARACTER SET utf8 NOT NULL COMMENT 'Applicant type (0-All, 1-Individual, 2-Legal person)',
`two_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned 2.0',
`one_not_case_assigned_code_num` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Number of not collected parcels with code assigned 1.0',
`self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers',
`two_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers 2.0',
`one_self_service_num` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Parcels by self-service lockers 1.0',
PRIMARY KEY (`id`) USING BTREE,
KEY `query_catalog_index` (`catalog_id`,`catalog_code`,`stat_date`,`area_code`,`region_code`,`dept_code`,`business_code`,`business_name`) USING BTREE GLOBAL,
KEY `idx_area_region` (`area_code`,`region_code`,`item_flag`,`item_type_code`) USING BTREE GLOBAL,
KEY `idx_statDate` (`stat_date`,`area_code`,`region_code`) USING BTREE GLOBAL,
KEY `idx_catalog_code` (`catalog_code`,`catalog_id`) USING BTREE GLOBAL,
KEY `idx_item_type_code` (`item_type_code`) USING BTREE GLOBAL,
KEY `idx_catalog_business` (`catalog_code`,`catalog_id`,`business_code`) USING BTREE GLOBAL
) AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT='Business dashboard - processing - processing results - parcels received - catalog distribution';
Original SQL statements for the query:
SELECT
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The table contains 5 million data records.
The following table shows the execution time in seconds after the modification of offset:
| Offset | 0 | 100 | 300 | 500 | 1000 | 2000 |
|---|---|---|---|---|---|---|
| Execution duration (s) | 0.07 | 3.96 | 6.83 | 8.67 | 14.44 | 18.04 |
No index is created for the table. Therefore, the execution time does not change much after an index is specified in the WHERE clause. In this case, parallel execution is considered. The following part shows the optimized SQL statements:
SELECT
/*+ PARALLEL(5),USE_HASH_AGGREGATION*/
catalog_id AS catalogId,
catalog_name AS catalogName,
catalog_code AS catalogCode,
business_code AS businessCode,
business_name AS businessName,
IFNULL(
CAST(
SUM(pc_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS networkHandlePercent,
IFNULL(
CAST(
SUM(app_received_num) / SUM(received_num) AS DECIMAL(14, 4)
),
0
) AS palmtopHandlePercent,
IFNULL(
CAST(
SUM(two_received_num) /(
SUM(received_num) + SUM(not_case_assigned_code_num)
) AS DECIMAL(14, 4)
),
0
) AS netHandleRate,
IFNULL(SUM(not_case_assigned_code_num), 0) AS notCaseAssignedCodeNum,
IFNULL(SUM(received_num), 0) AS receivedNum,
IFNULL(SUM(two_received_num), 0) AS twoReceivedNum,
IFNULL(SUM(pc_received_num), 0) AS pcReceivedNum,
IFNULL(SUM(app_received_num), 0) AS appReceivedNum,
IFNULL(SUM(hall_received_num), 0) AS hallReceivedNum,
IFNULL(SUM(two_window_received_num), 0) AS windowReceivedNum,
IFNULL(SUM(self_service_num), 0) AS selfServiceNum
FROM
business_case_analysis_effect_receive_catalog_details
WHERE
stat_date <= 'xxxx-xx-xx'
AND stat_date >= 'xxxx-xx-xx'
AND item_type_code in ("xx", "xx", "xx", "xx", "xx", "xx", "xx")
GROUP BY
catalog_code,
catalog_id
LIMIT
offset, 15;
The optimized SQL statements are re-executed based on the offset described in the preceding table, and the query takes about 5 seconds to return results for each offset.
Notice
The case analysis shows that:
- A query in an OceanBase cluster takes more time to return the results as the number of offset rows increases.
- The optimal performance is achieved when you set the degree of parallelism to the number of partitions of the table.