Photo by Will Paterson on Unsplash
OceanBase Database Community Edition V4.0 Beta was released on November 3, 2022. The results of the on-site test showed that, with the same hardware configurations, the performance of OceanBase Database Community Edition V4.0 is 1.9 times that of MySQL Enterprise Edition in online transaction processing (OLTP), and 5 to 6 times that of Greenplum 6.22.1 in online analytical processing (OLAP). The OLAP performance was evaluated by TPC-H, a business intelligence benchmark developed by Transaction Processing Performance Council (TPC) to simulate decision support applications. TPC-H implements a data warehouse in the Third Normal Form (3NF) that contains eight basic relationships. The main evaluation metric of TPC-H is the response time (RT) of each query. The TPC-H test has become the world’s most recognized OLAP performance evaluation method, and the test results provide meaningful indicators for OLAP and hybrid transaction/analytical processing (HTAP) databases.
In this article, we will share with you how to perform a TPC-H test on OceanBase Database V4.0 and offer some performance optimization suggestions.
Automatic testing by using OBD
OceanBase Deployer (OBD) is the tool for deploying OceanBase Database Community Edition. It is also a package manager. You can use it to manage all tools of OceanBase Database Community Edition. To help you run the performance test easier and faster, we have integrated Sysbench, TPC-C, and TPC-H tools with OBD. OBD will automatically tune the test parameters based on your operating environment to deliver the best performance.
After installing OBD and OceanBase Database Community Edition, you can simply perform the TPC-H test by running the following command:
obd test tpch obperf — tenant=tpch_mysql -s 100 — remote-tbl-dir=/tmp/tpch100
Figure 1 Sample OBD command for performing a TPC-H test
OBD also allows you to manually perform the test step by step in white-box mode by setting parameters as required. The following introduces some important parameters:
View the process and operation logs
If you want to know what operations are performed in the test, you can specify the ‘-v’ option to instruct OBD to print detailed operation logs of the entire test process, such as logs recording data generation, data import, parameter tuning, major compaction, and SQL query execution.
Disable data file transfer
Before initiating the TPC-H test, you must specify a target OBServer. OBD will transfer the required data files to the specified directory on the target OBServer. These files may be quite large and you may have had them stored in the directory already. In that case, you can specify the ‘--dt=true’ option to disable data file transfer, so that OBD starts the TPC-H test from the data import step.
Execute SQL queries
If you have finished the data import step and just want to execute SQL queries, you can specify the ‘--test-only=true’ option. OBD will tune parameters and execute the 22 SQL queries. After the execution is completed, OBD will also automatically reset the tuned parameters to their default values. If you want to experience performance improvement by manually tuning parameters, you can specify the ‘-O 0’ option, so that OBD will simply execute SQL queries without tuning the parameters. You can freely combine various parameters to experience the performance benefits of the OceanBase Database.
We have provided the test process in detail on the official website of the OceanBase Database. If you want to gain a step-by-step experience in the TPC-H test with your custom settings, feel free to check it out. In this section, we will outline the differences in the test process between OceanBase Database V4.0 and V3.x in the following aspects:
‒ Parameters
OceanBase Database V4.0 adds some parameters to provide new features such as storage push-down and vectorized execution, which play a significant role in performance improvement. After setting the parameters for storage push-down and vectorized execution, we recommend that you refresh the plan cache to avoid hitting an obsolete plan in the plan cache.
‒ CREATE TABLE statement
As the table scanning performance is improved, you don’t need to add an index for table scanning in OceanBase Database V4.0. When you create a table, you can specify the encoding format as CONDENSED. The data will be encoded and saved in the selective encoding format, which is helpful to improve the query performance.
‒ Major compactions
OceanBase Database V4.0 supports major compactions at the tenant level. You can log on to the target tenant to run the major compaction command, or log on to the sys tenant and specify the target tenant to perform the major compaction. Internal tables related to the major compaction are also adjusted accordingly. For example, you can log on to the sys tenant and query the CDB_OB_MAJOR_COMPACTION table to get information about the major compaction of all tenants.
‒ Manual collection of statistics
Since OceanBase Database V4.0, the automatic collection of major compaction statistics is no longer supported. The OceanBase Database optimizer allows you to manually collect the statistics of tables and columns so that it makes more accurate estimates and generates execution plans faster.
The OceanBase Database optimizer allows you to manually collect the statistics by using the DBMS_STATS package (recommended) or running the ANALYZE command. If you have deployed an OceanBase cluster by using OBD or OceanBase Control Platform (OCP), you can use the built-in DBMS_STATS package to collect statistics by calling the DBMS_STATS.GATHER_TABLE_STATS procedure.
If you want to collect statistics by running the ANALYZE command, you must log on to the MySQL tenant and enable SQL extension by executing the following statement: alter system set enable_sql_extension=True. You can also increase the degree of parallelism (DOP) for statistics collection by specifying the _force_parallel_query_dop parameter.
1. _rowsets_enabled
OceanBase Database V4.0 supports vectorized execution, which allows it to process multiple rows of data at a time, significantly reducing function calls and cache misses and improving the SQL executor performance.
To increase the flexibility of the SQL executor, we provide the _rowsets_enabled parameter, which allows you to enable or disable vectorized execution. This parameter takes effect on tenants. You can set the parameter in a user tenant or the sys tenant. If you set the parameter in the sys tenant, you must specify the name of the target user tenant. Vectorized execution affects the RT of all SQL statements. We recommend that you enable it when you perform the test.
2. _pushdown_storage_level
To keep enhancing its OLAP performance, we have introduced several new features to the storage layer of OceanBase Database, such as pushing filterers and operators down to the bottom. The features are controlled by the _pushdown_storage_level parameter. If you set the parameter to 3, you will enable the following features: fast scanning of static data, filter push-down, and aggregate operator push-down. The data will be calculated or filtered in advance in the storage layer, thus reducing the processing of invalid data and significantly improving the calculation efficiency.
The following table shows the results of tests performed in the same hardware environment. Specifically, each test cluster consists of three 32C128G ECS.g7.8xlarge nodes, and three 500 GB ESSD disks are mounted to each node. The size of the test dataset is 100 GB.
We executed the 22 SQL queries in sequence on the 100 GB dataset of the TPC-H test in OceanBase Database Community Edition V4.0 and V3.1 and compared the test results. The overall performance of V4.0 is 5 times better than that of V3.1.
We compared the performance of OceanBase Database Community Edition V4.0 and Greenplum 6.22.1 by running a TPC-H test. The results showed that OceanBase Database Community Edition V4.0 performed 5 to 6 times, or even 9 times in some cases, better than Greenplum 6.22.1 did in handling OLAP tasks with the same hardware configurations.
Figure 2 Results of a TPC-H 100GB test (time for each query)
Figure 3 Results of a TPC-H 100GB test (total time)
The 22 standard SQL queries of TPC-H involve operations that are frequently performed for data analysis, such as aggregation analysis, join query, expression calculation, and subquery, which can systematically test the database performance in many aspects, such as concurrent query, table scanning, and filter push-down. The significant reduction of RT in the TPC-H test also reflects the continuous improvement of the OceanBase Database in OLAP.
Since we started the project of OceanBase Database in 2010, we have firmly believed in the natively distributed architecture and have been dedicated to developing the database into a better natively distributed database. We want to provide users with cost-effective HTAP capabilities based on one set of data in the same system. Since it topped the TPC-C ranking, OceanBase Database has been steadily improving its OLAP capabilities. OceanBase Database V4.0 has made a great leap in OLAP compared with V3.x versions and dedicated OLAP databases. We really hope to help enterprises solve issues due to the traditional extract-transform-load (ETL) model, such as data synchronization latency and business complexity, and empower users to handle OLTP tasks while tackling OLAP tasks, such as report analysis and business decision-making, in a production environment. We want to make the database service and business applications more user-friendly.