This topic describes the software requirements, tenant configurations, and test procedure for running the TPC-H benchmark test with OceanBase Database.
What is the TPC-H benchmark test
TPC-H (Transaction Processing Performance Council Benchmark H) is a benchmark suite developed by the Transaction Processing Performance Council (TPC) to simulate decision-support applications. TPC-H is now widely used in both academia and industry to evaluate the performance of decision-support technologies. This commercial benchmark comprehensively evaluates a system's overall business computing capabilities; it places high demands on vendors and has significant practical value in business contexts. TPC-H is widely applied in areas such as bank credit analysis, credit card analysis, telecom operations analysis, tax analysis, and decision-making analysis in the tobacco industry.
The TPC-H benchmark evolved from TPC-D, a standard developed by TPC in 1994 for testing decision support systems. TPC-H implements a data warehouse in the third normal form (3NF) with eight base relations. Its workload measures query response time, that is, the duration from query submission to result return, and its composite performance metric is queries per hour (QphH@Size), where QphH indicates the number of complex queries the system processes per hour and Size indicates the database scale. This metric reflects the query processing capacity of a database system. Because the TPC-H benchmark is modeled on real production and operation environments, it can evaluate critical performance aspects that other tests may miss. Overall, TPC-H fills the gap in data warehouse testing and encourages database vendors and research institutions to push decision support technology to its limits.
Note
To enhance user experience and usability and to ensure that every developer can achieve good performance when using the database, OceanBase Database has undergone extensive optimization since V4.0.0. The performance testing method described in this topic relies only on basic parameter tuning, allowing developers to achieve a better database performance experience.
Environment setup
Prepare the test environment by following these steps:
Note
This example applies to a MySQL-compatible tenant.
Software requirements
- JDK: We recommend that you use JDK 1.8u131 or later.
- make: Run the `yum install make` command to install it.
- GCC: Run the `yum install gcc` command to install it.
- mysql-devel: Run the `yum install mysql-devel` command to install it.
- Python driver for connecting to a database: Run the `sudo yum install MySQL-python` command to install it.
- prettytable: Run the `pip install prettytable` command to install it.
- JDBC: We recommend that you use `mysql-connector-java-5.1.47`.
- TPC-H tools: Download the tools from here. If you use obd to perform the one-click test, you can skip this tool.
- OBClient: For more information, see the OBClient documentation.
- OceanBase Database: For more information, see Quick start with OceanBase Database.
- IOPS: We recommend that disk IOPS be at least 10,000.
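Before you continue, you can run a quick sanity check to confirm that the prerequisites above are in place. This is a minimal sketch under the assumption that the packages were installed with the commands listed; adjust the names if your environment differs.

#!/bin/bash
# Hypothetical prerequisite check for the TPC-H test environment.
for cmd in java make gcc python obclient; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "OK: $cmd ($(command -v "$cmd"))"
    else
        echo "MISSING: $cmd"
    fi
done
java -version 2>&1 | head -1                                      # expect 1.8.0_131 or later
python -c "import MySQLdb, prettytable; print('Python deps OK')"  # MySQL-python and prettytable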
Tenant specifications
The tenant specifications are based on the hardware configurations in the TPC-H benchmark test report of OceanBase Database. You need to adjust the specifications based on the hardware configurations of your database.
Cluster deployment
This test requires four servers. TPC-H and obd are deployed on one server, which serves as the stress client. OceanBase Database is deployed on the other three servers in a 1:1:1 configuration by using obd.
Note
- In the TPC-H test, the server that hosts TPC-H and obd needs only 4 CPU cores and 16 GB of memory.
- When you deploy a cluster, we recommend that you do not use the `obd cluster autodeploy` command. This command favors stability and therefore does not maximize resource utilization (for example, it does not use all available memory). We recommend that you optimize the configuration file to maximize resource utilization.
After the deployment succeeds, create a tenant and a user for running the TPC-H test. (The sys tenant is an internal system tenant for cluster management. Do not use it for testing.) Set `primary_zone` of the tenant to `RANDOM`. The `RANDOM` value means that the leader of a newly created table partition is randomly assigned to one of the three servers.
Tenant creation
You can use the `obd cluster tenant create` command to create a tenant. The command syntax is as follows:

obd cluster tenant create <DEPLOY_NAME> -n <TENANT_NAME> --max-cpu=28 --memory-size=180G --zone-list=zone1,zone2,zone3 --primary-zone=RANDOM --locality=F@zone1,F@zone2,F@zone3 --charset=utf8 -s 'ob_tcp_invited_nodes="%"' --optimize=<optimize>

The parameters are described as follows:

- DEPLOY_NAME: the cluster name.
- TENANT_NAME: the tenant name.
- --zone-list: the zone list of the tenant.
- --primary-zone: the primary zone of the tenant.
- --locality: the distribution of replicas across zones.
- --charset: the character set of the tenant.
- -s: the system variable values of the tenant.
- --optimize: the tenant load type. Valid values include `express_oltp`, `complex_oltp`, `olap`, `htap`, and `kv`. The default load type is `htap`, which is suitable for mixed OLAP and OLTP workloads. For more information about obd deployment, see obd cluster tenant create.

Notice
For V4.3.x and later, when you use obd to deploy a cluster, you can set the `scenario` parameter to select an appropriate load type for the cluster. The default value of the `scenario` parameter is `htap`. For more information, see Deploy OceanBase Database using obd.
For example, the following command creates a tenant named `tpcc_tenant` in the `obperf` cluster. The tenant has 28 CPU cores and 180 GB of memory, and its load type is the same as that of the cluster.

obd cluster tenant create obperf -n tpcc_tenant --max-cpu=28 --memory-size=180G --zone-list=zone1,zone2,zone3 --primary-zone=RANDOM --locality=F@zone1,F@zone2,F@zone3 --charset=utf8 -s 'ob_tcp_invited_nodes="%"' --optimize=htap

Note

In this example, `--optimize=htap` specifies the default load type. In a production environment, select an appropriate load type based on the actual cluster type.
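To confirm that the new tenant accepts connections, you can log in with OBClient and query its metadata. This is a minimal sketch; the host IP, port 2881, and the empty default password of the root user are assumptions based on a default deployment.

# Connect to the newly created tenant and check its primary zone.
obclient -h127.0.0.1 -P2881 -uroot@tpcc_tenant -A -e "
SELECT TENANT_NAME, PRIMARY_ZONE
FROM oceanbase.DBA_OB_TENANTS
WHERE TENANT_NAME = 'tpcc_tenant';"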
Test methods
Once the test environment is ready, you can perform the TPC-H performance test by using either of the following methods:

- Perform TPC-H tests with obd in one click
- Manually perform TPC-H tests using the TPC-H tool
Perform TPC-H tests with obd in one click
You can perform TPC-H tests in one click using an obd script. The script command is as follows:
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install obtpch
sudo ln -s /usr/tpc-h-tools/tpc-h-tools/ /usr/local/
obd test tpch obperf --tenant=tpch_mysql -s 100 --remote-tbl-dir=/tmp/tpch100
Before conducting the TPC-H test using the script mentioned above, note the following:
- For detailed parameter configurations when running TPC-H with obd, see obd test tpch.
- In this example, most parameters use their default values. In your own scenarios, you can adjust the parameters based on your specific needs. For example, the cluster name used in this example is `obperf`, and the tenant name is `tpch_mysql`.
- When you perform one-click testing with obd, the cluster must have been installed and deployed by using obd. Otherwise, the cluster information cannot be retrieved, and performance tuning based on the cluster configuration is impossible.
- If the password of the sys tenant has been changed through the terminal and is no longer the default empty value, first reset the password to its default value in the terminal. Then, use the `obd cluster edit-config` command to set the password of the sys tenant in the configuration file. The configuration item is `# root_password: # root user password`. After you execute the `obd cluster edit-config` command, run the `obd cluster reload` command to apply the changes.
- After you run the `obd test tpch` command, the system lists the execution steps and outputs in detail. The larger the data size, the longer the process takes.
- Ensure that the remote directory specified by `remote-tbl-dir` has sufficient capacity to store the TPC-H data. We recommend that you use a dedicated disk to store and load the test data.
- The `obd test tpch` command automatically completes all operations, including generating test data, transferring the data, optimizing OceanBase Database parameters, loading the data, and running the test. No additional actions are required. If an error occurs during any step, see obd test tpch to retry. For example, you can skip data generation and transfer and proceed directly to loading and testing.
Manually perform TPC-H tests using the TPC-H tool
After you select the load type and apply tenant optimizations in a cluster, you can manually perform the TPC-H benchmark test to gain a deeper understanding of OceanBase Database, especially of how its parameter settings are optimized.
Step 1: Create a test tenant
Run the commands in the system tenant (sys tenant) to create a test tenant:
Note
The deployment mode of the OceanBase cluster environment for this test is 1:1:1.
1. Create a resource unit named `mysql_box`.

   CREATE RESOURCE UNIT mysql_box MAX_CPU 28, MEMORY_SIZE '200G', MIN_IOPS 200000, MAX_IOPS 12800000, LOG_DISK_SIZE '300G';

2. Create a resource pool named `mysql_pool`.

   CREATE RESOURCE POOL mysql_pool UNIT = 'mysql_box', UNIT_NUM = 1, ZONE_LIST = ('z1','z2','z3');

3. Create a MySQL-compatible tenant named `mysql_tenant`. (A verification sketch follows these steps.)

   CREATE TENANT mysql_tenant RESOURCE_POOL_LIST = ('mysql_pool'), PRIMARY_ZONE = RANDOM, LOCALITY = 'F@z1,F@z2,F@z3' SET VARIABLES ob_compatibility_mode='mysql', ob_tcp_invited_nodes='%', secure_file_priv = "/";
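You can confirm from the sys tenant that the resource unit, resource pool, and tenant were created as expected. A minimal sketch, assuming a default local deployment (127.0.0.1, port 2881, empty sys password):

# Check the unit config, resource pool, and tenant from the sys tenant.
obclient -h127.0.0.1 -P2881 -uroot@sys -Doceanbase -A -e "
SELECT NAME, MAX_CPU, MEMORY_SIZE FROM DBA_OB_UNIT_CONFIGS WHERE NAME = 'mysql_box';
SELECT NAME, UNIT_COUNT FROM DBA_OB_RESOURCE_POOLS WHERE NAME = 'mysql_pool';
SELECT TENANT_NAME, PRIMARY_ZONE, LOCALITY FROM DBA_OB_TENANTS WHERE TENANT_NAME = 'mysql_tenant';"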
Step 2: Optimize the environment
1. Optimize OceanBase Database.

   Execute the following statements in the sys tenant to configure the relevant parameters.

   ALTER SYSTEM FLUSH PLAN CACHE GLOBAL;
   ALTER SYSTEM SET enable_sql_audit = false;
   SELECT sleep(5);
   ALTER SYSTEM SET enable_perf_event = false;
   ALTER SYSTEM SET syslog_level = 'PERF';
   ALTER SYSTEM SET enable_record_trace_log = false;
   ALTER SYSTEM SET data_storage_warning_tolerance_time = '300s';
   ALTER SYSTEM SET _data_storage_io_timeout = '600s';
   ALTER SYSTEM SET trace_log_slow_query_watermark = '7d';
   ALTER SYSTEM SET large_query_threshold = '0ms';
   ALTER SYSTEM SET enable_syslog_recycle = 1;
   ALTER SYSTEM SET max_syslog_file_count = 300;

2. Optimize the tenant. (A verification sketch follows this procedure.)

   Execute the following statements in the test tenant (user tenant) to configure the relevant parameters.

   SET GLOBAL NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
   SET GLOBAL NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF';
   SET GLOBAL NLS_TIMESTAMP_TZ_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF TZR TZD';
   SET GLOBAL ob_query_timeout = 10800000000;
   SET GLOBAL ob_trx_timeout = 10000000000;
   SET GLOBAL ob_sql_work_area_percentage = 50;
   ALTER SYSTEM SET default_table_store_format = 'column';
   ALTER SYSTEM SET ob_enable_batched_multi_statement = 'true';
   ALTER SYSTEM SET _io_read_batch_size = '128k';
   ALTER SYSTEM SET _io_read_redundant_limit_percentage = 50;
   SET GLOBAL parallel_degree_policy = AUTO;
   SET GLOBAL parallel_servers_target = 10000;
   SET GLOBAL collation_connection = utf8mb4_bin;
   SET GLOBAL collation_database = utf8mb4_bin;
   SET GLOBAL collation_server = utf8mb4_bin;
   SET GLOBAL autocommit = 1;
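After you apply the settings, you can spot-check that they took effect. A minimal sketch, assuming the test tenant from the previous step and a default local connection:

# Verify a few of the tenant-level settings.
obclient -h127.0.0.1 -P2881 -uroot@mysql_tenant -A -e "
SHOW VARIABLES LIKE 'ob_sql_work_area_percentage';
SHOW VARIABLES LIKE 'parallel_servers_target';
SHOW PARAMETERS LIKE 'default_table_store_format';"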
Step 3: Install the TPC-H tools
Download the TPC-H tools from TPC-H Tools Download.
1. Unzip the package and go to the TPC-H directory.

   [wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
   [wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0

2. Copy the Makefile.suite file.

   [wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/
   [wieck@localhost dbgen] $ cp Makefile.suite Makefile

3. Redefine the CC, DATABASE, MACHINE, and WORKLOAD parameters in the Makefile file.

   [wieck@localhost dbgen] $ vim Makefile

   CC       = gcc
   # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
   #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
   # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
   #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
   # Current values for WORKLOAD are: TPCH
   DATABASE = MYSQL
   MACHINE  = LINUX
   WORKLOAD = TPCH

4. Modify the tpcd.h file and add new macro definitions.

   [wieck@localhost dbgen] $ vim tpcd.h

   #ifdef MYSQL
   #define GEN_QUERY_PLAN ""
   #define START_TRAN "START TRANSACTION"
   #define END_TRAN "COMMIT"
   #define SET_OUTPUT ""
   #define SET_ROWCOUNT "limit %d;\n"
   #define SET_DBASE "use %s;\n"
   #endif

5. Compile the files. (A build check is sketched after these steps.)

   make

   The return result is as follows:

   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
   gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

   The dbgen file is generated for data generation, and the qgen and dists.dss files are generated for SQL statement generation.
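Before you generate the full data set, it can help to confirm that the build produced working binaries. A minimal sketch; the trial run at scale factor 1 writes about 1 GB of .tbl files into the current directory, so clean them up afterward:

# Confirm the build artifacts exist.
ls -l dbgen qgen dists.dss
# Trial generation at scale factor 1 (-f overwrites existing files).
./dbgen -s 1 -f
wc -l lineitem.tbl    # roughly 6 million rows at scale factor 1
rm -f *.tbl           # clean up before the real run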
Step 4: Generate data
You can generate TPC-H data of 10 GB, 100 GB, or 1 TB based on your actual environment. This section uses the example of generating 100 GB of data.
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
To generate 1 TB of data, you can use multiple threads to generate the files in chunks. Because OceanBase Database supports direct load, the data from multiple files can later be imported into the same table simultaneously:
#!/bin/bash
SCALE_FACTOR=1000
CHUNK_COUNT=20
for ((i=1; i<=CHUNK_COUNT; i++))
do
CMD="./dbgen -s ${SCALE_FACTOR} -C ${CHUNK_COUNT} -S ${i} -vf"
$CMD &
done
wait
echo "All data generation tasks completed."
Step 5: Generate SQL queries
Note
You can follow the steps in this section to generate an SQL query statement and then adjust it, or you can use the SQL query statement provided in GitHub. If you choose to use the SQL query statement from GitHub, change the value of the cpu_num parameter to the actual number of concurrent threads.
Generate the SQL query statements using the TPC-H tools:
1. Copy the dbgen/qgen and dbgen/dists.dss files to the mysql_sql folder.

2. In the mysql_sql folder, create a gen.sh script to generate the SQL query statements.

   vim gen.sh

   #!/usr/bin/bash
   for i in {1..22}
   do
       ./qgen -d $i -s 100 > db"$i".sql
   done

3. Modify the SQL query statements based on the actual number of concurrent threads.

   You can execute the following command in the sys tenant to view the total number of CPU cores available to a tenant.

   select sum(max_cpu) from DBA_OB_UNITS;

   Take Q1 as an example. The modified SQL statement is as follows. (A script that applies the hint to all 22 files is sketched after this list.)

   SELECT /*+ parallel(96) */ -- Add a parallel execution hint
       l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
   FROM
       lineitem
   WHERE
       l_shipdate <= date '1998-12-01' - interval '90' day
   GROUP BY
       l_returnflag,
       l_linestatus
   ORDER BY
       l_returnflag,
       l_linestatus;
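Editing all 22 query files by hand is tedious. The following sketch prepends the hint to the first select keyword in each generated file. It assumes GNU sed and that qgen emitted the keyword in lowercase, so spot-check one file before trusting the batch edit:

# Insert the parallel hint after the first "select" in each query file.
for i in {1..22}
do
    sed -i '0,/select/s//select \/*+ parallel(96) *\//' db"$i".sql
done
grep -m1 "parallel" db1.sql    # spot-check the result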
Step 6: Create tables
1. Create the schema file create_tpch_mysql_table_part.ddl for 100 GB of data. (A sketch showing how to apply a schema file follows this list.)

   drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_lineitem_order_group;
   drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_partsupp_part;
   create tablegroup tpch_tg_SF_TPC_USER_lineitem_order_group binding true partition by key 1 partitions 256;
   create tablegroup tpch_tg_SF_TPC_USER_partsupp_part binding true partition by key 1 partitions 256;

   DROP TABLE IF EXISTS LINEITEM;
   CREATE TABLE lineitem (
       l_orderkey int(11) NOT NULL,
       l_partkey int(11) NOT NULL,
       l_suppkey int(11) NOT NULL,
       l_linenumber int(11) NOT NULL,
       l_quantity decimal(15,2) NOT NULL,
       l_extendedprice decimal(15,2) NOT NULL,
       l_discount decimal(15,2) NOT NULL,
       l_tax decimal(15,2) NOT NULL,
       l_returnflag char(1) DEFAULT NULL,
       l_linestatus char(1) DEFAULT NULL,
       l_shipdate date NOT NULL,
       l_commitdate date DEFAULT NULL,
       l_receiptdate date DEFAULT NULL,
       l_shipinstruct varchar(25) DEFAULT NULL,
       l_shipmode varchar(10) DEFAULT NULL,
       l_comment varchar(44) DEFAULT NULL,
       primary key(l_shipdate, l_orderkey, l_linenumber)
   ) row_format = condensed
     tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group
     partition by key (l_orderkey) partitions 256
     with column group(each column);
   alter table lineitem CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS ORDERS;
   CREATE TABLE orders (
       o_orderkey int(11) NOT NULL,
       o_custkey int(11) NOT NULL,
       o_orderstatus varchar(1) DEFAULT NULL,
       o_totalprice decimal(15,2) DEFAULT NULL,
       o_orderdate date NOT NULL,
       o_orderpriority varchar(15) DEFAULT NULL,
       o_clerk varchar(15) DEFAULT NULL,
       o_shippriority int(11) DEFAULT NULL,
       o_comment varchar(79) DEFAULT NULL,
       PRIMARY KEY (o_orderkey, o_orderdate)
   ) row_format = condensed
     tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group
     partition by key(o_orderkey) partitions 256
     with column group(each column);
   alter table orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS PARTSUPP;
   CREATE TABLE partsupp (
       ps_partkey int(11) NOT NULL,
       ps_suppkey int(11) NOT NULL,
       ps_availqty int(11) DEFAULT NULL,
       ps_supplycost decimal(15,2) DEFAULT NULL,
       ps_comment varchar(199) DEFAULT NULL,
       PRIMARY KEY (ps_partkey, ps_suppkey)
   ) row_format = condensed
     tablegroup tpch_tg_SF_TPC_USER_partsupp_part
     partition by key(ps_partkey) partitions 256
     with column group(each column);
   alter table partsupp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS PART;
   CREATE TABLE part (
       p_partkey int(11) NOT NULL,
       p_name varchar(55) DEFAULT NULL,
       p_mfgr varchar(25) DEFAULT NULL,
       p_brand varchar(10) DEFAULT NULL,
       p_type varchar(25) DEFAULT NULL,
       p_size int(11) DEFAULT NULL,
       p_container varchar(10) DEFAULT NULL,
       p_retailprice decimal(12,2) DEFAULT NULL,
       p_comment varchar(23) DEFAULT NULL,
       PRIMARY KEY (p_partkey)
   ) row_format = condensed
     tablegroup tpch_tg_SF_TPC_USER_partsupp_part
     partition by key(p_partkey) partitions 256
     with column group(each column);
   alter table part CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS CUSTOMER;
   CREATE TABLE customer (
       c_custkey int(11) NOT NULL,
       c_name varchar(25) DEFAULT NULL,
       c_address varchar(40) DEFAULT NULL,
       c_nationkey int(11) DEFAULT NULL,
       c_phone varchar(15) DEFAULT NULL,
       c_acctbal decimal(15,2) DEFAULT NULL,
       c_mktsegment char(10) DEFAULT NULL,
       c_comment varchar(117) DEFAULT NULL,
       PRIMARY KEY (c_custkey)
   ) row_format = condensed
     partition by key(c_custkey) partitions 256
     with column group(each column);
   alter table customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS SUPPLIER;
   CREATE TABLE supplier (
       s_suppkey int(11) NOT NULL,
       s_name varchar(25) DEFAULT NULL,
       s_address varchar(40) DEFAULT NULL,
       s_nationkey int(11) DEFAULT NULL,
       s_phone varchar(15) DEFAULT NULL,
       s_acctbal decimal(15,2) DEFAULT NULL,
       s_comment varchar(101) DEFAULT NULL,
       PRIMARY KEY (s_suppkey)
   ) row_format = condensed
     partition by key(s_suppkey) partitions 256
     with column group(each column);
   alter table supplier CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS NATION;
   CREATE TABLE nation (
       n_nationkey int(11) NOT NULL,
       n_name varchar(25) DEFAULT NULL,
       n_regionkey int(11) DEFAULT NULL,
       n_comment varchar(152) DEFAULT NULL,
       PRIMARY KEY (n_nationkey)
   ) row_format = condensed
     with column group(each column);
   alter table nation CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS REGION;
   CREATE TABLE region (
       r_regionkey int(11) NOT NULL,
       r_name varchar(25) DEFAULT NULL,
       r_comment varchar(152) DEFAULT NULL,
       PRIMARY KEY (r_regionkey)
   ) row_format = condensed
     with column group(each column);
   alter table region CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   CREATE VIEW revenue0 AS
   SELECT l_suppkey as supplier_no,
          SUM(l_extendedprice * (1 - l_discount)) as total_revenue
   FROM lineitem
   WHERE l_shipdate >= DATE '1996-01-01'
     AND l_shipdate < DATE '1996-04-01'
   GROUP BY l_suppkey;

2. Create the schema file create_tpch_mysql_table_part_1000G.ddl for 1 TB of data.

   drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_lineitem_order_group_1000;
   drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_partsupp_part_1000;
   create tablegroup tpch_tg_SF_TPC_USER_lineitem_order_group_1000 binding true partition by key 1 partitions 256;
   create tablegroup tpch_tg_SF_TPC_USER_partsupp_part_1000 binding true partition by key 1 partitions 256;

   DROP TABLE IF EXISTS LINEITEM;
   CREATE TABLE lineitem (
       l_orderkey bigint NOT NULL,
       l_partkey int(32) NOT NULL,
       l_suppkey int(32) NOT NULL,
       l_linenumber int(32) NOT NULL,
       l_quantity decimal(32,2) NOT NULL,
       l_extendedprice decimal(32,2) NOT NULL,
       l_discount decimal(15,2) NOT NULL,
       l_tax decimal(15,2) NOT NULL,
       l_returnflag varchar(64) DEFAULT NULL,
       l_linestatus varchar(64) DEFAULT NULL,
       l_shipdate date NOT NULL,
       l_commitdate date DEFAULT NULL,
       l_receiptdate date DEFAULT NULL,
       l_shipinstruct varchar(64) DEFAULT NULL,
       l_shipmode varchar(64) DEFAULT NULL,
       l_comment varchar(64) DEFAULT NULL,
       primary key(l_shipdate, l_orderkey, l_linenumber)
   ) row_format = condensed
     tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group_1000
     partition by key (l_orderkey) partitions 256
     with column group(each column);
   alter table lineitem CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS ORDERS;
   CREATE TABLE orders (
       o_orderkey bigint NOT NULL,
       o_custkey int(32) NOT NULL,
       o_orderstatus varchar(64) DEFAULT NULL,
       o_totalprice decimal(15,2) DEFAULT NULL,
       o_orderdate date NOT NULL,
       o_orderpriority varchar(15) DEFAULT NULL,
       o_clerk varchar(15) DEFAULT NULL,
       o_shippriority int(32) DEFAULT NULL,
       o_comment varchar(128) DEFAULT NULL,
       PRIMARY KEY (o_orderkey, o_orderdate)
   ) row_format = condensed
     tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group_1000
     partition by key(o_orderkey) partitions 256
     with column group(each column);
   alter table orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS PARTSUPP;
   CREATE TABLE partsupp (
       ps_partkey int(11) NOT NULL,
       ps_suppkey int(11) NOT NULL,
       ps_availqty int(11) DEFAULT NULL,
       ps_supplycost decimal(15,2) DEFAULT NULL,
       ps_comment varchar(199) DEFAULT NULL,
       PRIMARY KEY (ps_partkey, ps_suppkey)
   ) row_format = condensed
     tablegroup tpch_tg_SF_TPC_USER_partsupp_part_1000
     partition by key(ps_partkey) partitions 256
     with column group(each column);
   alter table partsupp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS PART;
   CREATE TABLE part (
       p_partkey int(11) NOT NULL,
       p_name varchar(55) DEFAULT NULL,
       p_mfgr varchar(25) DEFAULT NULL,
       p_brand varchar(10) DEFAULT NULL,
       p_type varchar(25) DEFAULT NULL,
       p_size int(11) DEFAULT NULL,
       p_container varchar(10) DEFAULT NULL,
       p_retailprice decimal(12,2) DEFAULT NULL,
       p_comment varchar(23) DEFAULT NULL,
       PRIMARY KEY (p_partkey)
   ) row_format = condensed
     tablegroup tpch_tg_SF_TPC_USER_partsupp_part_1000
     partition by key(p_partkey) partitions 256
     with column group(each column);
   alter table part CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS CUSTOMER;
   CREATE TABLE customer (
       c_custkey int(11) NOT NULL,
       c_name varchar(25) DEFAULT NULL,
       c_address varchar(40) DEFAULT NULL,
       c_nationkey int(11) DEFAULT NULL,
       c_phone varchar(15) DEFAULT NULL,
       c_acctbal decimal(15,2) DEFAULT NULL,
       c_mktsegment char(10) DEFAULT NULL,
       c_comment varchar(117) DEFAULT NULL,
       PRIMARY KEY (c_custkey)
   ) row_format = condensed
     partition by key(c_custkey) partitions 256
     with column group(each column);
   alter table customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS SUPPLIER;
   CREATE TABLE supplier (
       s_suppkey int(11) NOT NULL,
       s_name varchar(25) DEFAULT NULL,
       s_address varchar(40) DEFAULT NULL,
       s_nationkey int(11) DEFAULT NULL,
       s_phone varchar(15) DEFAULT NULL,
       s_acctbal decimal(15,2) DEFAULT NULL,
       s_comment varchar(101) DEFAULT NULL,
       PRIMARY KEY (s_suppkey)
   ) row_format = condensed
     partition by key(s_suppkey) partitions 256
     with column group(each column);
   alter table supplier CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS NATION;
   CREATE TABLE nation (
       n_nationkey int(11) NOT NULL,
       n_name varchar(25) DEFAULT NULL,
       n_regionkey int(11) DEFAULT NULL,
       n_comment varchar(152) DEFAULT NULL,
       PRIMARY KEY (n_nationkey)
   ) row_format = condensed
     with column group(each column);
   alter table nation CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   DROP TABLE IF EXISTS REGION;
   CREATE TABLE region (
       r_regionkey int(11) NOT NULL,
       r_name varchar(25) DEFAULT NULL,
       r_comment varchar(152) DEFAULT NULL,
       PRIMARY KEY (r_regionkey)
   ) row_format = condensed
     with column group(each column);
   alter table region CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

   CREATE VIEW revenue0 AS
   SELECT l_suppkey as supplier_no,
          SUM(l_extendedprice * (1 - l_discount)) as total_revenue
   FROM lineitem
   WHERE l_shipdate >= DATE '1996-01-01'
     AND l_shipdate < DATE '1996-04-01'
   GROUP BY l_suppkey;
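The schema files are not applied automatically. You can execute a schema file in the test tenant and then confirm that the objects exist. A minimal sketch; the connection settings and the test database name are assumptions:

# Apply the 100 GB schema and verify the result.
obclient -h127.0.0.1 -P2881 -uroot@mysql_tenant -Dtest -A < create_tpch_mysql_table_part.ddl
obclient -h127.0.0.1 -P2881 -uroot@mysql_tenant -Dtest -A -e "SHOW TABLES; SHOW CREATE VIEW revenue0\G"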
Step 7: Load data
Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to load data:
1. Create a script named load_data.sh.

   #!/bin/bash
   host='$host_ip'        # Attention!!! Fill in the IP address of an observer node, such as observer A. We recommend that you store the data files on the same server.
   port='$host_port'      # Port number of observer A
   user='$user'           # Username
   tenant='$tenant_name'  # Tenant name
   password='$password'   # Password
   database='$db_name'    # Database name
   data_path='$data_file' # Attention!!! Fill in the path on observer A where the generated data files (.tbl) are stored.

   function load_data {
       remote_user="$user"    # Username for logging in to the observer node that stores the data files
       table_name=${1}
       if [[ ${password} == "" ]]; then
           obclient_conn="obclient -h${host} -P${port} -u${user} -D${database} -A -c"
       else
           obclient_conn="obclient -h${host} -P${port} -u${user} -D${database} -p${password} -A -c"
       fi
       table_list=$(ssh "${remote_user}@${host}" "ls ${data_path}/${table_name}.tbl* 2>/dev/null")
       echo "$table_list"
       IFS=$'\n' read -d '' -r -a table_files <<< "$table_list"
       table_files_comma_separated=$(IFS=,; echo "${table_files[*]}")
       echo "${table_files_comma_separated}"
       echo `date "+[%Y-%m-%d %H:%M:%S]"` "----------------------Importing data files of the ${table_name} table----------------------"
       # Import data in direct load mode. You can also change this to another import mode.
       echo "load data /*+ parallel(80) direct(true,0) */ infile '${table_files_comma_separated}' into table ${table_name} fields terminated by '|';" | ${obclient_conn}
   }

   starttime=`date +%s%N`
   for table in "nation" "region" "customer" "lineitem" "orders" "partsupp" "part" "supplier"
   do
       load_data "${table}"
   done
   end_time=`date +%s%N`
   totaltime=`echo ${end_time} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
   echo `date "+[%Y-%m-%d %H:%M:%S]"` "load data cost ${totaltime}s"

   After the data is loaded, perform a major compaction and collect statistics as described in the following steps.
2. Perform a major compaction.

   Execute the following statement in the test tenant to perform a major compaction.

   ALTER SYSTEM MAJOR FREEZE;

3. Check whether the major compaction is completed.
   You can check the status of the major compaction in the sys tenant.

   SELECT dt.TENANT_NAME, cc.FROZEN_SCN, cc.LAST_SCN
   FROM oceanbase.DBA_OB_TENANTS dt, oceanbase.CDB_OB_MAJOR_COMPACTION cc
   WHERE dt.TENANT_ID = cc.TENANT_ID AND dt.TENANT_NAME = 'mysql_tenant';

   Note

   The major compaction is considered completed when the value of FROZEN_SCN is equal to that of LAST_SCN for all tenants. (A polling sketch is provided after this procedure.)
4. Collect statistics.

   Create a file named analyze_table.sql.

   call dbms_stats.gather_table_stats(NULL, 'part', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
   call dbms_stats.gather_table_stats(NULL, 'lineitem', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
   call dbms_stats.gather_table_stats(NULL, 'customer', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
   call dbms_stats.gather_table_stats(NULL, 'orders', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
   call dbms_stats.gather_table_stats(NULL, 'partsupp', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
   call dbms_stats.gather_table_stats(NULL, 'supplier', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');

   Log in to the test tenant and execute the following statement to collect statistics:
source analyze_table.sql
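Major compaction on a large data set can take a long time. Instead of re-running the status query by hand, you can poll it until it reports completion. A minimal sketch, assuming a default local sys tenant connection with an empty password:

# Poll until FROZEN_SCN equals LAST_SCN for all tenants.
while true
do
    pending=$(obclient -h127.0.0.1 -P2881 -uroot@sys -Doceanbase -A -N -e \
        "SELECT count(*) FROM CDB_OB_MAJOR_COMPACTION WHERE FROZEN_SCN != LAST_SCN;")
    pending=${pending:-1}   # treat connection errors as "not finished"
    [ "$pending" -eq 0 ] && echo "Major compaction finished." && break
    echo "Waiting for major compaction..."
    sleep 30
done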
Step 8: Run the test
Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to run the test:
1. Write the tpch.sh script.

   #!/bin/bash
   host='$host_ip'        # Attention!!! Fill in the IP address of observer A.
   port='$host_port'      # Port number of observer A
   user='$user'           # Username
   tenant='$tenant_name'  # Tenant name
   password='$password'   # Password
   database='$db_name'    # Database name

   if [[ ${password} == "" ]]; then
       TPCH_TEST="obclient -h${host} -P${port} -u${user}@${tenant} -D${database} -A -c"
   else
       TPCH_TEST="obclient -h${host} -P${port} -p${password} -u${user}@${tenant} -D${database} -A -c"
   fi

   function clear_kvcache {
       if [[ ${password_sys} == "" ]]; then
           obclient_sys="obclient -h${host} -P${port} -uroot@sys -Doceanbase -A -c"
       else
           obclient_sys="obclient -h${host} -P${port} -uroot@sys -Doceanbase -p${password_sys} -A -c"
       fi
       tenant_name=${tenant}
       echo "alter system flush kvcache ;" | ${obclient_sys}
       echo "alter system flush kvcache tenant '${tenant_name}' cache 'user_row_cache';" | ${obclient_sys}
       sleep 3s
   }

   function do_explain {
       # Generate the execution plans
       echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN EXPLAIN ALL TPCH PLAN"
       for i in {1..22}
       do
           sql_explain="source explain_mysql/${i}.sql"
           echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN EXPLAIN Q${i}:"
           echo ${sql_explain} | ${TPCH_TEST} | sed 's/\\n/\n/g' | tee explain_log/${i}.exp
           echo `date '+[%Y-%m-%d %H:%M:%S]'` "Q${i} END"
       done
   }

   function do_warmup {
       # Warmup run
       totaltime=0
       for i in {1..22}
       do
           starttime=`date +%s%N`
           echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN prewarm Q${i}"
           sql1="source mysql_sql/${i}.sql"
           echo ${sql1} | ${TPCH_TEST} > mysql_log/${i}_prewarm.log || ret=1
           stoptime=`date +%s%N`
           costtime=`echo ${stoptime} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
           first_array[$i]=$(echo "scale=2; ${first_array[$i]} + $costtime" | bc)
           echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
           totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'`
       done
       echo "total cost:${totaltime}s"
   }

   function hot_run {
       # Formal run (hot)
       for j in {1..10}
       do
           totaltime=0
           for i in {1..22}
           do
               starttime=`date +%s%N`
               echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN BEST Q${i} (hot run)"
               sql1="source mysql_sql/${i}.sql"
               echo ${sql1} | ${TPCH_TEST} > mysql_log/${i}.log || ret=1
               stoptime=`date +%s%N`
               costtime=`echo ${stoptime} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
               hot_array[$i]=$(echo "scale=2; ${hot_array[$i]} + $costtime" | bc)
               echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
               totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'`
           done
           echo "total cost:${totaltime}s"
       done
   }

   function cold_run {
       # Formal run (cold): flush the caches before each query
       for j in {1..3}
       do
           totaltime=0
           for i in {1..22}
           do
               clear_kvcache
               starttime=`date +%s%N`
               echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN BEST Q${i} (cold run)"
               sql1="source mysql_sql/${i}.sql"
               echo $sql1 | $TPCH_TEST > mysql_log/${i}_cold.log || ret=1
               stoptime=`date +%s%N`
               costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
               cold_array[$i]=$(echo "scale=2; ${cold_array[$i]} + $costtime" | bc)
               echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
               totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'`
           done
           echo "total cost:${totaltime}s"
       done
   }

   do_explain
   do_warmup
   hot_run
   cold_run

2. Run the test script.
sh tpch.sh
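The script prints per-query and per-round timings to stdout. To keep a record and extract the totals afterward, you can tee the output to a file; a small sketch that is not part of the original procedure:

sh tpch.sh | tee tpch_run.log
grep "total cost" tpch_run.log              # one line per warmup, hot, and cold round
grep "END,COST" tpch_run.log | tail -22     # per-query timings of the last round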
FAQ
Q: Why did the data import fail with the following error?
ERROR 1017 (HY000) at line 1: File not exist

A: The .tbl files must be placed in a directory on the server where OceanBase Database is running, because data loading requires a local import.

Q: Why did I encounter an error when viewing the data?
ERROR 4624 (HY000): No memory or reach tenant memory limit

A: This error occurs due to insufficient memory. We recommend that you increase the memory of the tenant.
Q: Why did the data import fail with the following error?
ERROR 1227 (42501) at line 1: Access denied

A: The user needs the appropriate access privileges. Run the following command to grant the necessary privileges:
grant file on *.* to tpch_100g_part;