This topic describes how to use the official TPC-H tool to test the OLAP performance of OceanBase Cloud.
Note
OceanBase Database has been optimized in V4.0.0 and later to improve usability and help developers get good performance out of the box. The test method described here relies only on basic parameter tuning to deliver a good performance experience; it may not fully comply with the TPC-H specification.
What is TPC-H
TPC-H (TPC Benchmark H) is a benchmark developed by the Transaction Processing Performance Council (TPC) to simulate decision support applications. It is widely used in both academia and industry to evaluate the performance of decision support technologies. The benchmark assesses a system's overall business computing capabilities, sets a high bar for vendors, and has practical value in industries such as banking, telecommunications, taxation, and tobacco, where it is commonly applied to credit analysis, credit card analysis, operational analysis, and decision-making processes.
TPC-H evolved from TPC-D, a standard benchmark introduced by TPC in 1994 for decision support systems. TPC-H implements a data warehouse using third normal form (3NF), consisting of eight basic relations. Its primary metric is the response time for each query, measured from query submission to result return. The benchmark's unit of measurement is queries per hour (QphH@size), where H represents the average number of complex queries executed per hour, and size indicates the database scale, reflecting the system's query processing capability. TPC-H is modeled after real-world production environments, allowing it to evaluate critical performance parameters that other benchmarks may not cover. Overall, TPC-H meets the testing needs of the data warehouse field and drives vendors and research institutions to push the limits of this technology.
Prepare the environment
Prepare a stress test ECS
Prepare an ECS instance with the same specifications as the OBServer node so that the client does not become a bottleneck. Make sure that the stress test ECS is in the same VPC as the OceanBase tenant and that the OceanBase cluster allows access from the stress test ECS.
Create an OceanBase instance
- Create an OceanBase instance. For more information, see Create an instance.
- Create a tenant and set the tenant specifications based on the instance resources. For more information, see Create a tenant.
- Create a database. In this example, a database named tpchtest is created. For more information, see Create a database.
Install the TPC-H tool: download the 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip package.
Optimize the environment
Before running the TPC-H test, perform some basic settings on the OBServer node to achieve the best performance.
Tenant-level parameter settings (the two timeout values are in microseconds; 36000000000 microseconds is 10 hours):
SET GLOBAL ob_sql_work_area_percentage = 80;
SET GLOBAL ob_query_timeout = 36000000000;
SET GLOBAL ob_trx_timeout = 36000000000;
SET GLOBAL max_allowed_packet = 67108864;
SET GLOBAL parallel_servers_target = 624;
Execute the test
Install TPC-H Tool
Download the TPC-H Tool. For more information, see TPC-H Tool Download Page.
After the download is complete, extract the files and navigate to the TPC-H directory.
[xxxxx@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
[xxxxx@localhost ~] $ cd 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool
Copy the makefile.suite file.
[xxxxx@localhost 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool] $ cd dbgen/
[xxxxx@localhost dbgen] $ cp makefile.suite Makefile
Modify the CC, DATABASE, MACHINE, and WORKLOAD parameters in the Makefile file.
[xxxxx@localhost dbgen] $ vim Makefile
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE = MYSQL
MACHINE = LINUX
WORKLOAD = TPCH
Modify the tpcd.h file and add new macro definitions.
[xxxxx@localhost dbgen] $ vim tpcd.h
#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif
Compile the files.
make
Generate data
You can generate TPC-H data of 10 GB, 100 GB, or 1 TB based on your environment. This example generates 100 GB of data.
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
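Before loading, you can sanity-check the generated .tbl files against the per-table row counts defined by the TPC-H specification. The following Python sketch is illustrative; the helper names and the tpch100 directory layout are examples, not part of the TPC-H tool:

```python
# Expected base-table row counts for TPC-H at a given scale factor (SF).
# region and nation are fixed-size; lineitem is omitted because its row
# count is only approximate (~6,000,000 x SF).
def expected_rows(scale_factor):
    return {
        'region': 5,
        'nation': 25,
        'supplier': 10000 * scale_factor,
        'customer': 150000 * scale_factor,
        'part': 200000 * scale_factor,
        'partsupp': 800000 * scale_factor,
        'orders': 1500000 * scale_factor,
    }

def count_tbl_rows(path):
    # Each dbgen output row is a single '|'-delimited line.
    with open(path) as f:
        return sum(1 for _ in f)

# Usage (after dbgen): compare count_tbl_rows('tpch100/orders.tbl')
# with expected_rows(100)['orders'].
```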
Generate SQL query
Notes:
You can generate the query SQL statements by following the steps in this section and then modify them, or you can directly use the query SQL statements provided in GitHub. If you choose to use the query SQL statements from GitHub, you need to change the value of cpu_num in the SQL statements to the actual number of concurrent threads.
Copy the qgen and dists.dss files to the queries directory.
cp qgen queries
cp dists.dss queries
Create a gen.sh script in the queries directory to generate the query SQL statements.
[xxxxx@localhost queries] $ vim gen.sh
#!/usr/bin/bash
for i in {1..22}
do
  ./qgen -d $i -s 100 > db"$i".sql
done
Execute the gen.sh script.
[xxxxx@localhost queries] $ chmod +x gen.sh
[xxxxx@localhost queries] $ ./gen.sh
Convert the line endings of the query SQL statements to Unix format.
[xxxxx@localhost queries] $ dos2unix *
For the converted query SQL statements, see GitHub. You need to change the value of cpu_num in the SQL statements to the actual number of concurrent threads. We recommend that you set the number of concurrent threads to the total number of CPU cores of the tenant for the best performance.
For example, the modified SQL statement for q1 is as follows:
SELECT /*+ parallel(96) */  -- increase parallel execution
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
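Instead of editing all 22 generated files by hand, the hint can be injected with a small script. A minimal Python sketch, assuming the outer query is the first SELECT keyword in each file; the helper names are examples, not part of the TPC-H tool:

```python
import re

def add_parallel_hint(sql, degree):
    # Insert /*+ parallel(N) */ right after the first SELECT keyword only,
    # so the hint is not also injected into subqueries.
    return re.sub(r'(?i)\bselect\b', 'SELECT /*+ parallel(%d) */' % degree, sql, count=1)

def rewrite_query_files(degree, count=22):
    # Rewrite db1.sql .. db22.sql in place with the chosen degree of parallelism.
    for i in range(1, count + 1):
        name = 'db%d.sql' % i
        with open(name) as f:
            sql = f.read()
        with open(name, 'w') as f:
            f.write(add_parallel_hint(sql, degree))
```

For example, rewrite_query_files(96) matches the q1 statement shown above for a 96-core tenant.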
Create a table
Create a table structure file named create_tpch_mysql_table_part.ddl.
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS supplier;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS region;
DROP TABLEGROUP IF EXISTS tpch_tg_lineitem_order_group;
DROP TABLEGROUP IF EXISTS tpch_tg_partsupp_part;
CREATE TABLEGROUP IF NOT EXISTS tpch_tg_lineitem_order_group binding true partition by key 1 partitions cpu_num;
CREATE TABLEGROUP IF NOT EXISTS tpch_tg_partsupp_part binding true partition by key 1 partitions cpu_num;
DROP TABLE IF EXISTS lineitem;
CREATE TABLE lineitem (
l_orderkey BIGINT NOT NULL,
l_partkey BIGINT NOT NULL,
l_suppkey INTEGER NOT NULL,
l_linenumber INTEGER NOT NULL,
l_quantity DECIMAL(15,2) NOT NULL,
l_extendedprice DECIMAL(15,2) NOT NULL,
l_discount DECIMAL(15,2) NOT NULL,
l_tax DECIMAL(15,2) NOT NULL,
l_returnflag char(1) DEFAULT NULL,
l_linestatus char(1) DEFAULT NULL,
l_shipdate date NOT NULL,
l_commitdate date DEFAULT NULL,
l_receiptdate date DEFAULT NULL,
l_shipinstruct char(25) DEFAULT NULL,
l_shipmode char(10) DEFAULT NULL,
l_comment varchar(44) DEFAULT NULL,
PRIMARY KEY(l_orderkey, l_linenumber))row_format = condensed
tablegroup = tpch_tg_lineitem_order_group
partition by key (l_orderkey) partitions cpu_num;
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
o_orderkey bigint not null,
o_custkey bigint not null,
o_orderstatus char(1) default null,
o_totalprice bigint default null,
o_orderdate date not null,
o_orderpriority char(15) default null,
o_clerk char(15) default null,
o_shippriority bigint default null,
o_comment varchar(79) default null,
PRIMARY KEY (o_orderkey))row_format = condensed
tablegroup = tpch_tg_lineitem_order_group
partition by key(o_orderkey) partitions cpu_num;
DROP TABLE IF EXISTS partsupp;
CREATE TABLE partsupp (
ps_partkey bigint not null,
ps_suppkey bigint not null,
ps_availqty bigint default null,
ps_supplycost bigint default null,
ps_comment varchar(199) default null,
PRIMARY KEY (ps_partkey, ps_suppkey))row_format = condensed
tablegroup = tpch_tg_partsupp_part
partition by key(ps_partkey) partitions cpu_num;
DROP TABLE IF EXISTS part;
CREATE TABLE part (
p_partkey bigint not null,
p_name varchar(55) default null,
p_mfgr char(25) default null,
p_brand char(10) default null,
p_type varchar(25) default null,
p_size bigint default null,
p_container char(10) default null,
p_retailprice bigint default null,
p_comment varchar(23) default null,
PRIMARY KEY (p_partkey))row_format = condensed
tablegroup = tpch_tg_partsupp_part
partition by key(p_partkey) partitions cpu_num;
DROP TABLE IF EXISTS customer;
CREATE TABLE customer (
c_custkey bigint not null,
c_name varchar(25) default null,
c_address varchar(40) default null,
c_nationkey bigint default null,
c_phone char(15) default null,
c_acctbal bigint default null,
c_mktsegment char(10) default null,
c_comment varchar(117) default null,
PRIMARY KEY (c_custkey))row_format = condensed
partition by key(c_custkey) partitions cpu_num;
DROP TABLE IF EXISTS supplier;
CREATE TABLE supplier (
s_suppkey bigint not null,
s_name char(25) default null,
s_address varchar(40) default null,
s_nationkey bigint default null,
s_phone char(15) default null,
s_acctbal bigint default null,
s_comment varchar(101) default null,
PRIMARY KEY (s_suppkey))row_format = condensed
partition by key(s_suppkey) partitions cpu_num;
DROP TABLE IF EXISTS nation;
CREATE TABLE nation (
n_nationkey bigint not null,
n_name char(25) default null,
n_regionkey bigint default null,
n_comment varchar(152) default null,
PRIMARY KEY (n_nationkey))row_format = condensed;
DROP TABLE IF EXISTS region;
CREATE TABLE region (
r_regionkey bigint not null,
r_name char(25) default null,
r_comment varchar(152) default null,
PRIMARY KEY (r_regionkey))row_format = condensed;
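The DDL above uses cpu_num as a placeholder for the partition count. Before executing the file, replace every placeholder with the tenant's actual CPU core count, for example with a small Python sketch (the helper name is an example; 96 cores is only illustrative):

```python
def fill_cpu_num(ddl_text, cpu_num):
    # Replace the cpu_num placeholder with the tenant's CPU core count.
    return ddl_text.replace('partitions cpu_num', 'partitions %d' % cpu_num)

# Usage: read create_tpch_mysql_table_part.ddl, apply fill_cpu_num(ddl, 96),
# and write the result back before executing the file with obclient.
```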
Load data
You can write a loading script based on the data files and DDL created in the preceding steps. The following example shows how to load the data:
Create a directory for the loading script.
[xxxxx@localhost dbgen] $ mkdir load
[xxxxx@localhost dbgen] $ cd load
[xxxxx@localhost load] $ cp xx/create_tpch_mysql_table_part.ddl ./
Create the load.py script.
[xxxxx@localhost load] $ vim load.py
#!/usr/bin/env python
#-*- encoding:utf-8 -*-
import os
import sys
import time
import commands

hostname='$oceanbase_url'   # OceanBase database connection address
port='$host_port'           # Port number
tenant='$tenant_name'       # Tenant name (used in the -u user@tenant connection string)
user='$user'                # Username
password='$password'        # Password
data_path='$path'           # Note: specify the directory where the tbl files are located on the stress test ECS
db_name='$db_name'          # Database name

# Create the tables
cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s < create_tpch_mysql_table_part.ddl'%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s -e "show tables;" '%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result

# Load each table from its .tbl file
for table in ('customer','lineitem','nation','orders','partsupp','part','region','supplier'):
    cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ local infile '%s/%s.tbl' into table %s fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path,table,table)
    result = commands.getstatusoutput(cmd_str)
    print result
Load data.
Note
To load data, you must install the OBClient client.
python load.py
(0,'')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.\nTABLE_NAME\nT1\nLINEITEM\nORDERS\nPARTSUPP\nPART\nCUSTOMER\nSUPPLIER\nNATION\nREGION')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
Execute a major compaction.
MySQL [oceanbase]> ALTER SYSTEM major freeze;
Query OK, 0 rows affected
Check whether the major compaction is completed.
MySQL [oceanbase]> select STATUS from oceanbase.DBA_OB_MAJOR_COMPACTION;
If the value of the STATUS column changes from COMPACTING to IDLE, the major compaction is completed.
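Waiting for the STATUS column to become IDLE can be automated with a polling loop. A minimal Python sketch; the query itself is injected as a callable so the sketch stays client-agnostic (the function and parameter names are examples):

```python
import time

def wait_for_major_compaction(fetch_statuses, interval_s=10, timeout_s=7200):
    # Poll until every row of DBA_OB_MAJOR_COMPACTION reports IDLE.
    # fetch_statuses is any callable returning the STATUS column values,
    # e.g. a wrapper around obclient or a MySQL client library.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        statuses = fetch_statuses()
        if statuses and all(s == 'IDLE' for s in statuses):
            return True
        time.sleep(interval_s)
    return False
```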
Manually collect statistics.
call dbms_stats.gather_schema_stats('$db_name',degree=>96);
Run the test
You can write a script based on the data and SQL statements generated in the preceding steps. Here is an example of how to run the test:
Write a test script named tpch.sh in the queries directory.
[xxxxx@localhost queries] $ vim tpch.sh
#!/bin/bash
TPCH_TEST="obclient -h $host_ip -P $host_port -utpch_100g_part@tpch_mysql -D tpch_100g_part -ptest -c"
# warmup preheating
for i in {1..22}
do
    sql1="source db${i}.sql"
    echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
done
# Run the test
for i in {1..22}
do
    starttime=`date +%s%N`
    echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN Q${i}"
    sql1="source db${i}.sql"
    echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
    stoptime=`date +%s%N`
    costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
    echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
done
Run the test script.
sh tpch.sh
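The script prints one END,COST line per query, so the per-query and total times can be extracted from the console output. An illustrative Python sketch that parses the format produced by the echo statements in tpch.sh (the helper names are examples):

```python
import re

def parse_costs(log_text):
    # Pull the per-query times out of lines like:
    #   [2024-01-01 00:00:13] END,COST 12.34s
    return [float(m) for m in re.findall(r'END,COST ([0-9.]+)s', log_text)]

def summarize(log_text):
    # Per-query times in script order, followed by the total.
    costs = parse_costs(log_text)
    lines = ['Q%d: %.2fs' % (i, c) for i, c in enumerate(costs, 1)]
    lines.append('TOTAL: %.2fs' % sum(costs))
    return '\n'.join(lines)
```

For example, run sh tpch.sh | tee tpch.log and then print summarize(open('tpch.log').read()).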
