This topic describes how to run the TPC-H benchmark on OceanBase Database in analytical processing (AP) scenarios using the following two methods:
- One-click TPC-H test using obd
- Manual TPC-H test using the official TPC-H tool
Note
To enhance user experience and ensure optimal performance, OceanBase Database has been extensively optimized since V4.0.0. The performance testing methods outlined here focus on tuning basic parameters to provide developers with an improved database experience.
What is TPC-H?
TPC-H is a decision support benchmark developed by the Transaction Processing Performance Council (TPC). It is widely used in both academic and industrial sectors to evaluate the performance of decision support applications. This benchmark offers a comprehensive assessment of a system's business computing capabilities and holds significant commercial relevance, especially in areas like credit analysis, telecom operation analysis, tax analysis, and decision-making in the tobacco industry.
The TPC-H benchmark evolved from TPC-D, a standard developed by TPC in 1994 for testing decision support systems. TPC-H implements a data warehouse in the third normal form (3NF) with eight base tables. Its primary measure is query response time, that is, the duration from query submission to result return, and its reported metric is the composite queries-per-hour rating QphH@Size, where "H" denotes the TPC-H workload and "Size" denotes the database scale factor. This metric reflects the query processing capacity of a database system. Because the TPC-H benchmark is modeled on real production and operation environments, it can evaluate critical performance characteristics that other tests may miss. Overall, TPC-H fills a gap in data warehouse testing and pushes database vendors and research institutions to advance decision support technology to its limits.
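For reference, the TPC-H specification derives this composite metric from a single-stream power test and a multi-stream throughput test:

$$\text{QphH@Size} = \sqrt{\text{Power@Size} \times \text{Throughput@Size}}$$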
Environment preparation
Prepare your testing environment based on the following requirements:
- Java Development Kit (JDK): Use V1.8u131 or later.
- make: Run the `yum install make` command to install make.
- GCC: Run the `yum install gcc` command to install GCC.
- mysql-devel: Run the `yum install mysql-devel` command to install mysql-devel.
- MySQL-python: Run the `sudo yum install MySQL-python` command to install the MySQL driver for Python.
- PrettyTable: Run the `pip install prettytable` command to install PrettyTable.
- Java Database Connectivity (JDBC): Use mysql-connector-java-5.1.47.
- TPC-H tools: Download the tools from the TPC-H Tools Download page (see Install the TPC-H tool below). If you use obd to perform the test, you do not need to download them.
- OBClient: For more information, see OBClient Documentation.
- OceanBase Database: For more information, see Quick start with OceanBase Database Community Edition.
- IOPS: We recommend a disk IOPS of at least 10,000.
- Tenant specifications: Create the test tenant with the following statements:

```sql
CREATE RESOURCE UNIT tpch_unit max_cpu 30, memory_size '180g';

CREATE RESOURCE POOL tpch_pool unit = 'tpch_unit', unit_num = 1, zone_list=('zone1','zone2','zone3');

CREATE TENANT tpch_mysql resource_pool_list=('tpch_pool'), zone_list('zone1', 'zone2', 'zone3'), primary_zone=RANDOM, locality='F@zone1,F@zone2,F@zone3' set variables ob_compatibility_mode='mysql', ob_tcp_invited_nodes='%', secure_file_priv = '';
```
Notice
- The tenant specifications above are based on the hardware configuration used for the TPC-H benchmark report. Adjust them to the hardware of your own cluster, as shown in the example after this note.
- When deploying a cluster, we recommend not using the `obd cluster autodeploy` command. This command prioritizes stability and does not maximize resource utilization (for example, it does not use all available memory). Instead, tune the configuration file separately to make full use of your hardware.
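For example, the unit above assumes servers with at least 30 CPU cores and well over 180 GB of memory. On smaller hardware, scale the unit down proportionally; a hypothetical sizing for 8-core, 32 GB servers might be:

```sql
CREATE RESOURCE UNIT tpch_unit max_cpu 6, memory_size '20g';
```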
Cluster deployment
This test requires a total of four servers. Deploy TPC-H and obd on one server for stress testing, and set up the OceanBase cluster using obd on the remaining three servers in a 1:1:1 configuration.
Notice
In the TPC-H test, the server for deploying TPC-H and obd requires only four CPU cores and 16 GB of memory.
After the deployment succeeds, create the tenant and user needed for the TPC-H test. The `sys` tenant is a built-in tenant reserved for cluster management; do not use it to run the test. Set the tenant's `primary_zone` to `RANDOM`, which means the leader of each new table partition is randomly assigned to one of the three servers.
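The manual test scripts later in this topic connect as a dedicated user and database. As a minimal sketch (the database name, username, and password here simply match the `tpch.sh` script shown later, so adjust them as needed), connect to the tenant with `obclient -h$host_ip -P$host_port -uroot@tpch_mysql -A` and execute:

```sql
CREATE DATABASE tpch_100g_part;
CREATE USER 'tpch_100g_part' IDENTIFIED BY 'test';
GRANT ALL PRIVILEGES ON tpch_100g_part.* TO 'tpch_100g_part';
```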
One-click TPC-H test using obd
```bash
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install obtpch
sudo ln -s /usr/tpc-h-tools/tpc-h-tools/ /usr/local/
obd test tpch obperf --tenant=tpch_mysql -s 100 --remote-tbl-dir=/tmp/tpch100
```
Considerations
- When using obd to run the TPC-H benchmark, refer to obd test tpch for detailed parameters.
- In this example, most parameters are left at their default values. In practice, adjust them to your specific needs. For example, the cluster name used here is `obperf`, and the tenant name is `tpch_mysql`.
- To use obd for the test, make sure that the cluster was installed and deployed using obd. Otherwise, obd cannot obtain the cluster information or tune performance based on the cluster configuration.
- If the sys tenant's password has been changed through terminal login and is no longer the default empty value, first reset it to the default through the terminal. Then, use the `obd cluster edit-config` command to set the password in the configuration file under the `# root_password: # root user password` section, and run the `obd cluster reload` command to apply the change.
- After you run the `obd test tpch` command, the system lists the test steps and their outputs in detail. The larger the data amount, the longer the test takes.
- Ensure that the remote directory specified by `remote-tbl-dir` has enough capacity to store the TPC-H data; we recommend a separate disk for the test data (see the space check after this list).
- The `obd test tpch` command automatically completes all operations, including data generation, data transmission, OceanBase Database parameter tuning, data loading, and testing, without manual intervention. If an error occurs during the process, refer to obd test tpch and try again; for example, you can skip data generation and transmission and proceed directly to data loading and testing.
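Before launching the test, you can confirm that the remote directory has enough free space; at scale factor 100 (`-s 100`), the generated `.tbl` files occupy roughly 100 GB:

```bash
df -h /tmp/tpch100
```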
Manual TPC-H test using the official TPC-H tool
Environment tuning
OceanBase Database tuning.
Run the `obclient -h$host_ip -P$host_port -uroot@sys -A` command to connect to the sys tenant, and then execute the following statements:

```sql
ALTER SYSTEM FLUSH PLAN CACHE GLOBAL;
ALTER SYSTEM SET enable_sql_audit = false;
SELECT sleep(5);
ALTER SYSTEM SET enable_perf_event = false;
ALTER SYSTEM SET syslog_level = 'PERF';
ALTER SYSTEM SET enable_record_trace_log = false;
```

Tenant tuning.
Run the `obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -A` command to connect as the test user, and then execute the following statements:

```sql
SET GLOBAL ob_sql_work_area_percentage = 80;
SET GLOBAL max_allowed_packet = 67108864;
-- parallel_servers_target = max_cpu * server_num * 8
SET GLOBAL parallel_servers_target = 624;
SET GLOBAL ob_query_timeout = 14400000000;
SET GLOBAL ob_trx_timeout = 10000000000;
SET GLOBAL autocommit = 1;
SET GLOBAL optimizer_use_sql_plan_baselines = true;
SET GLOBAL optimizer_capture_sql_plan_baselines = true;
ALTER SYSTEM SET ob_enable_batched_multi_statement = 'true';
```
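To make the arithmetic in the comment concrete: the value 624 corresponds to units with 26 usable cores on each of the three servers (26 × 3 × 8 = 624); with the 30-core unit created earlier, the same formula would yield 30 × 3 × 8 = 720. Adjust the value to your own unit size, and verify it with:

```sql
SHOW GLOBAL VARIABLES LIKE 'parallel_servers_target';
```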
Set the default table format to column for a user tenant
Run the `obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -A` command as the test user. Then execute the following statement:

```sql
ALTER SYSTEM SET default_table_store_format = 'column';
```
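To confirm that the parameter has taken effect, you can check it from the same session:

```sql
SHOW PARAMETERS LIKE 'default_table_store_format';
```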
Install the TPC-H tool
Download the TPC-H tool.
For more information, visit the TPC-H Tools Download page.
Unzip the package and enter the TPC-H directory.
```bash
[wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
[wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0
```

Copy the `makefile.suite` file.

```bash
[wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/
[wieck@localhost dbgen] $ cp makefile.suite Makefile
```

Configure the parameters `CC`, `DATABASE`, `MACHINE`, and `WORKLOAD` in the `Makefile` file.

```bash
[wieck@localhost dbgen] $ vim Makefile

CC       = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE = MYSQL
MACHINE  = LINUX
WORKLOAD = TPCH
```

Add new macro definitions to the `tpcd.h` file.

```bash
[wieck@localhost dbgen] $ vim tpcd.h

#ifdef MYSQL
#define GEN_QUERY_PLAN  ""
#define START_TRAN      "START TRANSACTION"
#define END_TRAN        "COMMIT"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "limit %d;\n"
#define SET_DBASE       "use %s;\n"
#endif
```

Compile the file.

```bash
make
```
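If the compilation succeeds, the `dbgen` and `qgen` binaries appear in the `dbgen` directory. A quick check:

```bash
ls -l dbgen qgen
```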
Generate data
You can generate 10 GB, 100 GB, or 1 TB of data as needed for the TPC-H test. The example below shows how to generate 100 GB of data.
```bash
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
```
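As a sanity check, you can count the rows in the generated files; at scale factor 100, the largest file, `lineitem.tbl`, should contain roughly 600 million lines:

```bash
wc -l tpch100/*.tbl
```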
Generate SQL queries
Notice
You can follow the steps below to generate and then customize the SQL queries, or directly use the ones provided on our GitHub. If you choose the latter, remember to replace `cpu_num` in the SQL statements with your actual concurrency level.
Copy the `qgen` and `dists.dss` files to the `queries` directory.

```bash
[wieck@localhost dbgen] $ cp qgen queries
[wieck@localhost dbgen] $ cp dists.dss queries
```

In the `queries` directory, create a `gen.sh` script to generate the SQL queries.

```bash
[wieck@localhost dbgen] $ cd queries
[wieck@localhost queries] $ vim gen.sh

#!/usr/bin/bash
for i in {1..22}
do
./qgen -d $i -s 100 > db"$i".sql
done
```

Execute the `gen.sh` script.

```bash
[wieck@localhost queries] $ chmod +x gen.sh
[wieck@localhost queries] $ ./gen.sh
```

Modify the SQL queries.
```bash
[wieck@localhost queries] $ dos2unix *
```
For more information about the modified SQL queries, see GitHub. You need to change the value of `cpu_num` in the SQL statements to the actual concurrency. For the best performance, we recommend setting it to the total number of available CPU cores.

To view the total number of available CPU cores, run the following command as the sys tenant:

```sql
obclient> SELECT SUM(cpu_capacity_max) FROM oceanbase.__all_virtual_server;
```
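If you use the queries from GitHub, the `cpu_num` substitution can be scripted. A hypothetical one-liner, assuming the placeholder appears literally as `cpu_num` and 96 cores are available:

```bash
sed -i 's/cpu_num/96/g' db*.sql
```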
Here is a sample SQL query `q1` after modification:
```sql
SELECT /*+ parallel(96) */ -- add parallel execution
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
    l_linestatus;
```
Create tables
Create a schema file named `create_tpch_mysql_table_part.ddl`.

```sql
create tablegroup if not exists tpch_tg_SF_TPC_USER_lineitem_order_group binding true partition by key 1 partitions 96;
create tablegroup if not exists tpch_tg_SF_TPC_USER_partsupp_part binding true partition by key 1 partitions 96;
drop table if exists lineitem;
CREATE TABLE lineitem (
l_orderkey BIGINT NOT NULL,
l_partkey BIGINT NOT NULL,
l_suppkey INTEGER NOT NULL,
l_linenumber INTEGER NOT NULL,
l_quantity NUMBER(12,2) NOT NULL,
l_extendedprice NUMBER(12,2) NOT NULL,
l_discount NUMBER(12,2) NOT NULL,
l_tax NUMBER(12,2) NOT NULL,
l_returnflag char(1) DEFAULT NULL,
l_linestatus char(1) DEFAULT NULL,
l_shipdate date NOT NULL,
l_commitdate date DEFAULT NULL,
l_receiptdate date DEFAULT NULL,
l_shipinstruct char(25) DEFAULT NULL,
l_shipmode char(10) DEFAULT NULL,
l_comment varchar(44) DEFAULT NULL,
primary key(l_orderkey, l_linenumber)
)row_format = condensed
tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group
partition by key (l_orderkey) partitions 96;
drop table if exists orders;
CREATE TABLE orders (
o_orderkey BIGINT NOT NULL,
o_custkey BIGINT NOT NULL,
o_orderstatus char(1) DEFAULT NULL,
o_totalprice NUMBER(12,2) DEFAULT NULL,
o_orderdate date NOT NULL,
o_orderpriority char(15) DEFAULT NULL,
o_clerk char(15) DEFAULT NULL,
o_shippriority INTEGER DEFAULT NULL,
o_comment varchar(79) DEFAULT NULL,
PRIMARY KEY (o_orderkey)
)row_format = condensed
tablegroup = tpch_tg_SF_TPC_USER_lineitem_order_group
partition by key(o_orderkey) partitions 96;
drop table if exists partsupp;
CREATE TABLE partsupp (
ps_partkey BIGINT NOT NULL,
ps_suppkey INTEGER NOT NULL,
ps_availqty INTEGER DEFAULT NULL,
ps_supplycost NUMBER(12,2) DEFAULT NULL,
ps_comment varchar(199) DEFAULT NULL,
PRIMARY KEY (ps_partkey, ps_suppkey)) row_format = condensed
tablegroup tpch_tg_SF_TPC_USER_partsupp_part
partition by key(ps_partkey) partitions 96;
drop table if exists part;
CREATE TABLE part (
p_partkey BIGINT NOT NULL,
p_name varchar(55) DEFAULT NULL,
p_mfgr char(25) DEFAULT NULL,
p_brand char(10) DEFAULT NULL,
p_type varchar(25) DEFAULT NULL,
p_size INTEGER DEFAULT NULL,
p_container char(10) DEFAULT NULL,
p_retailprice NUMBER(12,2) DEFAULT NULL,
p_comment varchar(23) DEFAULT NULL,
PRIMARY KEY (p_partkey)) row_format = condensed
tablegroup tpch_tg_SF_TPC_USER_partsupp_part
partition by key(p_partkey) partitions 96;
drop table if exists customer;
CREATE TABLE customer (
c_custkey BIGINT NOT NULL,
c_name varchar(25) DEFAULT NULL,
c_address varchar(40) DEFAULT NULL,
c_nationkey INTEGER DEFAULT NULL,
c_phone char(15) DEFAULT NULL,
c_acctbal NUMBER(12,2) DEFAULT NULL,
c_mktsegment char(10) DEFAULT NULL,
c_comment varchar(117) DEFAULT NULL,
PRIMARY KEY (c_custkey)) row_format = condensed
partition by key(c_custkey) partitions 96;
drop table if exists supplier;
CREATE TABLE supplier (
s_suppkey INTEGER NOT NULL,
s_name char(25) DEFAULT NULL,
s_address varchar(40) DEFAULT NULL,
s_nationkey INTEGER DEFAULT NULL,
s_phone char(15) DEFAULT NULL,
s_acctbal NUMBER(12,2) DEFAULT NULL,
s_comment varchar(101) DEFAULT NULL,
PRIMARY KEY (s_suppkey)
) row_format = condensed partition by key(s_suppkey) partitions 96;
drop table if exists nation;
CREATE TABLE nation (
n_nationkey INTEGER NOT NULL,
n_name char(25) DEFAULT NULL,
n_regionkey INTEGER DEFAULT NULL,
n_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (n_nationkey)
) row_format = condensed;
drop table if exists region;
CREATE TABLE region (
r_regionkey INTEGER NOT NULL,
r_name char(25) DEFAULT NULL,
r_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (r_regionkey)
) row_format = condensed;
```
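The `load.py` script in the next section applies this DDL automatically. To create the tables manually instead, you can run the same command the script uses:

```bash
obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -D$db_name < create_tpch_mysql_table_part.ddl
```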
Load data
Write scripts based on the generated data and SQL queries. Here is an example of how to load the data:
Create a directory for the loading script.

```bash
[wieck@localhost dbgen] $ mkdir load
[wieck@localhost dbgen] $ cd load
[wieck@localhost load] $ cp xx/create_tpch_mysql_table_part.ddl ./
```

Create the `load.py` script. The script relies on the `commands` module and therefore requires Python 2.

```bash
[wieck@localhost load] $ vim load.py
```

```python
#!/usr/bin/env python
#-*- encoding:utf-8 -*-
import os
import sys
import time
import commands

hostname='$host_ip'    # Notice! Fill in the IP address of a specific OBServer node, such as observer A's IP address.
port='$host_port'      # The port number of observer A.
tenant='$tenant_name'  # The tenant name.
user='$user'           # The username.
password='$password'   # The password.
data_path='$path'      # Notice! Fill in the directory that stores the .tbl files on observer A.
db_name='$db_name'     # The database name.

# Create the tables.
cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s < create_tpch_mysql_table_part.ddl'%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s -e "show tables;" '%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result

# Load each table. The -c option preserves the /*+ parallel */ hints so that they reach the server.
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/customer.tbl' into table customer fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/lineitem.tbl' into table lineitem fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/nation.tbl' into table nation fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/orders.tbl' into table orders fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/partsupp.tbl' into table partsupp fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/part.tbl' into table part fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/region.tbl' into table region fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/supplier.tbl' into table supplier fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
```

Load the data.
Note
To load data, you need to install the OBClient.
```bash
[wieck@localhost load] $ python load.py
```

Perform a major compaction.
Log in to the test tenant and perform a major compaction.
```sql
obclient> ALTER SYSTEM MAJOR FREEZE;
```

Check whether the major compaction is complete.
```sql
SELECT * FROM oceanbase.DBA_OB_ZONE_MAJOR_COMPACTION\G
```

The return result is as follows:

```shell
*************************** 1. row ***************************
            ZONE: zone1
   BROADCAST_SCN: 1716172011046213913
        LAST_SCN: 1716172011046213913
LAST_FINISH_TIME: 2024-05-20 10:35:07.829496
      START_TIME: 2024-05-20 10:26:51.579881
          STATUS: IDLE
1 row in set
```
STATUSisIDLEand the values ofBROADCAST_SCNandLAST_SCNare equal, the major compaction is complete.Manually collect statistics.
Run the `obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -A -D$database` command to connect as the test user. Then, execute the following statement:

```sql
call dbms_stats.gather_schema_stats('$db_name', degree=>96);
```
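Before moving on to the queries, a quick row-count check can confirm that the load completed; at scale factor 100, `lineitem` should contain roughly 600 million rows (the parallel hint mirrors the one used elsewhere in this topic):

```sql
SELECT /*+ parallel(96) */ COUNT(*) FROM lineitem;
```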
Perform the test
Write scripts based on the generated data and SQL queries. Here is an example of how to perform the test:
Write the test script `tpch.sh` in the `queries` directory.

```bash
[wieck@localhost queries] $ vim tpch.sh

#!/bin/bash
TPCH_TEST="obclient -h $host_ip -P $host_port -utpch_100g_part@tpch_mysql -D tpch_100g_part -ptest -c"

# Warm-up phase
for i in {1..22}
do
sql1="source db${i}.sql"
echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
done

# Execution phase
for i in {1..22}
do
starttime=`date +%s%N`
echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN Q${i}"
sql1="source db${i}.sql"
echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
stoptime=`date +%s%N`
costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
done
```

Execute the test script.
```bash
sh tpch.sh
```
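The per-query elapsed times are printed to standard output (the query results themselves go to the `db*.log` files), so you can capture and summarize them, for example:

```bash
sh tpch.sh | tee tpch_result.log
grep "END,COST" tpch_result.log
```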