This topic introduces how to run the TPC-H benchmark on OceanBase Database using the following two methods:
One-click TPC-H test using obd.
Manual TPC-H test using the official TPC-H tool.
Note
To enhance user experience and ensure optimal performance, OceanBase Database has been extensively optimized since V4.0.0. The performance testing methods outlined here focus on tuning basic parameters to provide developers with an improved database experience.
What is TPC-H?
TPC-H is a decision support benchmark developed by the Transaction Processing Performance Council (TPC). It is widely used in both academic and industrial sectors to evaluate the performance of decision support applications. This benchmark offers a comprehensive assessment of a system's business computing capabilities and holds significant commercial relevance, especially in areas like credit analysis, telecom operation analysis, tax analysis, and decision-making in the tobacco industry.
The TPC-H benchmark evolved from TPC-D, a standard developed by TPC in 1994 for testing decision support systems. TPC-H implements a data warehouse in the Third Normal Form (3NF) with eight basic relationships. Its primary metric is the query response time, which measures the duration from query submission to result return. The unit of measurement is queries per hour (QphH@size), where H indicates the average number of complex queries executed per hour, and size indicates the database scale. This metric reflects the query processing capacity of a database system. The TPC-H benchmark is modeled based on actual production and operation environments. This enables it to evaluate critical performance metrics that other tests may not. Overall, the TPC-H benchmark fills the gap in data warehouse testing and encourages database vendors and research institutions to push the decision support technology to its limit.
Environment preparation
Prepare your testing environment based on the following requirements:
Java Development Kit (JDK): Use V1.8u131 or later.
Make: Run the
yum install makecommand to install make.GCC: Run the
yum install gcccommand to install GCC.MySQL-devel: Run the
yum install mysql-develcommand to install MySQL-devel.MySQL Connector/Python: Run the
sudo yum install MySQL-pythoncommand to install MySQL Connector/Python.PrettyTable: Run the
pip install prettytablecommand to install PrettyTable.Java Database Connectivity (JDBC): Use mysql-connector-java-5.1.47.
TPC-H tools: Click here to download the tools. If you use OBD to perform the test, you do not need to obtain the tools.
OBClient: For more information, see OBClient Documentation.
OceanBase Database: For more information, see Get started with OceanBase Database.
IOPS: We recommend that the disk IOPS be above 10,000.
Tenant specifications:
CREATE RESOURCE UNIT tpch_unit max_cpu 26, memory_size '70g'; CREATE RESOURCE POOL tpch_pool unit = 'tpch_unit', unit_num = 1, zone_list=('zone1','zone2','zone3'); CREATE TENANT tpch_mysql resource_pool_list=('tpch_pool'), zone_list('zone1', 'zone2', 'zone3'), primary_zone=RANDOM, locality='F@zone1,F@zone2,F@zone3' set variables ob_compatibility_mode='mysql', ob_tcp_invited_nodes='%',secure_file_priv = '';Notice
- The tenant specifications mentioned above are based on the hardware configuration in the TPC-H benchmark report of OceanBase Database. You need to adjust the specifications based on the hardware configurations of your database.
- When deploying a cluster, it is recommended not to use the
obd cluster autodeploycommand. This command prioritizes stability and does not maximize resource utilization (for example, it does not use all available memory). It is advisable to optimize the configuration file separately to maximize resource utilization.
Cluster deployment
This test requires a total of four servers. Deploy TPC-H and obd on one server for stress testing, and set up the OceanBase cluster using obd on the remaining three servers in a 1:1:1 configuration.
Note
In the TPC-H test, the server for deploying TPC-H and obd requires only four CPU cores and 16 GB of memory.
After a successful deployment, create the tenants and users needed for the TPC-H test. The
systenant is a built-in system tenant for cluster management and do not use it to perform the test. Set the tenant'sprimary_zonetoRANDOM, which indicates that the leader of new table partitions is randomly assigned to one of the three servers.
One-click TPC-H test using obd
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install obtpch
sudo ln -s /usr/tpc-h-tools/tpc-h-tools/ /usr/local/
obd test tpch obperf --tenant=tpch_mysql -s 100 --remote-tbl-dir=/tmp/tpch100
Considerations
When using obd to run the TPC-H benchmark, refer to obd test tpch for detailed parameters.
In this example, most parameters are set to their default values. In actual scenarios, you can adjust these parameters based on specific needs. For example, the cluster name used here is
obperf, and the tenant name istpch_mysql.To use obd for the test, make sure that the cluster is installed and deployed using obd. Otherwise, the cluster information cannot be obtained, and performance tuning cannot be performed based on the cluster configuration.
If the system tenant's password is changed through terminal login and is no longer the default empty value, you need to reset it to the default through the terminal first. Then, use the obd cluster edit-config command to set the password in the configuration file for the system tenant, under the
# root_password: # root user passwordsection. After that, run theobd cluster reloadcommand to apply the changes.After you run the
obd test tpchcommand, the system lists the test steps and outputs in detail. A larger data amount requires a longer test time.Ensure that the remote directory
remote-tbl-dirhas enough capacity to store TPC-H data. It is recommended to use a separate disk for storing test data.The
obd test tpchcommand automatically completes all operations, including data generation, data transmission, OceanBase parameter optimization, data loading, and testing, without requiring any additional interventions. If any errors occur during the process, refer to obd test tpch to try again. For example, you can skip data generation and transmission to proceed directly with data loading and testing.
Manual TPC-H test using the official TPC-H tool
The following instructions explain how to manually perform the TPC-H test using the official TPC-H tool. Manual testing can help you better understand OceanBase Database, especially the settings of some parameters.
Environment tuning
Perform environment tuning under the system tenant.
OceanBase Database tuning.
Run the
obclient -h$host_ip -P$host_port -uroot@sys -Acommand under the system tenant and then execute the following statements:ALTER SYSTEM flush plan cache GLOBAL; ALTER system SET enable_sql_audit=false; select sleep(5); ALTER system SET enable_perf_event=false; ALTER system SET syslog_level='PERF'; alter system set enable_record_trace_log=false;Configure the tenant.
Run the
obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -Acommand in the test user and execute the following statements:SET GLOBAL ob_sql_work_area_percentage = 80; SET GLOBAL ob_query_timeout = 36000000000; SET GLOBAL ob_trx_timeout = 36000000000; SET GLOBAL max_allowed_packet = 67108864; # parallel_servers_target = max_cpu * server_num * 8 SET GLOBAL parallel_servers_target = 624;
Install the TPC-H tool
Download the TPC-H tool. For more information, visit the TPC-H Tools Download page.
Unzip the package and enter the TPC-H directory.
[wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip [wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0Copy the
makefile.suitefile.[wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/ [wieck@localhost dbgen] $ cp makefile.suite MakefileConfigure the parameters
CC,DATABASE,MACHINE, andWORKLOADin theMakefilefile.[wieck@localhost dbgen] $ vim Makefile CC = gcc # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata) # SQLSERVER, SYBASE, ORACLE, VECTORWISE # Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS, # SGI, SUN, U2200, VMS, LINUX, WIN32 # Current values for WORKLOAD are: TPCH DATABASE= MYSQL MACHINE = LINUX WORKLOAD = TPCHAdd new macro definitions to the
tpcd.hfile.[wieck@localhost dbgen] $ vim tpcd.h #ifdef MYSQL #define GEN_QUERY_PLAN "" #define START_TRAN "START TRANSACTION" #define END_TRAN "COMMIT" #define SET_OUTPUT "" #define SET_ROWCOUNT "limit %d;\n" #define SET_DBASE "use %s;\n" #endifCompile the file.
make
Generate data
You can generate 10 GB, 100 GB, or 1 TB of data as needed for the TPC-H test. The example below shows how to generate 100 GB of data.
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
Generate SQL queries
Note
You can follow the steps below to generate and then customize the SQL queries, or you can directly use the ones provided on our GitHub. If you choose the latter, remember to replace cpu_num in the SQL statements with your actual concurrency level.
Copy the
qgenanddists.dssfiles into thequeriesdirectory.cp qgen queries cp dists.dss queriesIn the
queriesdirectory, create agen.shscript to generate SQL queries.[wieck@localhost queries] $ vim gen.sh #!/usr/bin/bash for i in {1..22} do ./qgen -d $i -s 100 > db"$i".sql doneExecute the
gen.shscript.chmod +x gen.sh ./gen.shModify the SQL queries.
dos2unix *For more information about the modified SQL queries, see GitHub. You need to change the value of
cpu_numin the SQL statements to the actual concurrency. To ensure the best performance, we recommend that you set the value to the total number of available CPU cores.To view the total number of available CPU cores for a tenant, run the following command under the system tenant:
SELECT sum(cpu_capacity_max) FROM __all_virtual_server;Here is a sample SQL query
q1after modification:SELECT /*+ parallel(96) */ ---Add parallel execution. l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= date '1998-12-01' - interval '90' day GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
Create tables
Create a schema file named create_tpch_mysql_table_part.ddl.
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS supplier;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS region;
DROP TABLEGROUP IF EXISTS tpch_tg_lineitem_order_group;
DROP TABLEGROUP IF EXISTS tpch_tg_partsupp_part;
CREATE TABLEGROUP IF NOT EXISTS tpch_tg_lineitem_order_group binding true partition by key 1 partitions cpu_num;
CREATE TABLEGROUP IF NOT EXISTS tpch_tg_partsupp_part binding true partition by key 1 partitions cpu_num;
DROP TABLE IF EXISTS lineitem;
CREATE TABLE lineitem (
l_orderkey BIGINT NOT NULL,
l_partkey BIGINT NOT NULL,
l_suppkey INTEGER NOT NULL,
l_linenumber INTEGER NOT NULL,
l_quantity DECIMAL(15,2) NOT NULL,
l_extendedprice DECIMAL(15,2) NOT NULL,
l_discount DECIMAL(15,2) NOT NULL,
l_tax DECIMAL(15,2) NOT NULL,
l_returnflag char(1) DEFAULT NULL,
l_linestatus char(1) DEFAULT NULL,
l_shipdate date NOT NULL,
l_commitdate date DEFAULT NULL,
l_receiptdate date DEFAULT NULL,
l_shipinstruct char(25) DEFAULT NULL,
l_shipmode char(10) DEFAULT NULL,
l_comment varchar(44) DEFAULT NULL,
PRIMARY KEY(l_orderkey, l_linenumber))row_format = condensed
tablegroup = tpch_tg_lineitem_order_group
partition by key (l_orderkey) partitions cpu_num;
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
o_orderkey bigint not null,
o_custkey bigint not null,
o_orderstatus char(1) default null,
o_totalprice bigint default null,
o_orderdate date not null,
o_orderpriority char(15) default null,
o_clerk char(15) default null,
o_shippriority bigint default null,
o_comment varchar(79) default null,
PRIMARY KEY (o_orderkey))row_format = condensed
tablegroup = tpch_tg_lineitem_order_group
partition by key(o_orderkey) partitions cpu_num;
DROP TABLE IF EXISTS partsupp;
CREATE TABLE partsupp (
ps_partkey bigint not null,
ps_suppkey bigint not null,
ps_availqty bigint default null,
ps_supplycost bigint default null,
ps_comment varchar(199) default null,
PRIMARY KEY (ps_partkey, ps_suppkey))row_format = condensed
tablegroup tpch_tg_partsupp_part
partition by key(ps_partkey) partitions cpu_num;
DROP TABLE IF EXISTS part;
CREATE TABLE part (
p_partkey bigint not null,
p_name varchar(55) default null,
p_mfgr char(25) default null,
p_brand char(10) default null,
p_type varchar(25) default null,
p_size bigint default null,
p_container char(10) default null,
p_retailprice bigint default null,
p_comment varchar(23) default null,
PRIMARY KEY (p_partkey))row_format = condensed
tablegroup tpch_tg_partsupp_part
partition by key(p_partkey) partitions cpu_num;
DROP TABLE IF EXISTS customer;
CREATE TABLE customer (
c_custkey bigint not null,
c_name varchar(25) default null,
c_address varchar(40) default null,
c_nationkey bigint default null,
c_phone char(15) default null,
c_acctbal bigint default null,
c_mktsegment char(10) default null,
c_comment varchar(117) default null,
PRIMARY KEY (c_custkey))row_format = condensed
partition by key(c_custkey) partitions cpu_num;
DROP TABLE IF EXISTS supplier;
CREATE TABLE supplier (
s_suppkey bigint not null,
s_name char(25) default null,
s_address varchar(40) default null,
s_nationkey bigint default null,
s_phone char(15) default null,
s_acctbal bigint default null,
s_comment varchar(101) default null,
PRIMARY KEY (s_suppkey))row_format = condensed
partition by key(s_suppkey) partitions cpu_num;
DROP TABLE IF EXISTS nation;
CREATE TABLE nation (
n_nationkey bigint not null,
n_name char(25) default null,
n_regionkey bigint default null,
n_comment varchar(152) default null,
PRIMARY KEY (n_nationkey))row_format = condensed;
DROP TABLE IF EXISTS region;
CREATE TABLE region (
r_regionkey bigint not null,
r_name char(25) default null,
r_comment varchar(152) default null,
PRIMARY KEY (r_regionkey))row_format = condensed;
Load data
Write scripts based on the generated data and SQL queries. Here is an example of how to load the data:
Create a directory for the loading script.
[wieck@localhost dbgen] $ mkdir load [wieck@localhost dbgen] $ cd load [wieck@localhost load] $ cp xx/create_tpch_mysql_table_part.ddl ./Create the
load.pyscript.[wieck@localhost load] $ vim load.py #!/usr/bin/env python #-*- encoding:utf-8 -*- import os import sys import time import commands hostname='$host_ip' # Notice! Please fill in the IP address of a specific OBServer node, such as observer A's IP address. port='$host_port' # The port number of observer A. tenant='$tenant_name' # The tenant name. user='$user' # The username. password='$password' # The password. data_path='$path' # Notice! Please fill in the directory of tbl on observer A. db_name='$db_name' # The database name. # Create a table. cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s < create_tpch_mysql_table_part.ddl'%(hostname,port,user,tenant,password,db_name) result = commands.getstatusoutput(cmd_str) print result cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s -e "show tables;" '%(hostname,port,user,tenant,password,db_name) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/customer.tbl' into table customer fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/lineitem.tbl' into table lineitem fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/nation.tbl' into table nation fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/orders.tbl' into table orders fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -D%s -e "load data /*+ parallel(80) */ infile '%s/partsupp.tbl' into table partsupp fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/part.tbl' into table part fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/region.tbl' into table region fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/supplier.tbl' into table supplier fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path) result = commands.getstatusoutput(cmd_str) print resultLoad the data.
Notice
To load data, you need to install the OBClient.
python load.py (0,'') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.\nTABLE_NAME\nT1\nLINEITEM\nORDERS\nPARTSUPP\nPART\nCUSTOMER\nSUPPLIER\nNATION\nREGION') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.') (0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')Perform a major compaction.
Note
To perform a major compaction, you need to log on as the
systenant.MySQL [(none)]> use oceanbase Database changed MySQL [oceanbase]> ALTER SYSTEM major freeze tenant=<your tenant name>; Query OK, 0 rows affectedCheck whether the major compaction is complete.
MySQL [oceanbase]> SELECT FROZEN_SCN, LAST_SCN FROM oceanbase.CDB_OB_MAJOR_COMPACTION; +---------------------+---------------------+ | FROZEN_SCN | LAST_SCN | +---------------------+---------------------+ | 1667239201167716767 | 1667239201167716767 | | 1667239200111919300 | 1667239200111919300 | | 1667239201167452168 | 1667239201167452168 | | 1667239201168053124 | 1667239201168053124 | | 1667239201167520213 | 1667239201167520213 | +---------------------+---------------------+If all
FROZEN_SCNandLAST_SCNvalues are equal, the major compaction is complete.Manually collect statistics.
Run the
obclient -h$host_ip -P$host_port -u$user@$tenant -p$password -A -D$databasecommand as the test user. Then, run the following command:call dbms_stats.gather_schema_stats('$db_name',degree=>96);
Perform the test
Write scripts based on the generated data and SQL queries. Here is an example of how to perform the test:
Write the test script
tpch.shin thequeriesdirectory.[wieck@localhost queries] $ vim tpch.sh #!/bin/bash TPCH_TEST="obclient -h $host_ip -P $host_port -utpch_100g_part@tpch_mysql -D tpch_100g_part -ptest -c" # Warm-up phase for i in {1..22} do sql1="source db${i}.sql" echo $sql1| $TPCH_TEST >db${i}.log || ret=1 done # Execution phase for i in {1..22} do starttime=`date +%s%N` echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN Q${i}" sql1="source db${i}.sql" echo $sql1| $TPCH_TEST >db${i}.log || ret=1 stoptime=`date +%s%N` costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'` echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s" doneExecute the test script.
sh tpch.shNote
For more information about the test results, see TPC-H benchmark report of OceanBase Database.
FAQ
Q: What do I do when data import fails? Here is the error message:
ERROR 1017 (HY000) at line 1: File not existA: The
tblfile must be placed in a directory on the server that hosts OceanBase Database, because only local data can be imported and loaded.Q: What do I do if I encounter an error while viewing the data? Here is the error message:
ERROR 4624 (HY000): No memory or reach tenant memory limitA: You are running out of memory. Allocate more memory to the tenant.
Q: What do I do if I encounter an error during data import? Here is the error message:
ERROR 1227 (42501) at line 1: Access deniedA: You need to grant the access privilege to the logon user. The command is as follows:
grant file on *.* to tpch_100g_part;Q: What do I do if I encounter an error while adjusting
dos2unix *for the SQL query? Here is the error message:-bash: dos2unix: command not foundA: You need to install dos2unix. The command is as follows:
yum install -y dos2unix