OceanBase Database is suitable for hybrid transaction/analytical processing (HTAP) scenarios. OceanBase Database adopts a distributed architecture based on peer nodes. This architecture allows OceanBase Database to handle high-concurrency and scalable online transaction processing (OLTP) tasks and perform parallel computing for online analytical processing (OLAP) tasks based on the massively parallel processing (MPP) architecture in the same data engine, without maintaining two sets of data.
OceanBase Database not only allows you to analyze a large amount of online business data in parallel but also allows you to perform parallel DML (PDML) operations to quickly and securely execute large transactions that concurrently write data in batches. All these are achieved without compromising transaction consistency.
The following describes how to manually run the Transaction Processing Performance Council Benchmark H (TPC-H) benchmark test to show the characteristics of OceanBase Database in operational OLAP scenarios. TPC-H is a commonly used benchmark that measures the analysis and decision support capabilities of database systems by running a series of complex queries against massive amounts of data. For more information, visit the official website of TPC. On May 20, 2021, OceanBase Database set a new world record in the TPC-H benchmark with a result of 15.26 million QphH. It is by far the only database that has achieved top results in both the TPC-C and TPC-H benchmarks, which testifies to its HTAP capabilities in both online transactions and real-time analysis. For more information, see TPC-H Results.
Manually run the TPC-H benchmark test
Use the official TPC-H tools from TPC to manually run the TPC-H benchmark test step by step. Manual testing can help you better understand OceanBase Database, especially the settings of some parameters.
Optimize the test environment
Optimize OceanBase Database parameters.
Execute the following statements in the system tenant (`sys` tenant) to configure the relevant parameters:

```sql
# Adjust the memory usage of the sys tenant to provide more resources for the test tenant.
# You need to adjust the memory usage based on the actual environment.
ALTER SYSTEM SET system_memory='15g';
ALTER RESOURCE UNIT sys_unit_config memory_size='15G';
# Optimize the parameters.
ALTER SYSTEM SET trace_log_slow_query_watermark='100s';
ALTER SYSTEM SET enable_sql_operator_dump=True;
ALTER SYSTEM SET _hash_area_size='3g';
ALTER SYSTEM SET memstore_limit_percentage=50;
ALTER SYSTEM SET enable_rebalance=False;
ALTER SYSTEM SET memory_chunk_cache_size='0';
ALTER SYSTEM SET major_compact_trigger=5;
ALTER SYSTEM SET cache_wash_threshold='30g';
# Adjust the log level and the maximum number of log files that can be saved.
ALTER SYSTEM SET syslog_level='ERROR';
ALTER SYSTEM SET max_syslog_file_count=100;
ALTER SYSTEM SET enable_syslog_recycle='True';
```

Set tenant parameters.
Execute the following statements in the test tenant (user tenant) to configure the relevant parameters:
```sql
# Set global parameters.
SET GLOBAL ob_sql_work_area_percentage=80;
SET GLOBAL optimizer_use_sql_plan_baselines=true;
SET GLOBAL optimizer_capture_sql_plan_baselines=true;
ALTER SYSTEM SET ob_enable_batched_multi_statement='true';
# Set relevant tenant parameters to prevent transaction timeout.
SET GLOBAL ob_query_timeout=36000000000;
SET GLOBAL ob_trx_timeout=36000000000;
SET GLOBAL max_allowed_packet=67108864;
SET GLOBAL secure_file_priv="";
/* We recommend that you set parallel_servers_target to the result of the following formula:
   Number of CPU cores of the resource units allocated to the test tenant x 10 x Number of OBServer nodes x 0.8.
   For example, if the test tenant's unit configuration is `CREATE RESOURCE UNIT $unit_name max_cpu 26;`
   on a three-node cluster, the value is 26 x 10 x 3 x 0.8 = 624. */
SET GLOBAL parallel_servers_target=624;
```

Restart the cluster.
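As a rough sanity check, the `parallel_servers_target` formula above can be expressed in code. The function name and arguments below are hypothetical, chosen only for illustration:

```python
def parallel_servers_target(unit_max_cpu, observer_count,
                            px_workers_per_cpu=10, headroom=0.8):
    """Recommended parallel_servers_target value:
    unit CPU cores x 10 x number of OBServer nodes x 0.8."""
    return int(unit_max_cpu * px_workers_per_cpu * observer_count * headroom)

# The example above: a 26-core resource unit on a 3-node cluster.
print(parallel_servers_target(26, 3))  # -> 624
```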
Install the TPC-H tools
Download the TPC-H tools from the TPC-H Tools Download page.
Unzip the package and go to the TPC-H directory.
```shell
[wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
[wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0
```

Copy the `Makefile.suite` file.

```shell
[wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/
[wieck@localhost dbgen] $ cp Makefile.suite Makefile
```

Redefine the `CC`, `DATABASE`, `MACHINE`, and `WORKLOAD` parameters in the `Makefile` file.

```
CC       = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE = MYSQL
MACHINE  = LINUX
WORKLOAD = TPCH
```

Modify the `tpcd.h` file and add new macro definitions.

```c
#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif
```

Compile the files.

```shell
make
```
Generate data
Generate 10 GB, 100 GB, or 1 TB of data as needed. The following example shows the command to generate 100 GB of data.
```shell
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
```
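For planning disk space and load time, it may help to estimate the table sizes that dbgen produces. Per the TPC-H specification, table cardinalities scale linearly with the scale factor (SF), except for `nation` and `region`, which are fixed. The sketch below is only an estimate; the actual `lineitem` count deviates slightly from 6 million rows per SF:

```python
# Approximate TPC-H table cardinalities as a function of scale factor (SF).
BASE_ROWS = {
    'supplier': 10_000,
    'part': 200_000,
    'partsupp': 800_000,
    'customer': 150_000,
    'orders': 1_500_000,
    'lineitem': 6_000_000,  # approximate; the actual count varies slightly
}

def approx_rows(sf):
    counts = {table: rows * sf for table, rows in BASE_ROWS.items()}
    counts['nation'] = 25  # fixed size, independent of SF
    counts['region'] = 5   # fixed size, independent of SF
    return counts

print(approx_rows(100)['lineitem'])  # -> 600000000 (roughly 600 million rows)
```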
Generate SQL query statements
Note
You can perform the following steps to generate the SQL query statements and then adjust them, or you can use the SQL query statements provided on GitHub.
However, before you use the SQL query statements from GitHub, change the value of the cpu_num parameter in the statements to the actual number of concurrent threads.
Copy the `qgen` and `dists.dss` files to the `queries` directory.

```shell
cp qgen queries
cp dists.dss queries
```

In the `queries` directory, create a `gen.sh` script to generate SQL query statements.

```shell
#!/usr/bin/bash
for i in {1..22}
do
  ./qgen -d $i -s 100 > db"$i".sql
done
```

Run the `gen.sh` script.

```shell
chmod +x gen.sh
./gen.sh
```

Modify the SQL query statements.

```shell
dos2unix *
```
For the modified SQL query statements, see GitHub. You need to change the value of the cpu_num parameter in the SQL statements to the actual number of concurrent threads. To ensure the best performance, we recommend that you set the value to the total number of available CPU cores.
Run the following command under the sys tenant to view the total number of available CPU cores for a tenant:
```sql
select sum(max_cpu) from DBA_OB_UNITS;
```
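If you adjust many generated `db*.sql` files at once, a small script can inject or update the parallel hint. The `set_parallel_degree` helper below is hypothetical (it is not part of the TPC-H tools) and only sketches the idea:

```python
import re

def set_parallel_degree(sql, dop):
    """Insert (or update) a /*+ parallel(N) */ hint after the first SELECT.
    Hypothetical helper for adjusting the generated db*.sql files."""
    if re.search(r'/\*\+\s*parallel\(\d+\)\s*\*/', sql, re.IGNORECASE):
        # A hint already exists: just change its degree of parallelism.
        return re.sub(r'parallel\(\d+\)', 'parallel(%d)' % dop, sql, count=1)
    # No hint yet: attach one to the first SELECT keyword.
    return re.sub(r'(?i)\bselect\b', 'select /*+ parallel(%d) */' % dop,
                  sql, count=1)

print(set_parallel_degree('select l_returnflag from lineitem;', 96))
# -> select /*+ parallel(96) */ l_returnflag from lineitem;
```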
Take q1 as an example. The modified SQL statement is as follows:
```sql
select /*+ parallel(96) */ -- Add the parallel execution hint
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
```
Create a table
Create a schema file named create_tpch_mysql_table_part.ddl.
```sql
create tablegroup if not exists tpch_tg_100g_lineitem_order_group binding true partition by key 1 partitions 192;
create tablegroup if not exists tpch_tg_100g_partsupp_part binding true partition by key 1 partitions 192;

drop database if exists $db_name;
create database $db_name;
use $db_name;

create table lineitem (
    l_orderkey bigint not null,
    l_partkey bigint not null,
    l_suppkey bigint not null,
    l_linenumber bigint not null,
    l_quantity bigint not null,
    l_extendedprice bigint not null,
    l_discount bigint not null,
    l_tax bigint not null,
    l_returnflag char(1) default null,
    l_linestatus char(1) default null,
    l_shipdate date not null,
    l_commitdate date default null,
    l_receiptdate date default null,
    l_shipinstruct char(25) default null,
    l_shipmode char(10) default null,
    l_comment varchar(44) default null,
    primary key(l_orderkey, l_linenumber))
tablegroup = tpch_tg_100g_lineitem_order_group
partition by key (l_orderkey) partitions 192;
create index I_L_ORDERKEY on lineitem(l_orderkey) local;
create index I_L_SHIPDATE on lineitem(l_shipdate) local;

create table orders (
    o_orderkey bigint not null,
    o_custkey bigint not null,
    o_orderstatus char(1) default null,
    o_totalprice bigint default null,
    o_orderdate date not null,
    o_orderpriority char(15) default null,
    o_clerk char(15) default null,
    o_shippriority bigint default null,
    o_comment varchar(79) default null,
    primary key (o_orderkey))
tablegroup = tpch_tg_100g_lineitem_order_group
partition by key(o_orderkey) partitions 192;
create index I_O_ORDERDATE on orders(o_orderdate) local;

create table partsupp (
    ps_partkey bigint not null,
    ps_suppkey bigint not null,
    ps_availqty bigint default null,
    ps_supplycost bigint default null,
    ps_comment varchar(199) default null,
    primary key (ps_partkey, ps_suppkey))
tablegroup = tpch_tg_100g_partsupp_part
partition by key(ps_partkey) partitions 192;

create table part (
    p_partkey bigint not null,
    p_name varchar(55) default null,
    p_mfgr char(25) default null,
    p_brand char(10) default null,
    p_type varchar(25) default null,
    p_size bigint default null,
    p_container char(10) default null,
    p_retailprice bigint default null,
    p_comment varchar(23) default null,
    primary key (p_partkey))
tablegroup = tpch_tg_100g_partsupp_part
partition by key(p_partkey) partitions 192;

create table customer (
    c_custkey bigint not null,
    c_name varchar(25) default null,
    c_address varchar(40) default null,
    c_nationkey bigint default null,
    c_phone char(15) default null,
    c_acctbal bigint default null,
    c_mktsegment char(10) default null,
    c_comment varchar(117) default null,
    primary key (c_custkey))
partition by key(c_custkey) partitions 192;

create table supplier (
    s_suppkey bigint not null,
    s_name char(25) default null,
    s_address varchar(40) default null,
    s_nationkey bigint default null,
    s_phone char(15) default null,
    s_acctbal bigint default null,
    s_comment varchar(101) default null,
    primary key (s_suppkey))
partition by key(s_suppkey) partitions 192;

create table nation (
    n_nationkey bigint not null,
    n_name char(25) default null,
    n_regionkey bigint default null,
    n_comment varchar(152) default null,
    primary key (n_nationkey));

create table region (
    r_regionkey bigint not null,
    r_name char(25) default null,
    r_comment varchar(152) default null,
    primary key (r_regionkey));
```
Load data
Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to load data.
Create a directory for loading scripts.
```shell
mkdir load
cd load
cp ../dss.ri ../dss.ddl ./
```

Create the `load.py` script. The original script repeats one `obclient` invocation per table; the version below keeps the same behavior but loads the tables in a loop. Replace the `$`-placeholders with the values for your environment.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import subprocess

hostname  = '$host_ip'     # Attention! The IP address of an OBServer node, such as OBServer A.
port      = '$host_port'   # Port number of OBServer A.
tenant    = '$tenant_name' # Tenant name.
user      = '$user'        # Username.
password  = '$password'    # Password.
data_path = '$path'        # Attention! The directory where the *.tbl files are located on OBServer A.
db_name   = '$db_name'     # Database name.

def run(cmd):
    status, output = subprocess.getstatusoutput(cmd)
    print(status, output)

# Create the tables.
run('obclient -h%s -P%s -u%s@%s -p%s -D%s < create_tpch_mysql_table_part.ddl'
    % (hostname, port, user, tenant, password, db_name))
run('obclient -h%s -P%s -u%s@%s -p%s -D%s -e "show tables;"'
    % (hostname, port, user, tenant, password, db_name))

# Load each table. The -c option prevents the /*+ parallel(80) */ hint
# from being stripped from the statement.
for table in ('customer', 'lineitem', 'nation', 'orders',
              'partsupp', 'part', 'region', 'supplier'):
    run("""obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/%s.tbl' into table %s fields terminated by '|';" """
        % (hostname, port, user, tenant, password, db_name, data_path, table, table))
```

Load the data.
Notice
To load data, you need to install the OBClient.
```shell
$ python load.py
(0, '')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.\nTABLE_NAME\nT1\nLINEITEM\nORDERS\nPARTSUPP\nPART\nCUSTOMER\nSUPPLIER\nNATION\nREGION')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
```

Perform a major compaction.
A major compaction combines the current major version's SSTables and MemTables with the full static data of the previous major version and generates a new set of full data. This makes the statistics of the storage layer more accurate and improves the stability of execution plans.
Note
To perform a major compaction, you must log in to the `sys` tenant of the OceanBase cluster as the `root` user.

```sql
MySQL [(none)]> USE oceanbase
Database changed
MySQL [oceanbase]> ALTER SYSTEM MAJOR FREEZE;
Query OK, 0 rows affected
```

Check whether the major compaction is complete.
```sql
MySQL [oceanbase]> SELECT FROZEN_SCN, LAST_SCN FROM oceanbase.CDB_OB_MAJOR_COMPACTION;
+---------------------+---------------------+
| FROZEN_SCN          | LAST_SCN            |
+---------------------+---------------------+
| 1667239201167716767 | 1667239201167716767 |
| 1667239200111919300 | 1667239200111919300 |
| 1667239201167452168 | 1667239201167452168 |
| 1667239201168053124 | 1667239201168053124 |
| 1667239201167520213 | 1667239201167520213 |
+---------------------+---------------------+
```

Note

The major compaction is considered complete when the value of `FROZEN_SCN` is equal to that of `LAST_SCN` in every row of the result.
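A polling script can automate this check: run the query above via obclient in a loop and stop once every row satisfies the condition. The sketch below implements only the comparison; `compaction_done` is a hypothetical helper, and the sample SCN values are taken from the output above:

```python
def compaction_done(rows):
    """rows: (frozen_scn, last_scn) pairs read from
    oceanbase.CDB_OB_MAJOR_COMPACTION. The major compaction is
    complete when FROZEN_SCN equals LAST_SCN in every row."""
    return all(frozen == last for frozen, last in rows)

# Two rows from the sample output above: compaction has finished.
rows = [(1667239201167716767, 1667239201167716767),
        (1667239200111919300, 1667239200111919300)]
print(compaction_done(rows))          # -> True
print(compaction_done([(2, 1)]))      # -> False (still in progress)
```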
Run the test
Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to run the test.
Write the
