This topic describes the software requirements, tenant specifications, and detailed test methods for running TPC-H tests on OceanBase Database.
What is TPC-H
TPC-H (Transaction Processing Performance Council's Decision Support Benchmark) is a benchmark from the Transaction Processing Performance Council (TPC) that simulates decision-support workloads. Academia and industry widely use it to evaluate decision-support performance. It measures a system's overall analytic computing capability, places high demands on vendors, and has significant commercial value. Common use cases include credit and card analysis in banking, telecom operations analysis, tax analysis, and decision support in the tobacco industry.
TPC-H evolved from TPC-D, a decision-support benchmark TPC introduced in 1994. TPC-H models a data warehouse in third normal form (3NF) with eight base tables. The primary metric is query response time—the interval from query submission to result return. Results are reported as Queries per Hour at a given scale (QphH@size), where H is the average number of complex queries executed per hour and size is the database scale. Because TPC-H reflects real production environments, it can evaluate performance characteristics that other benchmarks miss. In short, TPC-H meets data-warehouse benchmarking needs and drives vendors and researchers to push the technology forward.
Note
Starting with OceanBase Database V4.0.0, extensive optimizations improve usability and help developers get better out-of-the-box performance. This testing guide covers basic parameter tuning to help you achieve strong TPC-H results.
Prepare the environment
Before you start the test, prepare the test environment as required:
Note
This example is based on a MySQL tenant.
Software requirements
JDK: We recommend that you use JDK 1.8u131 or later.
make: Run the
yum install makecommand to install make.GCC: Run the
yum install gcccommand to install GCC.mysql-devel: Run the
yum install mysql-develcommand to install mysql-devel.Python database driver: Run the
sudo yum install MySQL-pythoncommand to install the Python database driver.prettytable: Run the
pip install prettytablecommand to install prettytable.JDBC: We recommend that you use the
mysql-connector-java-5.1.47version.TPC-H Tool: Download it from the TPC-H Tool download page. You can skip this tool if you use the OBD one-click test.
OBClient: See OBClient documentation.
OceanBase Database: See Quickly experience OceanBase Database.
IOPS: Use disks with IOPS above 10,000.
Tenant specification configuration
The tenant specification follows the hardware configuration in the OceanBase Database TPC-H Benchmark Report. Adjust it to match your hardware.
Cluster deployment
This test requires four machines. Deploy TPC-H and OBD on one machine as the client load generator. Use OBD to deploy a three-node OceanBase cluster (1:1:1 scale).
Note
- For the TPC-H test, the machine that hosts TPC-H and OBD requires only four CPU cores and 16 GB of memory.
- When deploying a cluster, avoid the
obd cluster autodeploycommand. It limits resource usage (for example, memory) for stability. Instead, tune the configuration file to make full use of available resources.
After deployment, create a tenant and user for the TPC-H test. Do not use the
systenant—it is the internal management tenant. Set the tenant'sprimary_zonetoRANDOMso that leaders of newly created table partitions are distributed randomly across the three nodes.
Tenant creation
You can run the
OBD CLUSTER TENANT CREATEcommand to create a test tenant. The command syntax is as follows:obd cluster tenant create <DEPLOY_NAME> -n <TENANT_NAME> --max-cpu=28 --memory-size=180G -–zone-list=zone1,zone2,zone3 -–primary-zone=RANDOM --locality=F@zone1,F@zone2,F@zone3 --charset=utf8 -s 'ob_tcp_invited_nodes="%"' --optimize=<optimize>The parameters are described as follows:
DEPLOY_NAME: the name of the cluster.TENANT_NAME: the name of the tenant.--zone-list: the list of zones for the tenant.--primary-zone: the primary zone for the tenant.--locality: the distribution of replicas across zones.--charset: the character set of the tenant.-s: the system variable value of the tenant.OPTIMIZE: the load type of the tenant, which can beexpress_oltp,complex_oltp,olap,htap, orkv. The default load type ishtap, which is suitable for hybrid OLAP and OLTP workloads. For more information about OBD deployment, see obd cluster tenant create.Notice
For V4.3.x and later versions, when you use OBD for deployment, you can set the
scenarioparameter to specify the cluster load type. If you do not set thescenarioparameter, the default value ishtap. For more information, see Deploy OceanBase Database with OBD.
For example, create a tenant named
tpch_tenantthat uses theobperfcluster with 28 CPU cores and 180 GB of memory, and set the default tenant load type to match the cluster scenario.obd cluster tenant create obperf -n tpch_tenant --max-cpu=28 --memory-size=180G -–zone-list=zone1,zone2,zone3 -–primary-zone=RANDOM --locality=F@zone1,F@zone2,F@zone3 --charset=utf8 -s 'ob_tcp_invited_nodes="%"' --optimize=htapNote
In this example,
--optimize=htapis the default load type. In a production environment, select the appropriate load type based on the cluster type.
Test methods
After the test environment is ready, you can perform the TPC-H performance test by using either of the following two methods:
Run the TPC-H test with OBD (one-click)
Run the TPC-H test manually with the TPC-H tool
Run a one-click TPC-H test with OBD
Run a TPC-H test with one command using OBD:
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo
sudo yum install obtpch
sudo ln -s /usr/tpc-h-tools/tpc-h-tools/ /usr/local/
obd test tpch obperf --tenant=tpch_mysql -s 100 --remote-tbl-dir=/tmp/tpch100
Before running the script, note the following:
For TPC-H test parameters, see obd test tpch.
This example uses mostly default values. Adjust parameters for your environment as needed. Here, the cluster name is
obperfand the tenant name istpch_mysql.For a one-click OBD test, the cluster must have been installed and deployed by OBD. Otherwise, OBD cannot read cluster information or tune performance based on the cluster configuration.
If you changed the
systenant password from the default empty value in the client, reset it to empty in the client first. Then use obd cluster edit-config to set the password in the configuration file (# root_password: # root user password). After editing the config, runobd cluster reloadto apply the change.After you run
obd test tpch, the system prints each step and its output in detail. Larger data volumes take longer.The
remote-tbl-dirdirectory must have enough space for TPC-H data. We recommend using a separate disk for load test data.The
obd test tpchcommand runs the full workflow without further manual steps: data generation and transfer, OceanBase parameter tuning, data loading, and testing. If a step fails, retry with obd test tpch—for example, skip data generation and transfer and run loading and testing only.
Run the TPC-H test manually with the TPC-H tool
After you choose a cluster load type and tenant tuning scenario, you can run TPC-H manually to better understand OceanBase Database, especially parameter tuning.
Step 1: Create a test tenant
Note
If the test tenant has been created during the environment preparation, you can skip this step.
Run the following commands in the system tenant (sys tenant) to create a test tenant:
Note
The OceanBase cluster environment for this test is deployed in the 1:1:1 mode.
Create a resource unit named
mysql_box.CREATE RESOURCE UNIT mysql_box MAX_CPU 28, MEMORY_SIZE '200G', MIN_IOPS 200000, MAX_IOPS 12800000, LOG_DISK_SIZE '300G';Create a resource pool named
mysql_pool.CREATE RESOURCE POOL mysql_pool UNIT = 'mysql_box', UNIT_NUM = 1, ZONE_LIST = ('z1','z2','z3');Create a MySQL tenant named
mysql_tenant.CREATE TENANT mysql_tenant RESOURCE_POOL_LIST = ('mysql_pool'), PRIMARY_ZONE = RANDOM, LOCALITY = 'F@z1,F@z2,F@z3' SET VARIABLES ob_compatibility_mode='mysql', ob_tcp_invited_nodes='%', secure_file_priv = "/";
Step 2: Optimize the environment
Optimize OceanBase Database.
Run the following commands in the system tenant (
systenant) to configure the related parameters.ALTER SYSTEM flush plan cache GLOBAL; ALTER system SET enable_sql_audit=false; select sleep(5); ALTER system SET enable_perf_event=false; ALTER system SET syslog_level='PERF'; alter system set enable_record_trace_log=false; alter system set data_storage_warning_tolerance_time = '300s'; alter system set _data_storage_io_timeout = '600s'; alter system set trace_log_slow_query_watermark = '7d'; alter system set large_query_threshold='0ms'; alter system set enable_syslog_recycle= 1; alter system set _max_px_workers_per_cpu = 4 tenant= 'mysql_tenant'; alter system set default_table_organization = 'HEAP' tenant = 'mysql_tenant';Optimize the tenant.
Run the following commands in the test tenant (user tenant) to configure the related parameters.
SET global NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'; SET global NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF'; SET global NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD'; set global ob_query_timeout=10800000000; set global ob_trx_timeout=10000000000; set global ob_sql_work_area_percentage=80; alter system set ob_enable_batched_multi_statement='true'; alter system set default_table_store_format = 'column' ; alter system set _io_read_batch_size = '2M'; alter system set _io_read_redundant_limit_percentage = 50; set global parallel_servers_target=10000; set global collation_connection = utf8mb4_bin; set global collation_database = utf8mb4_bin; set global collation_server = utf8mb4_bin; set global autocommit=1; alter system set _nested_loop_join_enabled = false; alter system set ob_enable_batched_multi_statement='true';
Step 3: Install TPC-H Tool
Download TPC-H Tool. For more information, see TPC-H Tool Download Page.
After the file is downloaded, decompress it and go to the TPC-H directory.
[wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip [wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0Copy the
Makefile.suitefile.[wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/ [wieck@localhost dbgen] $ cp Makefile.suite MakefileModify the
CC,DATABASE,MACHINE, andWORKLOADparameters in theMakefilefile.[wieck@localhost dbgen] $ vim MakefileCC = gcc # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata) # SQLSERVER, SYBASE, ORACLE, VECTORWISE # Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS, # SGI, SUN, U2200, VMS, LINUX, WIN32 # Current values for WORKLOAD are: TPCH DATABASE= MYSQL MACHINE = LINUX WORKLOAD = TPCHModify the
tpcd.hfile and add the new macro definitions.[wieck@localhost dbgen] $ vim tpcd.h#ifdef MYSQL #define GEN_QUERY_PLAN "" #define START_TRAN "START TRANSACTION" #define END_TRAN "COMMIT" #define SET_OUTPUT "" #define SET_ROWCOUNT "limit %d;\n" #define SET_DBASE "use %s;\n" #endifCompile the file.
makeThe returned result is as follows:
gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lmThis step produces
dbgenfor data generation andqgenanddists.dssfor SQL generation.
Step 4: Generate data
Generate 10 GB, 100 GB, or 1 TB of TPC-H data depending on your environment. This example uses 100 GB.
./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100
Generate 1 TB of data using multiple threads. OceanBase Database supports direct load, which allows you to import data from multiple files into a table at the same time:
#!/bin/bash
SCALE_FACTOR=1000
CHUNK_COUNT=20
for ((i=1; i<=CHUNK_COUNT; i++))
do
CMD="./dbgen -s ${SCALE_FACTOR} -C ${CHUNK_COUNT} -S ${i} -vf"
$CMD &
done
wait
echo "All data generation tasks completed."
Step 5: Generate query SQL statements
Note
Follow the steps below to generate query SQL, then adjust as needed. Alternatively, use the query SQL on GitHub. If you use the GitHub queries, replace cpu_num with your actual concurrency level.
Use the TPC-H tool to generate query SQL as follows:
Copy the
dbgen/qgenanddbgen/dists.dssfiles to themysql_sqldirectory.Create a
gen.shscript in themysql_sqldirectory to generate query SQL statements.vim gen.sh#!/usr/bin/bash for i in {1..22} do ./qgen -d $i -s 100 > db"$i".sql doneModify the query SQL statements based on the actual concurrency number.
You can use the following command in the
systenant to view the total number of available CPU cores for the tenant.select sum(max_cpu) from DBA_OB_UNITS;Here is an example of the modified SQL statement for
Q1:SELECT /*+ parallel(96) */ ---Add parallel execution l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= date '1998-12-01' - interval '90' day GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
Step 6: Create tables
For 100 GB of data, create a table structure file named
create_tpch_mysql_table_part.ddl.drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_lineitem_order_group; drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_partsupp_part; create tablegroup tpch_tg_SF_TPC_USER_lineitem_order_group binding true partition by key 1 partitions 256; create tablegroup tpch_tg_SF_TPC_USER_partsupp_part binding true partition by key 1 partitions 256; DROP TABLE IF EXISTS LINEITEM; CREATE TABLE lineitem ( l_orderkey bigint NOT NULL, l_partkey int(32) NOT NULL, l_suppkey int(32) NOT NULL, l_linenumber int(32) NOT NULL, l_quantity decimal(32,2) NOT NULL, l_extendedprice decimal(32,2) NOT NULL, l_discount decimal(15,2) NOT NULL, l_tax decimal(15,2) NOT NULL, l_returnflag char(1) DEFAULT NULL, l_linestatus char(1) DEFAULT NULL, l_shipdate date NOT NULL, l_commitdate date DEFAULT NULL, l_receiptdate date DEFAULT NULL, l_shipinstruct varchar(64) DEFAULT NULL, l_shipmode varchar(64) DEFAULT NULL, l_comment varchar(64) DEFAULT NULL, primary key(l_shipdate, l_orderkey, l_linenumber) )row_format = condensed partition by key (l_orderkey) partitions 96 with column group(each column); alter table lineitem CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS ORDERS; CREATE TABLE orders ( o_orderkey bigint NOT NULL, o_custkey int(32) NOT NULL, o_orderstatus varchar(64) DEFAULT NULL, o_totalprice decimal(15,2) DEFAULT NULL, o_orderdate date NOT NULL, o_orderpriority varchar(15) DEFAULT NULL, o_clerk varchar(15) DEFAULT NULL, o_shippriority int(32) DEFAULT NULL, o_comment varchar(128) DEFAULT NULL, PRIMARY KEY (o_orderkey, o_orderdate) ) row_format = condensed partition by key(o_orderkey) partitions 96 with column group(each column); alter table orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS PARTSUPP; CREATE TABLE partsupp ( ps_partkey int(11) NOT NULL, ps_suppkey int(11) NOT NULL, ps_availqty int(11) DEFAULT NULL, ps_supplycost decimal(15,2) DEFAULT NULL, ps_comment varchar(199) DEFAULT NULL, PRIMARY KEY (ps_partkey, ps_suppkey)) row_format = condensed partition by key(ps_partkey) partitions 96 with column group(each column); alter table partsupp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS PART; CREATE TABLE part ( p_partkey int(11) NOT NULL, p_name varchar(55) DEFAULT NULL, p_mfgr varchar(25) DEFAULT NULL, p_brand varchar(10) DEFAULT NULL, p_type varchar(25) DEFAULT NULL, p_size int(11) DEFAULT NULL, p_container varchar(10) DEFAULT NULL, p_retailprice decimal(12,2) DEFAULT NULL, p_comment varchar(23) DEFAULT NULL, PRIMARY KEY (p_partkey)) row_format = condensed partition by key(p_partkey) partitions 96 with column group(each column); alter table part CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS CUSTOMER; CREATE TABLE customer ( c_custkey int(11) NOT NULL, c_name varchar(25) DEFAULT NULL, c_address varchar(40) DEFAULT NULL, c_nationkey int(11) DEFAULT NULL, c_phone varchar(15) DEFAULT NULL, c_acctbal decimal(15,2) DEFAULT NULL, c_mktsegment char(10) DEFAULT NULL, c_comment varchar(117) DEFAULT NULL, PRIMARY KEY (c_custkey)) row_format = condensed partition by key(c_custkey) partitions 96 with column group(each column); alter table customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS SUPPLIER; CREATE TABLE supplier ( s_suppkey int(11) NOT NULL, s_name varchar(25) DEFAULT NULL, s_address varchar(40) DEFAULT NULL, s_nationkey int(11) DEFAULT NULL, s_phone varchar(15) DEFAULT NULL, s_acctbal decimal(15,2) DEFAULT NULL, s_comment varchar(101) DEFAULT NULL, PRIMARY KEY (s_suppkey) ) row_format = condensed partition by key(s_suppkey) with column group(each column); alter table supplier CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS NATION; CREATE TABLE nation ( n_nationkey int(11) NOT NULL, n_name varchar(25) DEFAULT NULL, n_regionkey int(11) DEFAULT NULL, n_comment varchar(152) DEFAULT NULL, PRIMARY KEY (n_nationkey) ) row_format = condensed with column group(each column); alter table nation CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS REGION; CREATE TABLE region ( r_regionkey int(11) NOT NULL, r_name varchar(25) DEFAULT NULL, r_comment varchar(152) DEFAULT NULL, PRIMARY KEY (r_regionkey) ) row_format = condensed with column group(each column); alter table region CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; CREATE VIEW revenue0 AS SELECT l_suppkey as supplier_no, SUM(l_extendedprice * ( 1 - l_discount )) as total_revenue FROM lineitem WHERE l_shipdate >= DATE '1996-01-01' AND l_shipdate < DATE '1996-04-01' GROUP BY l_suppkey;For 1 TB of data, create a table structure file named
create_tpch_mysql_table_part_1000G.ddl.drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_lineitem_order_group_1000; drop tablegroup IF EXISTS tpch_tg_SF_TPC_USER_partsupp_part_1000; create tablegroup tpch_tg_SF_TPC_USER_lineitem_order_group_1000 binding true partition by key 1 partitions 256; create tablegroup tpch_tg_SF_TPC_USER_partsupp_part_1000 binding true partition by key 1 partitions 256; DROP TABLE IF EXISTS LINEITEM; CREATE TABLE lineitem ( l_orderkey bigint NOT NULL, l_partkey int(32) NOT NULL, l_suppkey int(32) NOT NULL, l_linenumber int(32) NOT NULL, l_quantity decimal(32,2) NOT NULL, l_extendedprice decimal(32,2) NOT NULL, l_discount decimal(15,2) NOT NULL, l_tax decimal(15,2) NOT NULL, l_returnflag char(1) DEFAULT NULL, l_linestatus char(1) DEFAULT NULL, l_shipdate date NOT NULL, l_commitdate date DEFAULT NULL, l_receiptdate date DEFAULT NULL, l_shipinstruct varchar(64) DEFAULT NULL, l_shipmode varchar(64) DEFAULT NULL, l_comment varchar(64) DEFAULT NULL, primary key(l_shipdate, l_orderkey, l_linenumber) )row_format = condensed partition by key (l_orderkey) partitions 96 with column group(each column); alter table lineitem CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS ORDERS; CREATE TABLE orders ( o_orderkey bigint NOT NULL, o_custkey int(32) NOT NULL, o_orderstatus varchar(64) DEFAULT NULL, o_totalprice decimal(15,2) DEFAULT NULL, o_orderdate date NOT NULL, o_orderpriority varchar(15) DEFAULT NULL, o_clerk varchar(15) DEFAULT NULL, o_shippriority int(32) DEFAULT NULL, o_comment varchar(128) DEFAULT NULL, PRIMARY KEY (o_orderkey, o_orderdate) ) row_format = condensed partition by key(o_orderkey) partitions 96 with column group(each column); alter table orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS PARTSUPP; CREATE TABLE partsupp ( ps_partkey int(11) NOT NULL, ps_suppkey int(11) NOT NULL, ps_availqty int(11) DEFAULT NULL, ps_supplycost decimal(15,2) DEFAULT NULL, ps_comment varchar(199) DEFAULT NULL, PRIMARY KEY (ps_partkey, ps_suppkey)) row_format = condensed partition by key(ps_partkey) partitions 96 with column group(each column); alter table partsupp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS PART; CREATE TABLE part ( p_partkey int(11) NOT NULL, p_name varchar(55) DEFAULT NULL, p_mfgr varchar(25) DEFAULT NULL, p_brand varchar(10) DEFAULT NULL, p_type varchar(25) DEFAULT NULL, p_size int(11) DEFAULT NULL, p_container varchar(10) DEFAULT NULL, p_retailprice decimal(12,2) DEFAULT NULL, p_comment varchar(23) DEFAULT NULL, PRIMARY KEY (p_partkey)) row_format = condensed partition by key(p_partkey) partitions 96 with column group(each column); alter table part CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS CUSTOMER; CREATE TABLE customer ( c_custkey int(11) NOT NULL, c_name varchar(25) DEFAULT NULL, c_address varchar(40) DEFAULT NULL, c_nationkey int(11) DEFAULT NULL, c_phone varchar(15) DEFAULT NULL, c_acctbal decimal(15,2) DEFAULT NULL, c_mktsegment char(10) DEFAULT NULL, c_comment varchar(117) DEFAULT NULL, PRIMARY KEY (c_custkey)) row_format = condensed partition by key(c_custkey) partitions 96 with column group(each column); alter table customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS SUPPLIER; CREATE TABLE supplier ( s_suppkey int(11) NOT NULL, s_name varchar(25) DEFAULT NULL, s_address varchar(40) DEFAULT NULL, s_nationkey int(11) DEFAULT NULL, s_phone varchar(15) DEFAULT NULL, s_acctbal decimal(15,2) DEFAULT NULL, s_comment varchar(101) DEFAULT NULL, PRIMARY KEY (s_suppkey) ) row_format = condensed partition by key(s_suppkey) with column group(each column); alter table supplier CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS NATION; CREATE TABLE nation ( n_nationkey int(11) NOT NULL, n_name varchar(25) DEFAULT NULL, n_regionkey int(11) DEFAULT NULL, n_comment varchar(152) DEFAULT NULL, PRIMARY KEY (n_nationkey) ) row_format = condensed with column group(each column); alter table nation CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; DROP TABLE IF EXISTS REGION; CREATE TABLE region ( r_regionkey int(11) NOT NULL, r_name varchar(25) DEFAULT NULL, r_comment varchar(152) DEFAULT NULL, PRIMARY KEY (r_regionkey) ) row_format = condensed with column group(each column); alter table region CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin; CREATE VIEW revenue0 AS SELECT l_suppkey as supplier_no, SUM(l_extendedprice * ( 1 - l_discount )) as total_revenue FROM lineitem WHERE l_shipdate >= DATE '1996-01-01' AND l_shipdate < DATE '1996-04-01' GROUP BY l_suppkey;
Step 7: Load data
Use the data and SQL from the previous steps to build a load script. The example below shows one approach:
Create a load script named
load_data.sh.#!/bin/bash host='$host_ip' # Note: Please fill in the IP address of an observer, such as the server where observer A is located. It is recommended to store the data files on the same server. port='$host_port' # The port number of observer A. user='$user' # The username. tenant='$tenant_name' # The tenant name. password='$password' # The password. database='$db_name' # The database name. data_path='$data_file' # Note: Please fill in the path of a data file generated in the data generation step, such as the .tbl file generated in observer A. function load_data { remote_user="$user" # The username of the observer node where the data is stored. table_name=${1} if [[ ${password} == "" ]];then obclient_conn="obclient -h${host} -P${port} -u${user} -D${database} -A -c" else obclient_conn="obclient -h${host} -P${port} -u${user} -D${database} -p${password} -A -c" fi table_list=$(ssh "${remote_user}@${host}" "ls ${data_path}/${table_name}.tbl* 2>/dev/null") echo "$table_list" IFS=$'\n' read -d '' -r -a table_files <<< "$table_list" table_files_comma_separated=$(IFS=,; echo "${table_files[*]}") echo "${table_files_comma_separated}" echo `date "+[%Y-%m-%d %H:%M:%S]"` "----------------------Importing data files of the ${table_name} table----------------------" # Import data by using direct load. You can modify the following statement to use other data import methods. # Note: The data files must be stored on the test server of the OBServer. echo "load data /*+ parallel(80) direct(true,0) */ infile '${table_files_comma_separated}' into table ${table_name} fields terminated by '|';" | ${obclient_conn} } starttime=`date +%s%N` for table in "nation" "region" "customer" "lineitem" "orders" "partsupp" "part" "supplier" do load_data "${table}" done end_time=`date +%s%N` totaltime=`echo ${end_time} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'` echo `date "+[%Y-%m-%d %H:%M:%S]"` "load data cost ${totaltime}s"After the data is loaded, you must perform a major compaction and collect statistics.
Perform a major compaction.
Execute the following statement in the test tenant to trigger a major compaction:
ALTER SYSTEM MAJOR FREEZE;Check whether the major compaction is completed.
You can check whether the major compaction is completed in the
systenant.SELECT dt.TENANT_NAME, cc.FROZEN_SCN, cc.LAST_SCN FROM oceanbase.DBA_OB_TENANTS dt, oceanbase.CDB_OB_MAJOR_COMPACTION cc WHERE dt.TENANT_ID = cc.TENANT_ID AND dt.TENANT_NAME = 'mysql_tenant';Note
If all
FROZEN_SCNandLAST_SCNvalues are equal, the major compaction is completed.Collect statistics.
Create a file named
analyze_table.sql:call dbms_stats.gather_table_stats(NULL, 'part', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128'); call dbms_stats.gather_table_stats(NULL, 'lineitem', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128'); call dbms_stats.gather_table_stats(NULL, 'customer', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128'); call dbms_stats.gather_table_stats(NULL, 'orders', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128'); call dbms_stats.gather_table_stats(NULL, 'partsupp', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128'); call dbms_stats.gather_table_stats(NULL, 'supplier', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');Log in to the test tenant and run:
source analyze_table.sql
Step 8: Run the test
Build a test script from the data and SQL created in the previous steps. The example below shows one approach:
Create a test script named
tpch.sh.#!/bin/bash host='$host_ip' # Note: Please fill in the IP address of an observer, such as observer A. port='$host_port' # The port number of observer A. user='$user' # The username. tenant='$tenant_name' # The tenant name. password='$password' # The password. database='$db_name' # The database name. if [[ ${password} == "" ]];then TPCH_TEST="obclient -h${host} -P${port} -u${user}@{$tenant} -D${database} -A -c" else TPCH_TEST="obclient -h${host} -P${port} -p${password} -u${user}@{$tenant} -D${database} -A -c" fi function clear_kvcache { if [[ ${password_sys} == "" ]];then obclient_sys="obclient -h${host} -P${port} -uroot@sys -Doceanbase -A -c" else obclient_sys="obclient -h${host} -P${port} -uroot@sys -Doceanbase -p${password_sys} -A -c" fi tenant_name=${user#*@} echo "alter system flush kvcache ;" | ${obclient_sys} echo "alter system flush kvcache tenant '${tenant_name}' cache 'user_row_cache';" | ${obclient_sys} sleep 3s } function do_explain { # Execution plan echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN EXPLAIN ALL TPCH PLAN" for i in {1..22} do sql_explain="source explain_mysql/${i}.sql" echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN EXPLAIN Q${i}:" echo ${sql_explain} | ${TPCH_TEST} | sed 's/\\n/\n/g' |tee explain_log/${i}.exp echo `date '+[%Y-%m-%d %H:%M:%S]'` "Q${i} END" done } function do_warmup { # Warm-up totaltime=0 for i in {1..22} do starttime=`date +%s%N` echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN prewarm Q${i}" sql1="source mysql_sql/${i}.sql" echo ${sql1}| ${TPCH_TEST} > mysql_log/${i}_prewarm.log || ret=1 stoptime=`date +%s%N` costtime=`echo ${stoptime} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'` first_array[$i]=$(echo "scale=2; ${first_array[$i]} + $costtime" | bc) echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s" totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'` done echo "total cost:${totaltime}s" } function hot_run { # Actual execution for j in {1..10} do totaltime=0 for i in {1..22} do starttime=`date +%s%N` echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN BEST Q${i} (hot run)" sql1="source mysql_sql/${i}.sql" echo ${sql1}| ${TPCH_TEST} > mysql_log/${i}.log || ret=1 stoptime=`date +%s%N` costtime=`echo ${stoptime} ${starttime} | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'` hot_array[$i]=$(echo "scale=2; ${hot_array[$i]} + $costtime" | bc) echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s" totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'` done echo "total cost:${totaltime}s" done } function cold_run { # Actual execution for j in {1..3} do totaltime=0 for i in {1..22} do clear_kvcache starttime=`date +%s%N` echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN BEST Q${i} (cold run)" sql1="source mysql_sql/${i}.sql" echo $sql1| $TPCH_TEST > mysql_log/${i}_cold.log || ret=1 stoptime=`date +%s%N` costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'` cold_array[$i]=$(echo "scale=2; ${cold_array[$i]} + $costtime" | bc) echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s" totaltime=`echo ${totaltime} ${costtime} | awk '{printf "%0.2f\n", ($1 + $2)}'` done echo "total cost:${totaltime}s" done } do_explain do_warmup hot_run cold_runExecute the test script.
sh tpch.sh
FAQ
Data import fails with:
ERROR 1017 (HY000) at line 1: File not existPlace the
.tblfiles on the same server as the OceanBase instance you connect to. Data import must run locally on that server.Query fails with:
ERROR 4624 (HY000): No memory or reach tenant memory limitThe tenant is out of memory. Increase the tenant memory limit.
Data import fails with:
ERROR 1227 (42501) at line 1: Access deniedGrant the required privileges:
grant file on *.* to tpch_100g_part;
