Try out operational OLAP|V4.1.0| docs|Distributed Database

Try out operational OLAP

Last Updated：2023-08-28 09:28:41 Updated

OceanBase Database is suitable for hybrid transaction/analytical processing (HTAP) scenarios. OceanBase Database adopts a distributed architecture based on peer nodes. This architecture allows OceanBase Database to handle high-concurrency and scalable online transaction processing (OLTP) tasks and perform parallel computing for online analytical processing (OLAP) tasks based on the massively parallel processing (MPP) architecture in the same data engine, without maintaining two sets of data.

OceanBase Database not only allows you to analyze a large amount of online business data in parallel but also allows you to perform parallel DML (PDML) operations to quickly and securely execute large transactions that concurrently write data in batches. All these are achieved without compromising transaction consistency.

The following describes how to manually run the Transaction Processing Performance Council Benchmark H (TPC-H) benchmark test to show the characteristics of OceanBase Database in operational OLAP scenarios. TPC-H is a commonly used benchmark that measures the analysis and decision support capabilities of database systems by using a series of complex queries on massive amounts of data. For more information, visit the official website of TPC. On May 20, 2021, OceanBase Database set a new world record in running the TPC-H benchmark with a result of 15.26 million QphH. It is by far the only database that achieved top results in running both the TPC-C and TPC-H benchmarks, which testifies its HTAP capabilities in both online transactions and real-time analysis. For more information, see TPC-H Results.

Manually run the TPC-H benchmark test

Use the official TPC-H tools from TPC to manually run the TPC-H benchmark test step by step. Manual testing can help you better understand OceanBase Database, especially the settings of some parameters.

Optimize the test environment

Perform the following steps in the sys tenant:

Optimize OceanBase Database parameters.

Run the following command in the sys tenant: obclient -h$host_ip -P$host_port -uroot@sys -A.

# Adjust the memory usage of the sys tenant to provide more resources for the test tenant. You need to adjust the memory usage based on the actual environment.
ALTER SYSTEM SET system_memory='15g';
ALTER RESOURCE UNIT sys_unit_config memory_size ='15G';
# Optimize the parameters.
ALTER SYSTEM SET trace_log_slow_query_watermark='100s';
ALTER SYSTEM SET enable_sql_operator_dump=True;
ALTER SYSTEM SET _hash_area_size='3g';
ALTER SYSTEM SET memstore_limit_percentage=50;
ALTER SYSTEM SET enable_rebalance=False;
ALTER SYSTEM SET memory_chunk_cache_size='0';
ALTER SYSTEM SET major_compact_trigger=5;
ALTER SYSTEM SET cache_wash_threshold='30g';
# Adjust the log level and the maximum number of log files that can be saved.
ALTER SYSTEM SET syslog_level='PERF';
ALTER SYSTEM SET max_syslog_file_count=100;
ALTER SYSTEM SET enable_syslog_recycle='True';

Set tenant parameters.

Run the following command in the test user: obclient -h$host_ip -P$host_port -u$user@$tenant -p****** -A.

# Set global parameters.
SET GLOBAL ob_sql_work_area_percentage=80;
SET GLOBAL optimizer_use_sql_plan_baselines = true;
SET GLOBAL optimizer_capture_sql_plan_baselines = true;
ALTER SYSTEM SET ob_enable_batched_multi_statement='true';
# Set relevant tenant parameters to prevent transaction timeout.
SET GLOBAL ob_query_timeout=36000000000;
SET GLOBAL ob_trx_timeout=36000000000;
SET GLOBAL max_allowed_packet=67108864;
SET GLOBAL secure_file_priv="";
/*
We recommend that you set 'parallel_server_target' to the result of the following formula: Number of CPU cores of the resource units allocated to the test tenant × 10 × Number of OBServer nodes × 0.8.
*/
SET GLOBAL parallel_servers_target=624;

Restart the cluster.

Install the TPC-H tools

Download the TPC-H tools from the TPC-H Tools Download page.

Unzip the package and go to the TPC-H directory.

[wieck@localhost ~] $ unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
[wieck@localhost ~] $ cd TPC-H_Tools_v3.0.0

Copy the Makefile.suite file.

[wieck@localhost TPC-H_Tools_v3.0.0] $ cd dbgen/
[wieck@localhost dbgen] $ cp Makefile.suite Makefile

Redefine the CC, DATABASE, MACHINE, and WORKLOAD parameters in the Makefile.suite file.

CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= MYSQL
MACHINE = LINUX
WORKLOAD = TPCH

Modify the tpcd.h file and add new macro definitions.

#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif

Compile the files.
```
make
```

Generate data

Generate 10 GB, 100 GB, or 1 TB of data as needed. The following example shows the command to generate 100 GB of data.

./dbgen -s 100
mkdir tpch100
mv *.tbl tpch100

Generate SQL query statements

Note

You can perform the following steps to generate an SQL query statement and then adjust it, or you can use the SQL query statement provided in GitHub.

However, before you use the SQL query statement provided in GitHub, change the value of the cpu_num parameter in the statement to the actual number of concurrent threads.

Copy the qgen and dists.dss files to the queries directory.
```
cp qgen queries
cp dists.dss queries
```
In the queries directory, create a gen.sh script to generate SQL query statements.
```
#!/usr/bin/bash
for i in {1..22}
do  
./qgen -d $i -s 100 > db"$i".sql
done
```
Run the gen.sh script.
```
chmod +x  gen.sh
./gen.sh
```
Modify the SQL query statements.
```
dos2unix *
```

For the modified SQL query statements, see GitHub. You need to change the value of the cpu_num parameter in the SQL statements to the actual number of concurrent threads. To ensure the best performance, we recommend that you set the value to the total number of available CPU cores.

Run the following command under the sys tenant to view the total number of available CPU cores for a tenant:

select sum(max_cpu) from DBA_OB_UNITS ;

Take q1 as an example. The modified SQL statement is as follows:

select /*+    parallel(96) */   --- Specify the execution concurrency.
   l_returnflag,
   l_linestatus,
   sum(l_quantity) as sum_qty,
   sum(l_extendedprice) as sum_base_price,
   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
   avg(l_quantity) as avg_qty,
   avg(l_extendedprice) as avg_price,
   avg(l_discount) as avg_disc,
   count(*) as count_order
from
   lineitem
where
   l_shipdate <= date '1998-12-01' - interval '90' day
group by
   l_returnflag,
   l_linestatus
order by
   l_returnflag,
   l_linestatus;

Create a table

Create a schema file named create_tpch_mysql_table_part.ddl.

create tablegroup if not exists tpch_tg_100g_lineitem_order_group binding true partition by key 1 partitions 192;
create tablegroup if not exists tpch_tg_100g_partsupp_part binding true partition by key 1 partitions 192;

drop database if exists $db_name;
create database $db_name;
use $db_name;

    create table lineitem (
       l_orderkey bigint not null,
       l_partkey bigint not null,
       l_suppkey bigint not null,
       l_linenumber bigint not null,
       l_quantity bigint not null,
       l_extendedprice bigint not null,
       l_discount bigint not null,
       l_tax bigint not null,
       l_returnflag char(1) default null,
       l_linestatus char(1) default null,
       l_shipdate date not null,
       l_commitdate date default null,
       l_receiptdate date default null,
       l_shipinstruct char(25) default null,
       l_shipmode char(10) default null,
       l_comment varchar(44) default null,
       primary key(l_orderkey, l_linenumber))
       tablegroup = tpch_tg_100g_lineitem_order_group
       partition by key (l_orderkey) partitions 192;
    create index I_L_ORDERKEY on lineitem(l_orderkey) local;
    create index I_L_SHIPDATE on lineitem(l_shipdate) local;

    create table orders (
       o_orderkey bigint not null,
       o_custkey bigint not null,
       o_orderstatus char(1) default null,
       o_totalprice bigint default null,
       o_orderdate date not null,
       o_orderpriority char(15) default null,
       o_clerk char(15) default null,
       o_shippriority bigint default null,
       o_comment varchar(79) default null,
       primary key (o_orderkey))
       tablegroup = tpch_tg_100g_lineitem_order_group
       partition by key(o_orderkey) partitions 192;
    create index I_O_ORDERDATE on orders(o_orderdate) local;


    create table partsupp (
       ps_partkey bigint not null,
       ps_suppkey bigint not null,
       ps_availqty bigint default null,
       ps_supplycost bigint default null,
       ps_comment varchar(199) default null,
       primary key (ps_partkey, ps_suppkey))
       tablegroup tpch_tg_100g_partsupp_part
       partition by key(ps_partkey) partitions 192;


    create table part (
       p_partkey bigint not null,
       p_name varchar(55) default null,
       p_mfgr char(25) default null,
       p_brand char(10) default null,
       p_type varchar(25) default null,
       p_size bigint default null,
       p_container char(10) default null,
       p_retailprice bigint default null,
       p_comment varchar(23) default null,
       primary key (p_partkey))
       tablegroup tpch_tg_100g_partsupp_part
       partition by key(p_partkey) partitions 192;


   create table customer (
       c_custkey bigint not null,
       c_name varchar(25) default null,
       c_address varchar(40) default null,
       c_nationkey bigint default null,
       c_phone char(15) default null,
       c_acctbal bigint default null,
       c_mktsegment char(10) default null,
       c_comment varchar(117) default null,
       primary key (c_custkey))
       partition by key(c_custkey) partitions 192;

   create table supplier (
       s_suppkey bigint not null,
       s_name char(25) default null,
       s_address varchar(40) default null,
       s_nationkey bigint default null,
       s_phone char(15) default null,
       s_acctbal bigint default null,
       s_comment varchar(101) default null,
       primary key (s_suppkey)
     ) partition by key(s_suppkey) partitions 192;


   create table nation (
       n_nationkey bigint not null,
       n_name char(25) default null,
       n_regionkey bigint default null,
       n_comment varchar(152) default null,
       primary key (n_nationkey));

   create table region (
       r_regionkey bigint not null,
       r_name char(25) default null,
       r_comment varchar(152) default null,
       primary key (r_regionkey));

Load data

Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to load data.

Create a directory for loading scripts.

mkdir load
cd load
cp ../dss.ri  ../dss.ddl ./

Create the load.py script.

$cat load.py
#/usr/bin/evn python
#-*- encoding:utf-8 -*-
import os
import sys
import time
import commands
hostname='$host_ip'  # Attention! Please fill in the IP address of an OBServer node, such as OBServer A.
port='$host_port'               # Port number of OBServer A
tenant='$tenant_name'              # Tenant name
user='$user'               # Username
password='$password'           # Password
data_path='$path'         # Attention! Please fill in the directory where tbl is located on OBServer A.
db_name='$db_name'             # Database name
# Create a table
cmd_str='obclient -h%s -P%s -u%s@%s -p%s -D%s < create_tpch_mysql_table_part.ddl'%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str='obclient -h%s -P%s -u%s@%s -p%s  -D%s -e "show tables;" '%(hostname,port,user,tenant,password,db_name)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/customer.tbl' into table customer fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/lineitem.tbl' into table lineitem fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c -D%s -e "load data /*+ parallel(80) */ infile '%s/nation.tbl' into table nation fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/orders.tbl' into table orders fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s   -D%s -e "load data /*+ parallel(80) */ infile '%s/partsupp.tbl' into table partsupp fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/part.tbl' into table part fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/region.tbl' into table region fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result
cmd_str=""" obclient -h%s -P%s -u%s@%s -p%s -c  -D%s -e "load data /*+ parallel(80) */ infile '%s/supplier.tbl' into table supplier fields terminated by '|';" """ %(hostname,port,user,tenant,password,db_name,data_path)
result = commands.getstatusoutput(cmd_str)
print result

Load the data.

Notice

To load data, you need to install the OBClient.

$ python load.py
(0,'')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.\nTABLE_NAME\nT1\nLINEITEM\nORDERS\nPARTSUPP\nPART\nCUSTOMER\nSUPPLIER\nNATION\nREGION')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')
(0, 'obclient: [Warning] Using a password on the command line interface can be insecure.')

Perform a major compaction.

A major compaction combines the current major version's SSTables and MemTables with the full static data of the previous major version and generates a new set of full data. This makes the statistics of the storage layer more accurate and improves the stability of execution plans.

Note

To perform a major compaction, you must log on to the sys tenant of the OceanBase cluster as the root user.
```
MySQL [(none)]> USE oceanbase
Database changed
MySQL [oceanbase]> ALTER SYSTEM MAJOR FREEZE;
Query OK, 0 rows affected
```

Check whether the major compaction is complete.

MySQL [oceanbase]> SELECT name,value FROM oceanbase.__all_zone WHERE name='frozen_version' OR name='last_merged_version';
+---------------------+-------+
| name                | value |
+---------------------+-------+
| frozen_version      |     2 |
| last_merged_version |     2 |
| last_merged_version |     2 |
| last_merged_version |     2 |
| last_merged_version |     2 |
+---------------------+-------+

Note

The major compaction is considered complete when the value of frozen_version is equal to that of last_merged_version for all zones.

Run the test

Write your own script based on the data and SQL query statements generated from the preceding steps. The following is an example of how to run the test.

Write the tpch.sh script in the queries directory.

#!/bin/bash
TPCH_TEST="obclient -h $host_ip -P $host_port -utpch_100g_part@tpch_mysql  -D tpch_100g_part  -ptest -c"
# Run a warmup.
for i in {1..22}
do
   sql1="source db${i}.sql"
   echo $sql1| $TPCH_TEST >db${i}.log  || ret=1
done
# Run the formal test.
for i in {1..22}
do
   starttime=`date +%s%N`
   echo `date  '+[%Y-%m-%d %H:%M:%S]'` "BEGIN Q${i}"
   sql1="source db${i}.sql"
   echo $sql1| $TPCH_TEST >db${i}.log  || ret=1
   stoptime=`date +%s%N`
   costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
   echo `date  '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
done

Run the test script.
```
sh tpch.sh
```

FAQ

Q: What do I do when an error occurred while importing data? Error message:
```
ERROR 1017 (HY000) at line 1: File not exist
```
A: The tbl file must be placed in a directory on the server that hosts OceanBase Database, because only local data can be imported and loaded.
Q: What do I do when an error occurred while viewing data? Error message:
```
ERROR 4624 (HY000): No memory or reach tenant memory limit
```
A: You are running out of memory. Allocate more memory to the tenant.
Q: What do I do when an error occurred while importing data. Error message:
```
ERROR 1227 (42501) at line 1: Access denied
```
A: Grant the access privilege to the logon user by running the following command:
```
grant file on *.* to tpch_100g_part;
```
Q: What do I do when an error occurred while adjusting dos2unix * for the SQL query? Error message:
```
-bash: dos2unix: command not found
```
A: Install dos2unix by running the following command:
```
yum install -y dos2unix
```

Try out operational OLAP

Through the previous operation, you have obtained a TPC-H test environment. Now, try out the operational OLAP capability of OceanBase Database. First, log on to your database through the OBClient or a MySQL client.

obclient -h127.0.0.1 -P2881 -uroot@test -Dtest -A -p -c

Before you start, set the degree of parallelism (DOP) based on the configurations of the OceanBase cluster and the tenant. We recommend that you set the DOP to be no more than twice the number of CPU cores of your tenant. For example, if your tenant has a maximum of 8 CPU cores, set the DOP to 16:

MySQL [test]> SET GLOBAL parallel_servers_target=16;
Query OK, 0 rows affected

MySQL [test]> SET GLOBAL parallel_max_servers=16;
Query OK, 0 rows affected

OceanBase Database is compatible with most internal views of MySQL databases. Run the following command to query the sizes of tables in the current environment:

MySQL [test]> SELECT table_name, table_rows, CONCAT(ROUND(data_length/(1024*1024*1024),2),' GB') table_size FROM information_schema.TABLES WHERE table_schema = 'test' order by table_rows desc;
+------------+------------+------------+
| table_name | table_rows | table_size |
+------------+------------+------------+
| lineitem   |    6001215 | 0.37 GB    |
| orders     |    1500000 | 0.08 GB    |
| partsupp   |     800000 | 0.04 GB    |
| part       |     200000 | 0.01 GB    |
| customer   |     150000 | 0.01 GB    |
| supplier   |      10000 | 0.00 GB    |
| nation     |         25 | 0.00 GB    |
| region     |          5 | 0.00 GB    |
+------------+------------+------------+
8 rows in set

Run the Q1 query to verify the query capability of OceanBase Database. Q1 query will summarize and analyze the prices, discounts, shipments, and quantities of various products within a specified period of time on the largest table named lineitem. This query will read all the data in the entire table and perform partitioning, sorting, and aggregation.

Execute the query without enabling parallel queries (default)

Run the following command:

select
 l_returnflag,
 l_linestatus,
 sum(l_quantity) as sum_qty,
 sum(l_extendedprice) as sum_base_price,
 sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
 sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
 avg(l_quantity) as avg_qty,
 avg(l_extendedprice) as avg_price,
 avg(l_discount) as avg_disc,
 count(*) as count_order
from
 lineitem
where
 l_shipdate <= date '1998-12-01' - interval '90' day
group by
 l_returnflag,
 l_linestatus
order by
 l_returnflag,
 l_linestatus;

The execution result is as follows:

+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price | sum_charge   | avg_qty | avg_price  | avg_disc | count_order |
+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
| A            | F            | 37734107 |    56586577106 |    56586577106 |  56586577106 | 25.5220 | 38273.1451 |   0.0000 |     1478493 |
| N            | F            |   991417 |     1487505208 |     1487505208 |   1487505208 | 25.5165 | 38284.4806 |   0.0000 |       38854 |
| N            | O            | 74476040 |   111701776272 |   111701776272 | 111701776272 | 25.5022 | 38249.1339 |   0.0000 |     2920374 |
| R            | F            | 37719753 |    56568064200 |    56568064200 |  56568064200 | 25.5058 | 38250.8701 |   0.0000 |     1478870 |
+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
4 rows in set (6.791 sec)

Execute the query while enabling parallel queries

The operational OLAP capability of OceanBase Database is based on a set of data and the execution engine, without the need to synchronize or maintain heterogeneous data. Add a parallel hint to the query statement to set the DOP to 8 and execute the statement again:

select /*+parallel(8) */
 l_returnflag,
 l_linestatus,
 sum(l_quantity) as sum_qty,
 sum(l_extendedprice) as sum_base_price,
 sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
 sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
 avg(l_quantity) as avg_qty,
 avg(l_extendedprice) as avg_price,
 avg(l_discount) as avg_disc,
 count(*) as count_order
from
 lineitem
where
 l_shipdate <= date '1998-12-01' - interval '90' day
group by
 l_returnflag,
 l_linestatus
order by
 l_returnflag,
 l_linestatus;

In the same environment and datasets, the execution result is as follows:

+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price | sum_charge   | avg_qty | avg_price  | avg_disc | count_order |
+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
| A            | F            | 37734107 |    56586577106 |    56586577106 |  56586577106 | 25.5220 | 38273.1451 |   0.0000 |     1478493 |
| N            | F            |   991417 |     1487505208 |     1487505208 |   1487505208 | 25.5165 | 38284.4806 |   0.0000 |       38854 |
| N            | O            | 74476040 |   111701776272 |   111701776272 | 111701776272 | 25.5022 | 38249.1339 |   0.0000 |     2920374 |
| R            | F            | 37719753 |    56568064200 |    56568064200 |  56568064200 | 25.5058 | 38250.8701 |   0.0000 |     1478870 |
+--------------+--------------+----------+----------------+----------------+--------------+---------+------------+----------+-------------+
4 rows in set (1.197 sec)

As you can see, the query speed is increased by nearly 6 times with parallel queries. Run the EXPLAIN command to view the execution plan, which contains the DOP (line 18, operator 1, dop=8):

===============================================================
|ID|OPERATOR                      |NAME    |EST. ROWS|COST    |
---------------------------------------------------------------
|0 |PX COORDINATOR MERGE SORT     |        |6        |13507125|
|1 | EXCHANGE OUT DISTR           |:EX10001|6        |13507124|
|2 |  SORT                        |        |6        |13507124|
|3 |   HASH GROUP BY              |        |6        |13507107|
|4 |    EXCHANGE IN DISTR         |        |6        |8379337 |
|5 |     EXCHANGE OUT DISTR (HASH)|:EX10000|6        |8379335 |
|6 |      HASH GROUP BY           |        |6        |8379335 |
|7 |       PX BLOCK ITERATOR      |        |5939712  |3251565 |
|8 |        TABLE SCAN            |lineitem|5939712  |3251565 |
===============================================================

Outputs & filters:
-------------------------------------
  0 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_quantity)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_extendedprice)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_discount)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_discount)), DECIMAL(20, 0))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), sort_keys([lineitem.l_returnflag, ASC], [lineitem.l_linestatus, ASC])
  1 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_quantity)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_extendedprice)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_discount)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_discount)), DECIMAL(20, 0))]), filter(nil), dop=8
  2 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_quantity)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_extendedprice)), DECIMAL(20, 0))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_discount)) / cast(T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_discount)), DECIMAL(20, 0))]), filter(nil), sort_keys([lineitem.l_returnflag, ASC], [lineitem.l_linestatus, ASC])
  3 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_quantity))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_discount))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_discount))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil),
      group([lineitem.l_returnflag], [lineitem.l_linestatus]), agg_func([T_FUN_SUM(T_FUN_SUM(lineitem.l_quantity))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_quantity))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_extendedprice))], [T_FUN_SUM(T_FUN_SUM(lineitem.l_discount))], [T_FUN_COUNT_SUM(T_FUN_COUNT(lineitem.l_discount))])
  4 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(lineitem.l_quantity)], [T_FUN_SUM(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax)], [T_FUN_COUNT(lineitem.l_quantity)], [T_FUN_COUNT(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_discount)], [T_FUN_COUNT(lineitem.l_discount)], [T_FUN_COUNT(*)]), filter(nil)
  5 - (#keys=2, [lineitem.l_returnflag], [lineitem.l_linestatus]), output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(lineitem.l_quantity)], [T_FUN_SUM(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax)], [T_FUN_COUNT(lineitem.l_quantity)], [T_FUN_COUNT(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_discount)], [T_FUN_COUNT(lineitem.l_discount)], [T_FUN_COUNT(*)]), filter(nil), dop=8
  6 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [T_FUN_SUM(lineitem.l_quantity)], [T_FUN_SUM(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax)], [T_FUN_COUNT(lineitem.l_quantity)], [T_FUN_COUNT(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_discount)], [T_FUN_COUNT(lineitem.l_discount)], [T_FUN_COUNT(*)]), filter(nil),
      group([lineitem.l_returnflag], [lineitem.l_linestatus]), agg_func([T_FUN_SUM(lineitem.l_quantity)], [T_FUN_SUM(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount)], [T_FUN_SUM(lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax)], [T_FUN_COUNT(*)], [T_FUN_COUNT(lineitem.l_quantity)], [T_FUN_COUNT(lineitem.l_extendedprice)], [T_FUN_SUM(lineitem.l_discount)], [T_FUN_COUNT(lineitem.l_discount)])
  7 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [lineitem.l_quantity], [lineitem.l_extendedprice], [lineitem.l_discount], [lineitem.l_extendedprice * 1 - lineitem.l_discount], [lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax]), filter(nil)
  8 - output([lineitem.l_returnflag], [lineitem.l_linestatus], [lineitem.l_quantity], [lineitem.l_extendedprice], [lineitem.l_discount], [lineitem.l_extendedprice * 1 - lineitem.l_discount], [lineitem.l_extendedprice * 1 - lineitem.l_discount * 1 + lineitem.l_tax]), filter([lineitem.l_shipdate <= ?]),
      access([lineitem.l_shipdate], [lineitem.l_returnflag], [lineitem.l_linestatus], [lineitem.l_quantity], [lineitem.l_extendedprice], [lineitem.l_discount], [lineitem.l_tax]), partitions(p[0-15])

In this example, OceanBase Database is deployed on a single server. However, the most prominent feature of the parallel execution framework of OceanBase Database lies in that it can concurrently execute analytical queries on large amounts of data on multiple servers. For example, if a table has hundreds of millions of data rows across multiple OBServer nodes, the distributed execution framework generates a distributed parallel execution plan and uses the resources of these nodes for analytical queries. Therefore, OceanBase Database has high scalability. In addition, you can set the DOP in multiple dimensions, such as SQL statement, session, and table.

Use OBD to run the TPC-H benchmark test

Applicability

This section applies only to OceanBase Database Community Edition. OceanBase Database Enterprise Edition does not support OceanBase Deployer (OBD).

Apart from using the official TPC-H tools from TPC, you can also use OBD to run the TPC-H benchmark test. Before you start, install the obtpch component on the server where OceanBase Database and OBD are deployed:

sudo yum install obtpch

Then, run the following command to start the TPC-H benchmark test, with a dataset size of 1 GB. The entire process includes dataset generation, schema import, and automatic test execution. In this example, it is assumed that your test environment is deployed in the same way as that in Get started with OceanBase Database. You can modify the cluster name, password, installation directory, etc. to adapt to your situation.

Notice

Make sure that your disk space is sufficient to store the dataset files, so as to avoid filling up the space and causing system errors.

In this example, the /tmp directory is used to store dataset files.

cd /tmp
obd test tpch obtest --tenant=test -s 1 --password='******' --remote-tbl-dir=/tmp/tpch1

After the preceding command is executed, OBD starts to run the benchmark test, and you can see each step of the test process.

obd

sql

After the data import is completed, OBD automatically executes 22 SQL statements and prints the execution time for each SQL statement as well as the total execution time.