To help you evaluate the performance of OceanBase Cloud, OceanBase Cloud provides the following sample datasets:
TPC-H
TPC-H is a decision support benchmark for online analytical processing (OLAP) database systems. It primarily assesses a system's ability to execute complex queries and perform data analysis. TPC-H uses a set of predefined queries, run alongside concurrent data modification operations, to simulate a typical data warehouse application.

Data model: TPC-H defines a standardized data model that simulates the business activity between suppliers and customers. The model consists of 8 tables.
| Table Name | Description |
|---|---|
| REGION | Contains region information. Each region has a unique region key and a name. |
| NATION | Contains nation information. Each nation has a unique nation key, a name, and the key of the region it belongs to. |
| PART | Describes the details of parts, including part ID, name, description, size, type, etc. |
| SUPPLIER | Contains supplier information, such as supplier ID, name, address, and nation key. |
| PARTSUPP | Part-supplier relationship table that records which supplier provides which part, including supplier ID, part ID, available quantity, and supply cost. |
| CUSTOMER | Describes customer information, including customer ID, name, address, phone number, and market segment. |
| ORDERS | Records order information, including order ID, order date, shipping priority, order status, and customer ID. |
| LINEITEM | Describes the individual line items in each order, including details such as extended price, discount, ship date, and receipt date. |
Queries: TPC-H includes 22 complex ad hoc analytical SQL queries that cover a wide range of database operations, such as joins, grouping, sorting, and aggregation.
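To make the query shape concrete, the following Python sketch runs a simplified version of TPC-H Q1 (the pricing summary report) against a toy `lineitem` table in an in-memory SQLite database. SQLite stands in for OceanBase Cloud only so that the example is self-contained. The rows are made up, only a handful of LINEITEM columns are modeled, and the official Q1 computes more aggregates (averages and a total charge, for example). The function name `run_q1` is hypothetical.

```python
import sqlite3

# Toy LINEITEM subset: a few real TPC-H column names, but invented rows
# used only for illustration (this is NOT generated TPC-H data).
ROWS = [
    # (l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_shipdate)
    ("A", "F", 10, 1000.0, 0.10, "1995-03-01"),
    ("A", "F", 5, 500.0, 0.00, "1996-07-15"),
    ("N", "O", 20, 2000.0, 0.05, "1998-08-01"),
    ("N", "O", 1, 100.0, 0.00, "1998-12-01"),  # filtered out by the ship-date predicate
]

# Simplified Q1: filter, group, aggregate, then sort, the operations the
# TPC-H queries exercise.
Q1_SIMPLIFIED = """
SELECT l_returnflag,
       l_linestatus,
       SUM(l_quantity)                         AS sum_qty,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       COUNT(*)                                AS count_order
FROM lineitem
WHERE l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""


def run_q1(rows):
    """Load the toy rows into an in-memory table and run the simplified Q1."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, "
            "l_quantity INTEGER, l_extendedprice REAL, l_discount REAL, "
            "l_shipdate TEXT)"
        )
        conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(Q1_SIMPLIFIED).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for result_row in run_q1(ROWS):
        print(result_row)
```

The cutoff date is the one Q1 uses with its default 90-day delta (1998-12-01 minus 90 days). Against OceanBase Cloud, you would submit the SQL text through a client connected to your instance rather than through SQLite.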
Data scale: OceanBase Cloud provides TPC-H datasets of 25 MB, 128 MB, and 1 GB to accommodate testing needs at different data scales. The TPC-H model is a typical snowflake model of 8 tables. The sizes of the NATION and REGION tables are fixed (25 rows and 5 rows, respectively), while the sizes of the other 6 tables grow in proportion to the scale factor (SF). The overall data size is therefore determined by the specified SF.
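The relationship between SF and table size can be sketched in a few lines of Python. The per-SF base cardinalities below are the ones defined in the TPC-H specification (LINEITEM is approximate, because the data generator produces about 6 million rows per SF), and SF 1 corresponds to roughly 1 GB of raw data. The SF that OceanBase Cloud uses for each of its 25 MB, 128 MB, and 1 GB datasets is not stated here, so the scale factors in the usage example are illustrative only. `tpch_row_counts` is a hypothetical helper.

```python
from __future__ import annotations

# Rows per unit of scale factor, per the TPC-H specification.
# LINEITEM is approximate (about 6,000,000 rows per SF).
SCALED_BASE_ROWS = {
    "SUPPLIER": 10_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "CUSTOMER": 150_000,
    "ORDERS": 1_500_000,
    "LINEITEM": 6_000_000,  # approximate
}

# Tables whose size does not depend on SF.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def tpch_row_counts(sf: float) -> dict[str, int]:
    """Estimate the row count of each TPC-H table for scale factor `sf`."""
    if sf <= 0:
        raise ValueError("scale factor must be positive")
    counts = {name: round(base * sf) for name, base in SCALED_BASE_ROWS.items()}
    counts.update(FIXED_ROWS)
    return counts


if __name__ == "__main__":
    for sf in (0.1, 1):
        print(f"SF={sf}: {tpch_row_counts(sf)}")
```

Only six entries scale with `sf`; NATION and REGION are merged in unchanged, which mirrors the fixed-versus-proportional split described above.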
TPC-DS
TPC-DS simulates a variety of business decision scenarios common in decision support systems (DSS). Compared with TPC-H, TPC-DS is more comprehensive and complex: it evaluates not only query performance but also the performance of data maintenance operations, such as data loading and deletion. TPC-DS uses multidimensional data models such as star and snowflake schemas and consists of 7 fact tables and 17 dimension tables. Because it closely resembles real-world scenarios, TPC-DS is considered a challenging benchmark.
Data model: TPC-DS defines a more complex data model of 24 tables that covers multiple business domains, such as sales, inventory, and customer relationships.
| Table Name | Description |
|---|---|
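| CALL_CENTER | Contains call center information, such as call center ID, name, and class. |
| CATALOG_PAGE | Catalog page information, including page ID, type, department, etc. |
| CATALOG_RETURNS | Catalog return records, including return date, returned item, and return amount. |
| CATALOG_SALES | Catalog sales records, including sales date, item, and sales amount. |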
| CUSTOMER | Describes customer information, including customer ID, name, address, etc. |
| CUSTOMER_ADDRESS | Detailed information on customer addresses, including address ID, city, and postal code. |
| CUSTOMER_DEMOGRAPHICS | Demographics of customers, such as gender, marital status, education level, and credit rating. |
| DATE_DIM | Date dimension table, providing information such as date, holiday, and weekend. |
| HOUSEHOLD_DEMOGRAPHICS | Household demographic data, such as the number of dependents, the number of vehicles, and the income band. |
| INCOME_BAND | Income band table defining various ranges of income. |
| INVENTORY | Inventory table that records the quantity on hand of each item in each warehouse on a given date. |
| ITEM | Product information table, including item ID, name, and category. |
| PROMOTION | Information on promotional activities, including promotion ID, promotion type, and start and end dates. |
| REASON | Reasons for customer returns or refunds. |
| SHIP_MODE | Shipping modes, such as express and standard mail. |
| STORE | Physical store information, including store ID, name, and address. |
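| STORE_RETURNS | Physical store return records, including return date, returned item, and return amount. |
| STORE_SALES | Physical store sales records, including sales time and sales amount. |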
| TIME_DIM | Time dimension table, documenting time information such as hours, minutes, and seconds. |
| WAREHOUSE | Warehouse information, including warehouse ID, warehouse name, and address. |
| WEB_PAGE | Web page information, including web page ID, URL, and creation time. |
| WEB_RETURNS | Web return records, including return date, returned item, and return amount. |
| WEB_SALES | Web sales records, including sales date, item, and sales amount. |
| WEB_SITE | Website information, including site ID, name, and URL. |
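The fact/dimension split is what makes TPC-DS queries star joins: a fact table such as STORE_SALES holds surrogate keys that point into dimension tables such as DATE_DIM and ITEM. The Python sketch below shows that pattern on a tiny in-memory SQLite database. The column names follow the TPC-DS schema, but only a few columns per table are modeled, the rows are invented for illustration, and `star_join_demo` is a hypothetical name.

```python
import sqlite3


def star_join_demo():
    """Join one fact table to two dimension tables and aggregate (toy data)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Two dimension tables and one fact table, each with a subset of
        # TPC-DS columns. The inserted rows are made up for illustration.
        conn.executescript("""
            CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_year INTEGER);
            CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_category TEXT);
            CREATE TABLE store_sales (
                ss_sold_date_sk INTEGER REFERENCES date_dim(d_date_sk),
                ss_item_sk      INTEGER REFERENCES item(i_item_sk),
                ss_net_paid     REAL
            );
            INSERT INTO date_dim VALUES (1, 2001), (2, 2002);
            INSERT INTO item VALUES (10, 'Books'), (20, 'Music');
            INSERT INTO store_sales VALUES
                (1, 10, 12.5), (1, 10, 7.5), (1, 20, 3.0), (2, 20, 4.0);
        """)
        # Star join: the fact table's surrogate keys resolve to dimension
        # attributes, which are then used for grouping.
        return conn.execute("""
            SELECT d.d_year, i.i_category, SUM(ss.ss_net_paid) AS total_paid
            FROM store_sales AS ss
            JOIN date_dim AS d ON ss.ss_sold_date_sk = d.d_date_sk
            JOIN item     AS i ON ss.ss_item_sk      = i.i_item_sk
            GROUP BY d.d_year, i.i_category
            ORDER BY d.d_year, i.i_category
        """).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in star_join_demo():
        print(row)
```

The same shape repeats across the other fact tables (for example, WEB_SALES joined to DATE_DIM and ITEM), which is why the dimension tables are shared across sales channels.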
Queries and tasks: TPC-DS includes 99 queries and a set of data maintenance tasks. The queries cover a variety of application scenarios, such as reporting, online analytical processing, and data mining.
Data scale: Like TPC-H, TPC-DS supports datasets of various sizes. OceanBase Cloud provides TPC-DS datasets of 25 MB, 128 MB, and 1 GB to suit different testing requirements. You can use these datasets to analyze and evaluate the performance of OceanBase Cloud in a range of data warehousing and decision support scenarios.
Together, TPC-H and TPC-DS help you assess how well OceanBase Cloud handles analytical workloads. By running these benchmarks, you can evaluate the scalability, speed, and efficiency of OceanBase Cloud and make data-driven decisions about your database management and analytics workflows.
Flight delays and cancellations dataset
This dataset contains flight delay and cancellation information from 2019 to 2023, including airline, origin, destination, delay times, cancellation reasons, and more. It helps researchers and airlines analyze the causes and trends of flight delays and cancellations to improve on-time performance and passenger satisfaction.
Employee dataset
The Employee dataset is a simulated employee information database used to demonstrate basic database operations. It contains 6 tables covering employee information, department details, salary data, and more.
Open University Learning Analytics dataset
The Open University Learning Analytics dataset contains information on student engagement in online courses, including the time students spend in courses, their academic performance, and their interactions within the courses. You can use this dataset to study online learning behaviors and trends, and to help educational institutions optimize course design and teaching methods.