This topic describes datasets provided to help you evaluate the performance of OceanBase Cloud.
TPC-H
TPC-H is a benchmark for testing online analytical processing (OLAP) database systems. It is used to evaluate the performance of a system in handling complex queries and analyzing data. TPC-H simulates a typical data warehouse application by using a set of predefined queries and concurrent data modification operations.
Data model: TPC-H defines a standardized data model of 8 tables, accessed by 22 analytical queries, that simulates transactions between suppliers and buyers.
| Table | Description |
|---|---|
| REGION | Contains the region information. Each region has a unique code and name. |
| NATION | Contains the country information. Each country has a unique code and name, as well as the ID of the region where the country is located. |
| PART | Contains the part information, such as the ID, name, description, size, and type of the part. |
| SUPPLIER | Contains the supplier information, such as the ID, name, address, and country code of the supplier. |
| PARTSUPP | Contains the part-supplier relationship information, that is, which supplier provides which part, including the supplier ID, part ID, quantity, and price. |
| CUSTOMER | Contains the customer information, such as the ID, name, address, phone number, and market segment of the customer. |
| ORDERS | Contains the order information, such as the order ID, order date, shipping details, order status, and customer ID. |
| LINEITEM | Contains the line items of orders, such as the quantity, price, and shipping date of each item. |
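The relationships among the tables above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database, not OceanBase Cloud, and it keeps only a few columns of REGION, NATION, and SUPPLIER. The column names follow the TPC-H naming style (`r_regionkey`, `n_regionkey`, `s_nationkey`), the sample rows are hand-made rather than generated benchmark data, and the helper names `build_demo_db` and `suppliers_per_region` are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory subset of the TPC-H schema with hand-made rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE region (r_regionkey INTEGER PRIMARY KEY, r_name TEXT);
        CREATE TABLE nation (
            n_nationkey INTEGER PRIMARY KEY,
            n_name TEXT,
            n_regionkey INTEGER REFERENCES region(r_regionkey)
        );
        CREATE TABLE supplier (
            s_suppkey INTEGER PRIMARY KEY,
            s_name TEXT,
            s_nationkey INTEGER REFERENCES nation(n_nationkey)
        );
        INSERT INTO region VALUES (0, 'AFRICA'), (1, 'AMERICA');
        INSERT INTO nation VALUES
            (0, 'ALGERIA', 0), (2, 'BRAZIL', 1), (3, 'CANADA', 1);
        INSERT INTO supplier VALUES
            (1, 'Supplier#1', 2), (2, 'Supplier#2', 3), (3, 'Supplier#3', 0);
        """
    )
    return conn


def suppliers_per_region(conn):
    """Follow SUPPLIER -> NATION -> REGION and count suppliers in each region."""
    return conn.execute(
        """
        SELECT r.r_name, COUNT(*) AS supplier_count
        FROM supplier AS s
        JOIN nation AS n ON s.s_nationkey = n.n_nationkey
        JOIN region AS r ON n.n_regionkey = r.r_regionkey
        GROUP BY r.r_name
        ORDER BY supplier_count DESC, r.r_name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, count in suppliers_per_region(build_demo_db()):
        print(name, count)
```

The query combines joining, grouping, sorting, and aggregation on a deliberately small scale. The actual benchmark queries are considerably more complex.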
Queries: TPC-H includes 22 complex analytical SQL queries that involve a wide range of database operations, such as joining, grouping, sorting, and aggregation.
Data size: OceanBase Cloud allows you to use 25 MB, 128 MB, or 1 GB of data volume for different test needs.
The TPC-H model is a typical snowflake schema, which comprises eight tables. The NATION and REGION tables have fixed data volumes. The data volumes of the other six tables are correlated with the specified scale factor (SF).
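As a rough illustration of how table sizes follow the scale factor, the sketch below estimates row counts per table. The base cardinalities are the ones defined by the TPC-H specification for SF = 1 (about 1 GB of raw data). How the 25 MB and 128 MB options of OceanBase Cloud map to a scale factor is not stated here, so any fractional SF passed to the function is an assumption, and the helper name `estimated_rows` is hypothetical.

```python
# Row counts at SF = 1 according to the TPC-H specification.
BASE_ROWS = {
    "SUPPLIER": 10_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "CUSTOMER": 150_000,
    "ORDERS": 1_500_000,
    "LINEITEM": 6_000_000,  # approximate: the exact count varies slightly
}

# NATION and REGION do not grow with the scale factor.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def estimated_rows(scale_factor: float) -> dict[str, int]:
    """Estimate the number of rows in each TPC-H table for a given SF."""
    rows = {table: round(base * scale_factor) for table, base in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


if __name__ == "__main__":
    for table, count in estimated_rows(1).items():
        print(f"{table:<10}{count:>12,}")
```

These figures are estimates for sizing purposes only. The row counts actually loaded by a data generator can differ, especially for LINEITEM.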
TPC-DS
TPC-DS simulates common decision-making scenarios in decision support systems (DSSs). Compared to TPC-H, TPC-DS is more comprehensive and complex. It evaluates the performance of both queries and data maintenance operations such as data loading and deletion. TPC-DS adopts multi-dimensional schemas such as the star or snowflake schema. It contains 7 fact tables and 17 dimension tables. TPC-DS closely resembles real-world scenarios, which also makes the test more demanding to run.
Data model: TPC-DS defines a more complex data model that contains 24 tables covering many business areas, such as sales, inventory, and customer relationships.
| Table | Description |
|---|---|
| CALL_CENTER | Contains the call center information, such as the ID, name, and category of the call center. |
| CATALOG_PAGE | Contains catalog page details, such as page ID, type, and department. |
| CUSTOMER | Contains the customer information, such as the ID, name, and address of the customer. |
| CUSTOMER_ADDRESS | Contains customer address details, such as address ID, city, and postal code. |
| CUSTOMER_DEMOGRAPHICS | Contains customer demographics, such as education level and income level. |
| DATE_DIM | Contains date dimension information, such as date, holiday, and weekend. |
| HOUSEHOLD_DEMOGRAPHICS | Records household demographics, such as the number of family members and household income. |
| INCOME_BAND | Defines income ranges. |
| INVENTORY | Records the inventory of commodities. |
| ITEM | Contains the commodity information, such as the ID, name, and category of the commodity. |
| PROMOTION | Contains the promotion information, such as the ID, type, start date, and end date of the promotion event. |
| REASON | Records the reasons why customers returned commodities or requested a refund. |
| SHIP_MODE | Records shipping methods, such as express delivery and standard mail. |
| STORE | Contains the physical store information, such as the ID, name, and address of the store. |
| STORE_SALES | Contains sales records of physical stores, such as the transaction time and sales amount. |
| TIME_DIM | Contains the time dimension information, such as hour, minute, and second. |
| WAREHOUSE | Contains the warehouse information, such as the ID, name, and address of the warehouse. |
| WEB_PAGE | Contains the web page information, such as the ID, URL, and creation time of the web page. |
| WEB_SALES | Contains online sales records. |
| WEB_SITE | Contains the website information, such as the ID, name, and URL of the website. |
Queries and tasks: TPC-DS includes 99 queries and several data maintenance tasks. These queries cover diverse scenarios from reporting to online analytical processing and data mining.
Data size: OceanBase Cloud allows you to use 25 MB, 128 MB, or 1 GB of data volume for different test needs.
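To make the star schema mentioned above more concrete, the following sketch joins one fact table (STORE_SALES) with two dimension tables (DATE_DIM and ITEM) and aggregates sales by year and category. It runs against an in-memory `sqlite3` database rather than OceanBase Cloud. Only a handful of columns are kept, named in the TPC-DS style (`ss_sold_date_sk`, `d_year`, `i_category`), the rows are hand-made, and the helper names are hypothetical. It is not one of the 99 benchmark queries.

```python
import sqlite3


def build_star_demo():
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_year INTEGER);
        CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_category TEXT);
        CREATE TABLE store_sales (
            ss_sold_date_sk INTEGER REFERENCES date_dim(d_date_sk),
            ss_item_sk INTEGER REFERENCES item(i_item_sk),
            ss_ext_sales_price REAL
        );
        INSERT INTO date_dim VALUES (1, 2001), (2, 2002);
        INSERT INTO item VALUES (10, 'Books'), (11, 'Music');
        INSERT INTO store_sales VALUES
            (1, 10, 20.0), (1, 11, 5.0), (2, 10, 7.5), (1, 10, 10.0);
        """
    )
    return conn


def sales_by_year_and_category(conn):
    """Join the fact table to its dimensions and total sales per group."""
    return conn.execute(
        """
        SELECT d.d_year, i.i_category, SUM(ss.ss_ext_sales_price) AS total_sales
        FROM store_sales AS ss
        JOIN date_dim AS d ON ss.ss_sold_date_sk = d.d_date_sk
        JOIN item AS i ON ss.ss_item_sk = i.i_item_sk
        GROUP BY d.d_year, i.i_category
        ORDER BY d.d_year, total_sales DESC
        """
    ).fetchall()


if __name__ == "__main__":
    for row in sales_by_year_and_category(build_star_demo()):
        print(row)
```

The fact table holds only surrogate keys and measures, while descriptive attributes such as the year and the category live in the dimension tables. This is why most reporting queries start from a fact table and join outward.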
Flight delay and cancellation dataset (from 2019 to 2023)
This dataset contains information about flight delays and cancellations from 2019 to 2023, including airlines, departure locations, arrival locations, delay durations, and cancellation reasons. It helps researchers and airlines analyze the causes and trends of flight delays and cancellations and then propose solutions to improve airline on-time performance and passenger satisfaction.
Employee dataset
This dataset simulates a database of employee information that can be used to demonstrate basic database operations. It contains six tables, including information about employees, departments, and salaries.
Open University learning analytics dataset
This dataset records the engagement of students in online courses, including time spent on courses, academic performance, and interactions during courses. It helps educational institutions analyze online learning behaviors and trends and optimize course designs and teaching methods.