In database system design and optimization, data table design is a crucial task. A well-designed table structure not only ensures data integrity and consistency but also significantly improves query performance, storage efficiency, and system scalability. As data volumes grow and query patterns become more complex, how to scientifically design tables, choose appropriate storage structures, and implement effective indexing strategies has become the core of database optimization.
This topic introduces best practices for data table design, covering aspects such as table storage format selection, columnar storage for tables without a primary key, partitioning design, primary key design, and index usage in HTAP scenarios.
Choose the storage format of a table
OceanBase Database supports columnar and hybrid storage formats for tables. You can choose an appropriate storage format based on your business scenarios.
- Columnar storage: This storage format is suitable for OLAP scenarios. If your business is purely OLAP, you can choose columnar storage.
- Hybrid storage: This storage format is suitable for HTAP scenarios. If your business is HTAP, which contains both OLAP and OLTP scenarios, you can choose hybrid storage to ensure the performance of point queries in addition to the performance of analysis queries.
You can set the default storage format of tables in a tenant by using the default_table_store_format parameter.
-- Set the default storage format of tables in the tenant to columnar storage.
ALTER SYSTEM SET default_table_store_format = "column";
-- Set the default storage format of tables in the tenant to hybrid storage.
ALTER SYSTEM SET default_table_store_format = "compound";
Design a primary key
Designing a primary key is a core aspect of table design in relational databases. In OceanBase Database, a primary key not only ensures data uniqueness but also optimizes performance for range queries by sorting data based on the primary key. Different primary key designs are suitable for various business scenarios.
Scenarios without a primary key
Use a columnar table without a primary key
In data warehouse or online analytical processing (OLAP) scenarios, business data tables often contain a massive amount of data, and most queries only involve specific columns. If there are no special requirements for data uniqueness, avoiding duplicate writes, or transactional updates and inserts, creating a columnar table without a primary key is more efficient:
- Data import: In AP scenarios, data does not need to be sorted by the primary key. This results in faster data import performance compared to tables with a primary key.
- Reduced management costs: Managing primary keys can be resource-intensive, so eliminating them can reduce overhead.
show parameters like "%store_format%";
CREATE TABLE customer (
user_id bigint NOT NULL,
login_time timestamp NOT NULL,
customer_name varchar(100) NOT NULL, -- Assuming the customer name is up to 100 characters
phone_num bigint NOT NULL, -- Storing the phone number as a BIGINT
city_name varchar(50) NOT NULL, -- Assuming the city name is up to 50 characters
sex int NOT NULL, -- Storing gender as an INT (e.g., 0 for female, 1 for male)
id_number varchar(18) NOT NULL, -- Assuming the ID number is up to 18 characters
home_address varchar(255) NOT NULL, -- Assuming the home address is up to 255 characters
office_address varchar(255) NOT NULL, -- Assuming the office address is up to 255 characters
age int NOT NULL -- Storing age as an INT
);
Scenarios with a primary key
In AP scenarios, the default table organization is heap tables, characterized by:
Data storage: Data is stored in the order of insertion without physical sorting.
Primary key role:
- Ensures data uniqueness by including fields that uniquely identify records (e.g., user_id).
- Supports partition key embedding, where the primary key must include all partition keys (e.g., user_id and age) to ensure even data distribution.
- Requires indexing or partitioning strategies for query acceleration (e.g.,
ORDER BY).
Example: Designing a heap-organized table with a primary key
CREATE TABLE customer (
user_id BIGINT NOT NULL,
login_time TIMESTAMP NOT NULL,
customer_name VARCHAR(100) NOT NULL,
phone_num BIGINT NOT NULL,
city_name VARCHAR(50) NOT NULL,
sex INT NOT NULL,
id_number VARCHAR(18) NOT NULL,
home_address VARCHAR(255) NOT NULL,
office_address VARCHAR(255) NOT NULL,
age INT NOT NULL,
-- The primary key includes all partition keys (user_id and age)
PRIMARY KEY (user_id, age, login_time)
)
-- Main partitioning: Hash distributed by user_id
PARTITION BY HASH(user_id)
PARTITIONS 128
SUBPARTITION BY RANGE(age)
SUBPARTITION TEMPLATE (
-- Example partitions: Divided by age groups
SUBPARTITION p_youth VALUES LESS THAN (25), -- Age <25
SUBPARTITION p_adult VALUES LESS THAN (40), -- 25 ≤ age <40
SUBPARTITION p_middle_aged VALUES LESS THAN (60),-- 40 ≤ age <60
SUBPARTITION p_senior VALUES LESS THAN (MAXVALUE) -- Age 60 and above
);
Scenarios with a primary key
- Ensure data uniqueness: When data uniqueness must be guaranteed for each record in the table, a primary key must be designed.
- Improve query efficiency: A primary key helps the optimizer generate more efficient query plans, especially when queries rely on the primary key. It is recommended to include fields that are always used in queries as primary key fields to leverage the primary key for performance improvements.
CREATE TABLE customer (
user_id BIGINT NOT NULL,
login_time TIMESTAMP NOT NULL,
customer_name VARCHAR(100) NOT NULL,
phone_num BIGINT NOT NULL,
city_name VARCHAR(50) NOT NULL,
sex INT NOT NULL,
id_number VARCHAR(18) NOT NULL,
home_address VARCHAR(255) NOT NULL,
office_address VARCHAR(255) NOT NULL,
age INT NOT NULL,
-- The primary key includes all partition keys (user_id and age)
PRIMARY KEY (user_id, age, login_time)
)
-- Main partitioning: Hash distributed by user_id
PARTITION BY HASH(user_id)
PARTITIONS 128
SUBPARTITION BY RANGE(age)
SUBPARTITION TEMPLATE (
-- Example partitions: Divided by age groups
SUBPARTITION p_youth VALUES LESS THAN (25), -- Age <25
SUBPARTITION p_adult VALUES LESS THAN (40), -- 25 ≤ age <40
SUBPARTITION p_middle_aged VALUES LESS THAN (60),-- 40 ≤ age <60
SUBPARTITION p_senior VALUES LESS THAN (MAXVALUE) -- Age 60 and above
);
Optimize queries in HTAP scenarios
OceanBase Database supports HTAP (hybrid transactional and analytical processing) scenarios and can meet the requirements of both transactional (TP) and analytical (AP) applications. OceanBase Database supports rowstore, columnstore, and hybrid rowstore-columnstore storage formats for data tables. It also supports specifying the storage format for index tables. You can combine the advantages of rowstore and columnstore to process corresponding scenarios.
For more information about how to create a columnstore index, see Create a columnstore index in MySQL mode and Create a columnstore index in Oracle mode.
Rowstore base table + columnstore index
Scenarios: The table is a large wide table, and the business is primarily transactional with some analytical queries. Optimization strategy: Create a columnstore index to improve the performance of analytical queries.
- Advantages: Only part of the data is redundantly stored in columnstore format.
- Disadvantages: You need to identify suitable fields based on the query statements to create a columnstore index. Also, the index table data must be maintained during data writing, which may affect the write performance.
Columnstore base table + rowstore index
Scenarios: The business is primarily analytical, but it requires efficient simple queries (such as point queries). Optimization strategy: Create a covering index to avoid table scans.
- Advantages: Only part of the data is redundantly stored. Currently, you cannot convert a columnstore table to a rowstore table or a hybrid rowstore-columnstore table online. You can add a rowstore index to optimize queries.
- Disadvantages: You need to identify suitable fields based on the query statements to create a rowstore index. Also, the index table data must be maintained during data writing, which may affect the write performance.
Rowstore-columnstore redundancy: rowstore base table + columnstore base table
Scenarios: Both transactional and analytical queries are required, and the query methods are diverse. Optimization strategy: Redundantly store two base tables, and let the system generate an appropriate execution plan based on the query characteristics. By default, range scans are performed in columnstore mode, and point queries are performed in rowstore mode.
- Advantages: You do not need to analyze the query characteristics of analytical queries. The results of analytical queries are strongly consistent.
- Disadvantages: More disk space is occupied, and the execution plans may be inaccurate.
Columnstore replica: 2F1A1C
Scenarios: The business is typical HTAP, and you do not want analytical queries to affect transactional queries. You can accept weakly consistent reads for analytical queries. Optimization strategy: You can use the isolation feature of columnstore replicas to execute queries for AP and TP applications without affecting each other.
- Advantages:
- You do not need to store redundant data in multiple replicas, which saves disk space.
- AP and TP applications are isolated.
- The data write performance is not affected.
- Disadvantages:
- Columnstore replicas support weakly consistent reads.
- You need to provide an independent access entry.
