In OceanBase Database, tables are the most fundamental data storage units. Tables contain all the data accessible to users, with each table consisting of multiple rows, and each row containing multiple columns. The design and use of each table should be reasonably planned according to business requirements to ensure system efficiency and scalability.
OceanBase Database supports several types of tables, including partitioned tables, replicated tables, primary key tables and non-primary key tables, and external tables.
- Partitioned tables: OceanBase Database can divide the data of a regular table into different blocks according to certain rules, storing data from the same block together physically. Such tables are called partitioned tables. The basic partitioning strategies in OceanBase Database include range partitioning, list partitioning, and hash partitioning.
- Replicated tables: Replicated tables are a special type of table in OceanBase Database. These tables allow the latest modifications to be read from any "healthy" replica.
- Primary key tables and non-primary key tables: A primary key table is a table that contains a primary key; a table without a specified primary key is called a non-primary key table.
- External tables: In the database, table data is stored in the database’s storage space, while the data for external tables is stored in external storage services.
Table types
In AP scenarios, OceanBase supports a variety of table types. In addition to the commonly used replicated tables, partitioned tables, primary key tables, and non-primary key tables in TP scenarios, new types of tables are introduced based on data storage methods—whether stored by row or by column: columnstore tables and hybrid row-column tables.
Columnstore tables: Columnstore tables store data by columns instead of by rows, which can significantly improve the performance of analytical queries. They are especially suitable for scenarios with large volumes of data and frequent aggregation analysis. For more details, see: Columnstore table architecture.
Hybrid row-column tables: Both row-stored and column-stored data are maintained, and the system automatically determines whether row storage or column storage will deliver better query performance based on the query statement. This is suitable for scenarios that need to handle both transactional and analytical workloads.
Table update modes
OceanBase Database allows you to specify the data write and query mode when creating a table. You can use the merge_engine parameter in the CREATE TABLE statement to set either the delete_insert update mode or the partial_update update mode, which are data update strategies designed for different business scenarios.
delete_insert (full-column update mode)
Prioritizes query performance by converting
UPDATEoperations into full-columnDELETEandINSERTrecords using a "merge-on-write" mechanism, ensuring each row contains complete column values. This mode significantly improves the efficiency of complex queries and batch processing (such as analytical tasks), but requires additional storage space for incremental data. It is ideal for scenarios with frequent incremental data and a need for rapid analysis.partial_update (partial update mode)
Records only the values of modified columns, avoiding redundant storage. Queries require merging multiple data sets to obtain the latest values, resulting in lower performance, but making it suitable for scenarios with high-frequency updates and low query requirements (such as OLTP workloads) or environments where storage costs are a concern.
| Feature category | delete_insert update mode | partial_update update mode |
|---|---|---|
| Storage method | Each update writes two rows (DELETE and INSERT) to the SSTable, containing full-column data. |
Each update records only the values of modified columns, saving storage space. |
| Query efficiency |
|
Queries require merging records from multiple memtables/SSTables to obtain the latest value for the primary key, which may affect performance. Suitable for scenarios with frequent updates and sensitivity to storage costs. |
| Applicable scenarios | Scenarios with a high proportion of incremental data and frequent complex queries or batch analysis. | Scenarios with high-frequency updates and low query requirements. |
For more information, see Create tables in MySQL-compatible mode and Create tables in Oracle-compatible mode.
Data distribution
OceanBase Database distributes data across different partitions by creating partitioned tables. Data in different partitions can be distributed across different machines. When querying, partition pruning can reduce the amount of data scanned and leverage resources from multiple machines to improve query performance. By default, data from different tables is distributed randomly and has no direct relationship. Through load balancing, the data of a table can be evenly distributed across the entire cluster.
In distributed AP systems, tables usually contain large amounts of data. When data from different tables is randomly distributed, the overhead of data transfer during table joins can be significant. By using table groups, partitioned tables with the same partitioning method can have their data aligned according to specific rules, allowing related data to be gathered on the same machine. This enables partition wise join execution for these tables, effectively reducing the overhead of data transfer during joins and improving performance.
OceanBase supports both partitioned and subpartitioned tables, and supports three types of partitions: RANGE, LIST, and HASH.
OceanBase provides three types of table group attributes:
NONE: All partitions of all tables in the table group are gathered on the same machine.
PARTITION: The data of each table in the table group is distributed by partition. For subpartitioned tables, all subpartitions under the same partition are gathered together.
ADAPTIVE: The data of each table in the table group is distributed adaptively. If the tables in the table group are partitioned tables, they are distributed by partition; if the tables are subpartitioned tables, the subpartitions under each partition are distributed.
For more information, see Data distribution, Table groups in MySQL-compatible mode, and Table groups in Oracle-compatible mode.
Index types
Indexes are key components for improving query performance. In AP scenarios, OceanBase provides users with flexible and efficient data retrieval by supporting various index types. Here are some common index types and their application scenarios:
- Local index: A local index is built on a single partition and is suitable for scenarios requiring localized data queries. When data is strictly divided into multiple partitions, local indexes can significantly reduce the data scan range during queries, thereby improving query efficiency.
- Unique index: A unique index ensures that each value in the indexed column is unique. By creating unique indexes on certain columns, the database can effectively prevent duplicate data and accelerate the query process.
- Non-unique index: Unlike unique indexes, non-unique indexes do not require all values in the indexed column to be unique. They are mainly used to speed up data retrieval, especially when searching for multiple records that meet specific conditions, significantly improving search efficiency.
- Full-text index: OceanBase supports MySQL-compatible full-text search capabilities. By pre-processing text content and creating keyword indexes, full-text search efficiency can be greatly improved. For more details on full-text indexes, see Full-text indexes.
- JSON multi-valued index: In MySQL-compatible mode, OceanBase supports multi-valued indexes, which are especially useful for JSON documents and other collection data types. This feature allows you to create indexes on arrays or collections, improving the efficiency of queries based on JSON array elements. For more details on multi-value indexes, see Multi-valued indexes.
- Spatial index: Spatial indexes are used to optimize the query efficiency of geospatial data and are widely used in Geographic Information Systems (GIS) and location-based services. In these applications, spatial indexes enable fast retrieval of data within geographic coordinate ranges, providing strong support for location services.
- Columnstore index: In hybrid transactional and analytical processing (HTAP) scenarios, starting from V4.3.0, OceanBase supports specifying the storage format of a table as columnstore at creation. Since indexes and data tables are both tables in OceanBase, index tables can also be stored in columnstore format. Columnstore indexes store data by column rather than by row, which can significantly improve analytical query performance. Especially when processing large-scale data, they can greatly shorten data retrieval and analysis time, enabling real-time analytics. For more details, see Columnstore.
For more information about indexes, see Introduction to indexes in MySQL-compatible mode and Introduction to indexes in Oracle-compatible mode.
Data types
Before creating and using tables, database administrators need to plan table structures and data types according to business requirements. To ensure efficient data storage and optimized queries, administrators should follow these principles:
- Normalize table structure: Design table structures reasonably to minimize data redundancy and improve query efficiency.
- Choose appropriate SQL data types: Select the most suitable SQL data type for each column to reduce storage space and increase query speed.
Common SQL data types include:
- Basic data types: Such as
INT,VARCHAR,DATE, etc. - Complex data types: Such as
JSON,ARRAY,BITMAP, etc., which are suitable for storing more complex data structures.
For detailed descriptions of SQL data types, see:
Views
OceanBase database supports both standard views and materialized views.
- Standard views: Standard views, also known as non-materialized views, are the most common type of view. They only store the SQL query that defines the view, but do not store the query results.
- Materialized views: Materialized views are different from standard views in that they physically store the results of a query. OceanBase database supports asynchronous materialized views, meaning that when the underlying table data changes, the materialized view is not updated immediately, which ensures the performance of DML operations on the base table. See Materialized views in MySQL-compatible mode and Materialized views in Oracle-compatible mode.
Example of creating a data table
Creating a table with partitioning, columnstore, and rowstore indexes.
CREATE TABLE salesdata (
sale_id INT,
product_id INT NOT NULL,
saledate DATE NOT NULL,
saledate_int INT, -- defined as a regular column
quantity INT,
price DECIMAL(10, 2),
customer_id INT,
PRIMARY KEY (sale_id, saledate_int) -- the primary key can include regular columns
)
PARTITION BY RANGE COLUMNS (saledate_int) (
PARTITION p2023_q1 VALUES LESS THAN (202304),
PARTITION p2023_q2 VALUES LESS THAN (202307),
PARTITION p2023_q3 VALUES LESS THAN (202310),
PARTITION p2023_q4 VALUES LESS THAN (202401)
)
WITH COLUMN GROUP(each column);
CREATE INDEX idx_product_id ON salesdata(product_id);
CREATE INDEX idx_customer_id ON salesdata(customer_id);
Partitioning: The
salesdatatable uses RANGE partitioning.PARTITION BY RANGE COLUMNS (saledate_int): The table is partitioned based on thesaledate_intcolumn.Four partitions are defined:
- The
p2023_q1partition contains all data before202304(that is, the first quarter of 2023). - Partitions
p2023_q2,p2023_q3, andp2023_q4cover the data for the second, third, and fourth quarters of 2023, respectively.
- The
Columnstore: The clause
WITH COLUMN GROUP(each column);specifies columnar storage. Data for these columns will be stored in a columnar format, which is suitable for large-scale data analytics scenarios.Rowstore indexes: Two indexes,
idx_product_idandidx_customer_id, are created on the columnsproduct_idandcustomer_id, respectively. Rowstore indexes can accelerate queries based on specific columns, especially for high-frequency, small-scale queries.
When creating and using columnstore tables, if you import a large amount of data, you need to pay attention to performing a major compaction operation to improve read performance, as well as statistics collection to optimize execution strategies.
Major compaction operation: After bulk data import, it is recommended to perform a major compaction operation. This helps improve read performance because major compaction organizes fragmented data, making it more contiguous in physical storage and thus reducing disk I/O during reads. After importing data, trigger a major compaction operation within the tenant to ensure that all data is merged into the baseline layer. For details on how to do this, refer to MAJOR and MINOR (MySQL-compatible mode) and MAJOR and MINOR (Oracle-compatible mode).
Statistics collection: After the major compaction operation is complete, it is recommended to collect statistics. This is very important for the optimizer to generate effective query plans and execution strategies. Use GATHER_SCHEMA_STATS (MySQL-compatible mode) or GATHER_SCHEMA_STATS (Oracle-compatible mode) to collect statistics for all tables, and you can monitor the collection progress through the views GV$OB_OPT_STAT_GATHER_MONITOR (MySQL-compatible mode) and GV$OB_OPT_STAT_GATHER_MONITOR (Oracle-compatible mode).
Please note that as the amount of data in the column store table increases, the speed of the major compaction operation may decrease.