Columnstore replica (C replica) is a read-only replica type in OceanBase Database, primarily used to handle analytical (AP) read-only queries. Data on a C replica is stored in columnar format and is used in conjunction with full-featured replicas (F replicas) and read-only replicas (R replicas). Transactional writes and regular strong reads still primarily occur on F or R replicas; AP scans can benefit from the more suitable storage format on C replicas. On a C replica, all user tables (including replicated tables, but excluding index tables, internal tables, and system tables) are stored in columnar format. OLAP workloads access C replicas through a dedicated proxy endpoint and perform decision analysis tasks using weak-consistency reads.
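As a minimal sketch of the weak-consistency read mentioned above, an analytical query can request weak consistency with the `READ_CONSISTENCY(WEAK)` hint. The table and column names (`orders`, `region`, `amount`) are hypothetical and are used only for illustration. The hint controls only the consistency level; whether the request actually lands on a C replica depends on the routing of the dedicated proxy endpoint, which is not shown here.

```sql
-- Hypothetical table and columns; replace them with your own analytical query.
-- The hint asks for a weak-consistency read, which read-only replicas can serve.
SELECT /*+ READ_CONSISTENCY(WEAK) */
       region,
       SUM(amount) AS total_amount
FROM orders
GROUP BY region;
```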
Columnstore replicas have the following main characteristics:
- Includes complete logs, MemTables, and SSTables.
- Major SSTable: User tables on the same log stream are stored in columnar format at the major SSTable level on a C replica, which facilitates large-scale scans and AP computation.
- Incremental data: Incremental data, such as MemTables and the SSTables produced by minor compactions, is still organized in row format. The C replica asynchronously catches up with the source replica by pulling clogs and replaying them locally.
- Cannot form a Paxos member group. A C replica does not vote on logs as a Paxos member. Instead, it acts as an observer that continuously catches up with the logs of the Paxos members and replays them locally, so adding a C replica does not increase transaction commit latency the way an additional voting member would.
- Can provide read-only services when the business does not require high consistency for data reads.
- Cannot be converted to other replica types.
More features and their descriptions are shown in the following table.
| Feature | Description |
|---|---|
| Replica Name and Abbreviation | COLUMNSTORE(C) |
| Whether there is LOG | Yes. Logs are synchronized asynchronously. The replica does not belong to the Paxos group and acts only as a listener (ASYNC_CLOG). |
| Whether there is a MemTable | Yes (WITH_MEMSTORE) |
| Whether there is an SSTable | Yes (WITH_SSSTORE) |
| Data Security | Medium |
| Time to Restore as Leader | Not supported |
| Resource Cost | High |
| Service | Weak-consistency (non-consistent) reads allowed |
| Limitations on replica type conversion | Cannot be converted to other replica types |
Scenarios where the storage format on a C replica is temporarily row-based
In OceanBase Database, if a table is created as a row-based table, the system creates a corresponding pure columnstore table on the C replica. If a table is created as a columnstore table, its storage format on the C replica is the same as on the F replica. In other words, a C replica converts only the user partitions of row-based user tables on the F replica into columnar storage. Note that columnar format is only the "final" state: partitions on a C replica are not always in columnar format. In the following scenarios, user table partitions on the C replica are temporarily row-based, and the system automatically schedules row-to-column conversion tasks to convert the latest baseline data into columnar format.
| Scenario | Description |
|---|---|
| Add Replicas | Modify the locality to add a C replica, for example, change it from `F@z1, F@z2` to `F@z1, F@z2, C@z3`. |
| Log Stream Rebuild | When a log stream rebuild is triggered on a C replica, the system pulls the corresponding baseline from the source replica to the C replica. If the baseline data is row-based, it remains row-based temporarily until a background row-to-column conversion task is scheduled and completed. |
| Concurrent execution of replica replenishment and offline DDL operations | If an offline DDL operation is performed while the C replica is in the log stream member list, the system builds a columnar baseline directly on the C replica. However, when replica replenishment and an offline DDL operation run concurrently, the C replica is not visible to the log stream leader of the DDL task. In this case, the system first builds a row-based baseline on the C replica and converts it to a columnar baseline after a row-to-column conversion task is scheduled in the background. |
| Full Direct Load | Currently, full direct load can only import data into a C replica in row format first. A background job then converts the data from row format to columnar format. |
| Table-level restore | Table-level restore is not currently supported for columnstore tables, so a table restored on a C replica is first restored in row-based format. After a row-to-column conversion task is scheduled in the background, the table is converted to columnar format. |
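The "Add Replicas" scenario can be sketched as the following statement. This is an illustration under stated assumptions, not a verified procedure: the tenant name `t1` is hypothetical, the zones `z1`, `z2`, and `z3` are taken from the locality strings in the table above, and the statement assumes that the tenant's resource pool already covers zone `z3`.

```sql
-- Run in the sys tenant.
-- "t1" is a hypothetical tenant name; z3 is the zone that will host the new C replica.
-- Assumption: the resource pool of t1 already includes zone z3.
ALTER TENANT t1 LOCALITY = 'F@z1, F@z2, C@z3';
```

After the locality change takes effect, the new C replica initially holds row-based baseline data until the background row-to-column conversion completes.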
In the scenarios above, while the row-to-column conversion is in progress, the optimizer still generates a columnar query plan, but the query actually executes against the row-based baseline. You can check availability and the progress of the row-to-column conversion on a C replica through the views CDB_OB_CS_REPLICA_STATS (sys tenant) and DBA_OB_CS_REPLICA_STATS (user tenant). Run queries on the C replica after the row-to-column conversion task has completely finished.
In the sys tenant, the following example shows how to query the progress of converting tablets in the log streams of all tenants' C replicas:
obclient[oceanbase]> SELECT * FROM oceanbase.CDB_OB_CS_REPLICA_STATS;
The query result is as follows:
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
| TENANT_ID | SVR_IP | SVR_PORT | LS_ID | TOTAL_TABLET_CNT | AVAILABLE_TABLET_CNT | TOTAL_MACRO_BLOCK_CNT | AVAILABLE_MACRO_BLOCK_CNT | AVAILABLE |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
| 1004 | xx.xxx.xxx.212 | 63000 | 1001 | 1019 | 1019 | 10706 | 10706 | TRUE |
| 1006 | xx.xxx.xxx.212 | 63000 | 1001 | 133 | 133 | 875 | 875 | TRUE |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
2 rows in set
The first row of the query result shows that, for the tenant with ID 1004, the C replica is hosted on server xx.xxx.xxx.212, port 63000, and the ID of the C replica's log stream is 1001. The log stream has 1019 tablets (partitions) that need to be converted to columnar format, and all 1019 are available. The baseline has 10706 macroblocks in total, and all 10706 are available. You can roughly estimate the progress of the row-to-column conversion using AVAILABLE_TABLET_CNT / TOTAL_TABLET_CNT or AVAILABLE_MACRO_BLOCK_CNT / TOTAL_MACRO_BLOCK_CNT. A log stream is fully available only when all tablets in it are available.
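The two ratios can also be computed directly in the query. The following is a minimal sketch that uses only the columns shown in the result above. The column aliases are illustrative, and `NULLIF` is added as a precaution so that a log stream with a zero total yields NULL instead of a division by zero.

```sql
-- Run in the sys tenant.
-- Estimates row-to-column conversion progress per C replica log stream, as percentages.
SELECT TENANT_ID,
       SVR_IP,
       SVR_PORT,
       LS_ID,
       ROUND(AVAILABLE_TABLET_CNT / NULLIF(TOTAL_TABLET_CNT, 0) * 100, 2) AS TABLET_PROGRESS_PCT,
       ROUND(AVAILABLE_MACRO_BLOCK_CNT / NULLIF(TOTAL_MACRO_BLOCK_CNT, 0) * 100, 2) AS MACRO_BLOCK_PROGRESS_PCT,
       AVAILABLE
FROM oceanbase.CDB_OB_CS_REPLICA_STATS
ORDER BY TENANT_ID, LS_ID;
```

For the two sample rows above, the available and total counts are equal, so both percentages would be 100.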
