A columnstore replica (C replica) is a type of read-only replica in OceanBase Database that is primarily used to serve analytical (AP) read-only queries. Data in a C replica is stored in columnar storage mode, and C replicas are deployed alongside full-featured replicas (F replicas) and read-only replicas (R replicas). Transactional writes and regular strong-consistency reads are still served by F or R replicas, while AP scans benefit from the columnar storage format on C replicas. In a C replica, all user tables (including replicated tables, but excluding index tables, internal tables, and system tables) are stored in columnar format. OLAP workloads access C replicas through a dedicated proxy endpoint and run decision-analysis tasks using weak-consistency reads.
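The division of labor described above can be summarized as a routing rule. The sketch below is a hypothetical illustration only: the `Replica` enum and `candidate_replicas` function are not part of OceanBase or its proxy. It also assumes that reads which are neither strong reads nor weak AP scans stay on F or R replicas, which the text does not state explicitly.

```python
from enum import Enum


class Replica(Enum):
    """Replica types mentioned in this section."""
    F = "FULL"
    R = "READONLY"
    C = "COLUMNSTORE"


def candidate_replicas(is_write: bool, weak_read: bool, analytical: bool) -> frozenset:
    """Hypothetical rule: which replica types may serve a request.

    This only restates the description above; it is not an OceanBase API.
    """
    if is_write:
        # Transactional writes are handled by F replicas (R and C replicas are read-only).
        return frozenset({Replica.F})
    if analytical and weak_read:
        # AP scans that accept weak-consistency reads can use the columnar baseline on C replicas.
        return frozenset({Replica.C})
    # Assumption: every other read (including strong reads) stays on F or R replicas.
    return frozenset({Replica.F, Replica.R})


print(sorted(r.name for r in candidate_replicas(False, True, True)))  # ['C']
```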
A columnstore replica has the following characteristics:
- Contains full logs, a MemTable, and an SSTable.
- Baseline data (major SSTable): for user tables in the same log stream, baseline data on a C replica is organized in columnar storage mode, which suits large-scale scans and AP computation.
- Incremental data and data fusion: incremental data, such as the MemTable and the minor SSTables produced by minor compactions, is still organized in row-based storage mode. The C replica asynchronously synchronizes logs (clogs) from the source replica and replays them locally.
- Cannot join the Paxos group or vote on logs. Instead, it acts as a listener that synchronizes logs from the Paxos group members and replays them locally. Because it is not a voting member, it adds no latency to transaction commits.
- Provides read-only service when an application does not require strong-consistency reads.
- Stores user tables in the same log stream in columnar storage mode at the major SSTable layer.
- Cannot be converted into a replica of another type.
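Because a C replica is a listener rather than a voter, adding one does not change how many acknowledgements a commit waits for. The following minimal sketch computes the commit majority from a locality string in the simplified `TYPE@zone` form used later in this section (for example, `F@z1, F@z2, C@z3`). The helpers `parse_locality` and `commit_quorum` are hypothetical; real locality strings can carry more detail, and the sketch only considers the F, R, and C types discussed here, treating F replicas as the only voters.

```python
def parse_locality(locality: str) -> list:
    """Split a simplified locality string such as 'F@z1, F@z2, C@z3' into (type, zone) pairs.

    Hypothetical helper: real OceanBase locality strings can contain more detail.
    """
    pairs = []
    for item in locality.split(","):
        replica_type, zone = item.strip().split("@")
        pairs.append((replica_type, zone))
    return pairs


def commit_quorum(locality: str) -> tuple:
    """Return (number of voting members, acknowledgements needed for a majority).

    In this sketch only F replicas vote; R and C replicas are listeners that
    replay logs asynchronously and are therefore not counted.
    """
    voters = [zone for replica_type, zone in parse_locality(locality) if replica_type == "F"]
    return len(voters), len(voters) // 2 + 1


print(commit_quorum("F@z1, F@z2, F@z3"))        # (3, 2)
print(commit_quorum("F@z1, F@z2, F@z3, C@z4"))  # (3, 2): the C replica does not change the quorum
```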
The following table describes additional features of a C replica.
| Feature | Description |
|---|---|
| Replica name and abbreviation | COLUMNSTORE (C) |
| Whether logs are contained | Yes, asynchronous logs (ASYNC_CLOG). It acts only as a listener, not as a member of the Paxos group. |
| Whether a MemTable is contained | Yes (WITH_MEMSTORE) |
| Whether an SSTable is contained | Yes (WITH_SSSTORE) |
| Data security | Medium |
| Time to become the leader | Not supported |
| Resource cost | High |
| Service | Supports weak-consistency (non-consistent) reads. |
| Limitations on replica type conversion | Cannot be converted into a replica of another type |
Scenarios where a table is temporarily stored as rowstore on a C replica
In OceanBase Database, a table created as a rowstore table is stored as a pure columnstore table on the C replica, whereas a table created as a columnstore table keeps the same storage format on the C replica as on the F replica. In other words, a C replica converts only the user partitions of rowstore user tables to columnstore. However, this describes only the final state: partitions on a C replica are not always in columnstore format. In the following scenarios, user table partitions on a C replica are temporarily stored as rowstore, and the system automatically schedules background row-to-column conversion tasks to convert the latest baseline data to columnstore.
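The final-state rule above can be written as a small decision function. This is a sketch under stated assumptions: `TableKind`, `final_format_on_c_replica`, and the format labels `'rowstore'` and `'columnstore'` are hypothetical, and the function assumes that index, internal, and system tables, which are excluded from columnar storage on the C replica, simply keep their F-replica format. Temporary rowstore states during conversion are deliberately ignored.

```python
from enum import Enum


class TableKind(Enum):
    USER = "user"
    REPLICATED = "replicated"  # replicated tables count as user tables
    INDEX = "index"
    INTERNAL = "internal"
    SYSTEM = "system"


def final_format_on_c_replica(kind: TableKind, format_on_f: str) -> str:
    """Steady-state baseline format of a table on a C replica.

    format_on_f is a hypothetical label for the table's format on the F replica,
    such as 'rowstore' or 'columnstore'.
    """
    if kind not in (TableKind.USER, TableKind.REPLICATED):
        # Assumption: excluded tables are not converted and keep their F-replica format.
        return format_on_f
    if format_on_f == "rowstore":
        # A rowstore user table ends up as a pure columnstore table on the C replica.
        return "columnstore"
    # A columnstore user table keeps the same storage format as on the F replica.
    return format_on_f


print(final_format_on_c_replica(TableKind.USER, "rowstore"))   # columnstore
print(final_format_on_c_replica(TableKind.INDEX, "rowstore"))  # rowstore
```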
| Scenario | Description |
|---|---|
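| Replica replenishment (adding replicas) | Take changing the locality from F@z1, F@z2 to F@z1, F@z2, C@z3 as an example: the new C replica on z3 obtains its baseline data from an existing source replica. If that baseline is in rowstore format, it remains rowstore on the C replica until a background row-to-column conversion task is scheduled. |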
| Log stream rebuild | When rebuilding the log stream on a C replica, the system pulls the corresponding baseline from the source to the C replica. If the baseline data is in rowstore format, it remains rowstore temporarily until a background row-to-column conversion task is scheduled. |
| Concurrent replica replenishment and offline DDL | If a C replica is already in the log stream member list when an offline DDL statement is executed, the system builds a columnstore baseline directly on the C replica. However, if replica replenishment runs concurrently with offline DDL, the C replica is not visible to the log stream leader of the DDL task. In this case, the system first builds a rowstore baseline on the C replica and converts it to columnstore once a background row-to-column conversion task is scheduled. |
| Full direct load | Full direct load can currently import data to a C replica only as rowstore; the data is converted to columnstore once a background row-to-column conversion task is scheduled. |
| Table-level restore | Because columnstore tables do not currently support table-level restore, a table-level restore on a C replica first restores the data as rowstore, which is then converted to columnstore once a background row-to-column conversion task is scheduled. |
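The scenarios in the table can be encoded as a small model of whether a C replica partition starts out as rowstore. The scenario labels and every function here are hypothetical, and the replica replenishment branch assumes that, as in a log stream rebuild, the new C replica starts from the source replica's baseline. Locality strings use the simplified `TYPE@zone` form from the example, with at most one replica per zone.

```python
def parse_locality(locality: str) -> dict:
    """Map zone -> replica type for a simplified locality such as 'F@z1, F@z2, C@z3'.

    Hypothetical helper; assumes at most one replica per zone.
    """
    result = {}
    for item in locality.split(","):
        replica_type, zone = item.strip().split("@")
        result[zone] = replica_type
    return result


def zones_gaining_c_replica(old: str, new: str) -> list:
    """Zones where a locality change adds a C replica (replica replenishment)."""
    before, after = parse_locality(old), parse_locality(new)
    return sorted(z for z, t in after.items() if t == "C" and before.get(z) != "C")


def starts_as_rowstore(scenario: str, source_is_rowstore: bool = True,
                       c_visible_to_ddl_leader: bool = True) -> bool:
    """Whether a user-table partition on the C replica is temporarily rowstore.

    Scenario names are hypothetical labels for the rows of the table above.
    """
    if scenario in ("replica_replenishment", "log_stream_rebuild"):
        # Assumption: the baseline comes from the source; a rowstore baseline stays rowstore for now.
        return source_is_rowstore
    if scenario == "offline_ddl":
        # A columnstore baseline is built directly unless the C replica is invisible
        # to the DDL task's log stream leader (concurrent replica replenishment).
        return not c_visible_to_ddl_leader
    if scenario in ("full_direct_load", "table_level_restore"):
        # Both currently produce rowstore data that is converted later in the background.
        return True
    raise ValueError(f"unknown scenario: {scenario}")


print(zones_gaining_c_replica("F@z1, F@z2", "F@z1, F@z2, C@z3"))  # ['z3']
```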
In each of the scenarios in the preceding table, while row-to-column conversion is in progress, the optimizer generates a columnstore query plan, but execution still reads the rowstore baseline. You can check the availability and progress of row-to-column conversion on a C replica in the CDB_OB_CS_REPLICA_STATS view (sys tenant) or the DBA_OB_CS_REPLICA_STATS view (user tenant). Run queries on the C replica after the row-to-column conversion tasks have fully completed.
The following example, run in the sys tenant, queries the row-to-column conversion progress of tablets in the C replica log streams of all tenants:
```sql
obclient[oceanbase]> SELECT * FROM oceanbase.CDB_OB_CS_REPLICA_STATS;
```
The query result is as follows:
```
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
| TENANT_ID | SVR_IP         | SVR_PORT | LS_ID | TOTAL_TABLET_CNT | AVAILABLE_TABLET_CNT | TOTAL_MACRO_BLOCK_CNT | AVAILABLE_MACRO_BLOCK_CNT | AVAILABLE |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
|      1004 | xx.xxx.xxx.212 |    63000 |  1001 |             1019 |                 1019 |                 10706 |                     10706 | TRUE      |
|      1006 | xx.xxx.xxx.212 |    63000 |  1001 |              133 |                  133 |                   875 |                       875 | TRUE      |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
2 rows in set
```
The first row of the result shows that, for tenant 1004, the C replica resides on server xx.xxx.xxx.212 with port 63000, and the ID of its log stream is 1001. A total of 1019 tablets need to be converted to columnstore, of which 1019 are available; the baseline contains 10706 macroblocks in total, of which 10706 are available. You can roughly estimate the progress of row-to-column conversion as AVAILABLE_TABLET_CNT / TOTAL_TABLET_CNT or AVAILABLE_MACRO_BLOCK_CNT / TOTAL_MACRO_BLOCK_CNT. A log stream is fully available (AVAILABLE is TRUE) only when all tablets in it are available.
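The two progress ratios and the availability check can be computed client-side once the view's rows have been fetched. In the minimal sketch below, fetching is out of scope: `CsReplicaStat` mirrors a subset of the CDB_OB_CS_REPLICA_STATS columns with AVAILABLE already parsed into a boolean, and the sample values are the two rows shown above. The helper names are hypothetical, and treating a log stream with zero tablets or macroblocks as fully converted is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CsReplicaStat:
    """One row of CDB_OB_CS_REPLICA_STATS (subset of columns; SVR_IP and SVR_PORT omitted)."""
    tenant_id: int
    ls_id: int
    total_tablet_cnt: int
    available_tablet_cnt: int
    total_macro_block_cnt: int
    available_macro_block_cnt: int
    available: bool


def _ratio(done: int, total: int) -> float:
    # Assumption: a log stream with nothing to convert counts as fully converted.
    return done / total if total else 1.0


def conversion_progress(stat: CsReplicaStat) -> tuple:
    """Rough progress estimates: (by tablet count, by macroblock count)."""
    return (_ratio(stat.available_tablet_cnt, stat.total_tablet_cnt),
            _ratio(stat.available_macro_block_cnt, stat.total_macro_block_cnt))


def all_log_streams_available(stats: list) -> bool:
    """True only if every log stream reports AVAILABLE = TRUE."""
    return all(s.available for s in stats)


# The two rows from the query result above.
rows = [
    CsReplicaStat(1004, 1001, 1019, 1019, 10706, 10706, True),
    CsReplicaStat(1006, 1001, 133, 133, 875, 875, True),
]
for row in rows:
    print(row.tenant_id, conversion_progress(row))  # 1004 (1.0, 1.0), then 1006 (1.0, 1.0)
print(all_log_streams_available(rows))  # True
```

The tablet ratio and the macroblock ratio can differ while conversion is running, because tablets vary in size; the macroblock ratio is closer to the amount of baseline data already converted.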
