A columnstore replica (C replica) is a type of read-only replica in OceanBase Database. It is primarily used for analytical (AP) read-only queries. Data in a C replica is stored in columnar storage. It works with all-purpose replicas (F replicas) and read-only replicas (R replicas): TP writes and regular reads are mainly performed on F and R replicas, while AP scans are performed on C replicas. In a C replica, all user tables (including replicated tables, but not index tables, internal tables, or system tables) are stored in columnar storage. OLAP services access C replicas through a dedicated proxy and perform weak reads for decision analysis.
A columnstore replica has the following characteristics:
- Contains full logs, a MemTable, and an SSTable.
- Baseline (major SSTable): The baseline data of user tables in the same log stream is stored in columnar format on a C replica, which facilitates large-scale scans and AP computations.
- Incremental and hybrid storage: Incremental data, such as MemTables and minor SSTables, is still stored in row format. A C replica asynchronously synchronizes logs from the source replica and replays them locally.
- Cannot join a Paxos group or vote on logs. Instead, it acts as a listener that synchronizes logs from the Paxos group members and replays them locally. Because it is not a voting member, it adds no latency to transaction commits.
- Provides read-only services to applications that do not require strong-consistency reads.
- Stores user tables in the same log stream in columnar storage mode at the major SSTable layer.
- Cannot be converted into a replica of another type.
The following table describes more features of such a replica.
| Feature | Description |
|---|---|
| Replica name and abbreviation | COLUMNSTORE (C) |
| Whether logs are contained | Yes, asynchronous logs (ASYNC_CLOG). The replica is only a listener, not a member of the Paxos group. |
| Whether a MemTable is contained | Yes (WITH_MEMSTORE) |
| Whether an SSTable is contained | Yes (WITH_SSSTORE) |
| Data security | Medium |
| Time to become the leader | Not supported |
| Resource cost | High |
| Service | Supports non-consistent read. |
| Limitations on replica type conversion | Cannot be converted into a replica of another type |
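The non-consistent (weak) reads listed in the table are requested by the client rather than implied by the replica type. The following is a minimal sketch in MySQL mode, assuming a hypothetical table named `sales_fact`. Whether a statement is actually routed to a C replica depends on how the proxy is configured, which this sketch does not cover.

```sql
-- Option 1: request a weak-consistency read for a single statement with a hint.
-- sales_fact is a hypothetical table used only for illustration.
SELECT /*+ READ_CONSISTENCY(WEAK) */ region, SUM(amount) AS total_amount
FROM sales_fact
GROUP BY region;

-- Option 2: request weak-consistency reads for the whole session.
SET ob_read_consistency = 'WEAK';
SELECT region, SUM(amount) AS total_amount
FROM sales_fact
GROUP BY region;
```

When a command-line client is used, hints may be stripped as comments unless the client is started with an option that preserves them (commonly `-c`).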
Scenarios where a C replica stores data in rowstore format
In OceanBase Database, if a table is created as a rowstore table, the system creates a pure columnstore table for it on the C replica. If a table is created as a columnstore table, it is stored on the C replica in the same way as on the F replica. In other words, the C replica converts only the user partitions of tables that are rowstore on the F replica, and columnstore is the final state rather than a guarantee at every moment: not all partitions on the C replica are in columnstore format at any given time. In the following scenarios, user partitions on the C replica are temporarily in rowstore format, and the system automatically schedules row-to-column conversion tasks to convert the latest baseline data to columnstore format.
| Scenario | Description |
|---|---|
| Adding a replica | For example, when the locality is modified from `F@z1, F@z2` to `F@z1, F@z2, C@z3`. |
| Rebuilding a log stream | When a log stream rebuild is triggered on the C replica, the system will pull the corresponding baseline data to the C replica. If the baseline data is in rowstore format, it will temporarily remain in rowstore format until the system schedules a row-to-column conversion task to convert it to columnstore format. |
| Concurrent replica addition and Offline DDL operations | When the C replica is part of the log stream membership, if an Offline DDL operation is executed, the system will directly build a columnstore baseline on the C replica. However, when a replica addition and an Offline DDL operation occur concurrently, the C replica will not be visible to the log stream leader executing the DDL task. In this case, the system will first build a rowstore baseline on the C replica and then convert it to columnstore format after scheduling a row-to-column conversion task. |
| Full direct load | For full direct loads, the system currently only supports importing data in rowstore format to the C replica first, and then converting it to columnstore format after scheduling a row-to-column conversion task. |
| Table-level restore | Since columnstore tables currently do not support table-level restores, the system will first restore the table to rowstore format on the C replica and then convert it to columnstore format after scheduling a row-to-column conversion task. |
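For the "Adding a replica" scenario in the table above, the locality change is typically issued from the sys tenant with an `ALTER TENANT` statement. The following is a sketch only, assuming a hypothetical tenant named `ap_tenant` whose resource pool already provides resources in zone `z3`.

```sql
-- Run in the sys tenant. ap_tenant is a hypothetical tenant name.
-- Adds a C replica in z3 to a tenant that currently has F replicas in z1 and z2.
ALTER TENANT ap_tenant LOCALITY = 'F@z1, F@z2, C@z3';
```

After the new replica is added, its user partitions may remain in rowstore format until the scheduled row-to-column conversion tasks finish.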
In the scenarios described above, while the row-to-column conversion is in progress, the optimizer still generates a columnstore query plan, but the plan is actually executed against the rowstore baseline. To obtain information about a C replica and the progress of the row-to-column conversion, users can query the CDB_OB_CS_REPLICA_STATS view (sys tenant) or the DBA_OB_CS_REPLICA_STATS view (user tenant). After the row-to-column conversion task is completed, users can then query the C replica.
In the sys tenant, the following example query retrieves the row-to-column conversion progress of the C replicas of all tenants, one row per log stream:
```shell
obclient[oceanbase]> SELECT * FROM oceanbase.CDB_OB_CS_REPLICA_STATS;
```
The query results are as follows:
```shell
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
| TENANT_ID | SVR_IP         | SVR_PORT | LS_ID | TOTAL_TABLET_CNT | AVAILABLE_TABLET_CNT | TOTAL_MACRO_BLOCK_CNT | AVAILABLE_MACRO_BLOCK_CNT | AVAILABLE |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
|      1004 | xx.xxx.xxx.212 |    63000 |  1001 |             1019 |                 1019 |                 10706 |                     10706 | TRUE      |
|      1006 | xx.xxx.xxx.212 |    63000 |  1001 |              133 |                  133 |                   875 |                       875 | TRUE      |
+-----------+----------------+----------+-------+------------------+----------------------+-----------------------+---------------------------+-----------+
2 rows in set
```
The first row of the query results shows that, for the tenant with ID 1004, the C replica resides on server xx.xxx.xxx.212, port 63000, and belongs to log stream 1001. A total of 1019 partitions (tablets) need to be converted to columnstore format, and 1019 of them are available. The baseline contains 10706 macroblocks, and 10706 of them are available. Users can estimate the progress of the row-to-column conversion using AVAILABLE_TABLET_CNT / TOTAL_TABLET_CNT or AVAILABLE_MACRO_BLOCK_CNT / TOTAL_MACRO_BLOCK_CNT. A log stream is considered fully available only when all tablets in the log stream are available.
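The two progress ratios can also be computed directly in SQL. The following sketch uses only the columns shown in the sample output. The percentage aliases are illustrative names, and `NULLIF` is added so that a log stream with zero tablets or macroblocks yields NULL instead of a division by zero.

```sql
-- Run in the sys tenant: conversion progress per tenant, server, and log stream.
SELECT TENANT_ID,
       SVR_IP,
       SVR_PORT,
       LS_ID,
       -- Share of tablets whose baseline is already available, in percent.
       ROUND(100 * AVAILABLE_TABLET_CNT / NULLIF(TOTAL_TABLET_CNT, 0), 2) AS TABLET_PROGRESS_PCT,
       -- Share of baseline macroblocks already available, in percent.
       ROUND(100 * AVAILABLE_MACRO_BLOCK_CNT / NULLIF(TOTAL_MACRO_BLOCK_CNT, 0), 2) AS MACRO_BLOCK_PROGRESS_PCT,
       AVAILABLE
FROM oceanbase.CDB_OB_CS_REPLICA_STATS
ORDER BY TENANT_ID, LS_ID;
```

For the two sample rows, both ratios come out at 100 percent because the available counts equal the totals. In a user tenant, a similar query could presumably be run against DBA_OB_CS_REPLICA_STATS, assuming that view exposes the same columns without TENANT_ID.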
