You can use the CREATE TABLE statement to create a table.
This section mainly introduces non-partitioned table creation. For information about how to create and use partitioned tables, see Create a partitioned table.
Create a non-partitioned table
Creating a non-partitioned table means creating a table that has only one partition.
A sample statement for creating a non-partitioned table is as follows:
obclient> CREATE TABLE table_name1(w_id int,
w_ytd decimal(12,2),
w_tax decimal(4,4),
w_name varchar(10),
w_street_1 varchar(20),
w_street_2 varchar(20),
w_city varchar(20),
w_state char(2),
w_zip char(9),
unique(w_name, w_city),
primary key(w_id)
);
obclient> CREATE TABLE table_name2 (c_w_id int NOT NULL,
c_d_id int NOT null,
c_id int NOT null,
c_discount decimal(4, 4),
c_credit char(2),
c_last varchar(16),
c_first varchar(16),
c_middle char(2),
c_balance decimal(12, 2),
c_ytd_payment decimal(12, 2),
c_payment_cnt int,
c_credit_lim decimal(12, 2),
c_street_1 varchar(20),
c_street_2 varchar(20),
c_city varchar(20),
c_state char(2),
c_zip char(9),
c_phone char(16),
c_since date,
c_delivery_cnt int,
c_data varchar(500),
FOREIGN KEY (c_w_id) REFERENCES table_name1(w_id),
primary key (c_w_id, c_d_id, c_id)
);
The example creates two tables and defines some constraints on different columns, including primary keys and foreign keys. For more information about primary keys and foreign keys, see Define column constraints.
Select the correct data type for each column when you create a table. For more information about SQL data types, see Data types.
Note
To ensure performance and facilitate maintenance, we recommend that you specify a primary key or a unique key when you create a table. If no existing field can be used as the primary key, you can add a numeric column as the primary key, and use the Sequence object of the Oracle-compatible tenant to generate sequential values for this column. For more information about sequences, see Manage sequences.
Create a replicated table
A replicated table is a special type of table in OceanBase Database. A replicated table can read the latest data modifications from any healthy replica. If you do not frequently write data and are more concerned about the operation latency and load balancing, a replicated table is a good choice.
After you create a replicated table, a replica of the replicated table is created on each OBServer node in the tenant. One of these replicas is elected as the leader to receive write requests, and the other replicas serve as followers to receive only read requests.
All followers must report their status, including the replica replay progress (data synchronization progress), to the leader. Generally, the replica replay progress on a follower lags behind that on the leader. A follower is considered by the leader as healthy only if the data latency between the follower and the leader is within the specified threshold. A healthy follower can quickly synchronize data modifications from the leader. If the leader considers a follower as healthy within a period of time, it will grant a lease period to the follower. In other words, the leader believes that the follower can keep healthy and provide strong-consistency read services within the lease period. During this lease period, the leader confirms the replay progress on the follower before each replicated table transaction is committed. The leader returns the commit result of a transaction only after the follower successfully replays the modifications in the transaction. At this time, you can read the modifications in the committed transaction from the follower.
The replicated table feature is already supported in OceanBase Database V3.x. However, the database architecture is significantly modified in OceanBase Database V4.x. Therefore, to adapt to the new architecture of standalone log streams, OceanBase Database V4.x builds the partition-based readable version verification and log stream-based lease granting mechanisms to ensure the correctness in strong-consistency reads.
In addition, OceanBase Database V4.x improves the capability of switching the leader without terminating transactions. In OceanBase Database V4.x, replicated table transactions that are not committed when a leader switch is initiated by you or the load balancer can continue after the leader is switched, which is not supported in OceanBase Database V3.x. Compared with a replicated table of OceanBase Database V3.x, a replicated table of OceanBase Database V4.x has higher transaction write performance and more powerful disaster recovery capabilities. In addition, a replica crash has slighter impacts on read operations.
Limitations on replicated tables
Replicated tables:
Limitations on creating replicated tables: You cannot create a replicated table in the sys tenant or the meta tenant, which does not have broadcast log streams.
Impact on write performance: Since writing to a replicated table requires synchronization across all replicas, the larger the number of nodes, the greater the impact on write performance.
- To avoid this issue, try not to mix replicated table write and read operations in the same transaction. Instead, perform pure write or pure read operations on a replicated table.
Attribute conversion:
- Replicated tables and table groups are mutually exclusive. If you attempt to modify the table group of a replicated table, an error is returned. If a table belongs to a table group, an error is returned when you attempt to convert it to a replicated table.
- The attribute conversion of replicated tables depends on load balancing and transfer. Make sure that the related parameters are enabled.
Routing:
In a transaction that includes write operations on a replicated table, subsequent queries to the replicated table may encounter unreadable replicas if routed randomly to a follower. The observer will internally forward the request to the leader of the replicated table, affecting query performance.
- In ODP V4.3.3, the routing strategy has been adjusted. If a write operation is performed on a replicated table in a transaction, the query is routed to the leader of the replicated table.
If a
JOINquery is performed on a replicated table, the query is randomly routed to a replica of the replicated table (for aJOINquery, ODP routes the query based on the first table). In this case, the query may be routed to a non-leader node of the other table, which causes a remote plan.If the attributes of a table are changed to those of a replicated table, ODP cannot detect the change and cannot route the table to distribute the pressure.
If a replicated table is modified in a transaction, the query operations after the transaction: If you query a replicated table, the system generates an execution plan to select a local replica. However, because the replicated table is modified, the system returns an error indicating that the replica is not readable. The SQL statement is retried to query the leader replica. In this case, the plan cache cannot be hit, which affects the query performance.
Note
Within a transaction, if changes are made to a replicated table, the system always prioritizes selecting the leader replica of the replicated table for queries rather than directly selecting a local replica. This ensures improved query efficiency in OceanBase Database and avoids incorrect replica selection.
Broadcast log streams:
- Each user tenant can have only one broadcast log stream.
- You cannot convert the attributes of a broadcast log stream to those of a normal log stream.
- You cannot manually delete a broadcast log stream. It will be automatically deleted when the tenant is deleted.
Create a replicated table
To create a replicated table, you must add the DUPLICATE_SCOPE option to the CREATE TABLE statement. You can create a replicated table only in a user tenant. You cannot create a replicated table in the sys tenant. The SQL syntax for creating a replicated table is as follows:
CREATE TABLE table_name column_definition DUPLICATE_SCOPE='none | cluster';
In the syntax, the DUPLICATE_SCOPE option specifies the attribute of the replicated table. Valid values are the following ones:
none: The table is a normal table.cluster: The table is a replicated table. The leader must copy transactions to all full-featured (F) and read-only (R) replicas in the current tenant.
If you do not specify DUPLICATE_SCOPE when you create a table, the default value none takes effect.
CREATE TABLE dup_t1 (c1 NUMBER,c2 NUMBER) DUPLICATE_SCOPE= 'cluster';
When the first replicated table is created for a tenant, the system automatically creates a broadcast log stream for the tenant. Then, subsequent replicated tables of the tenant will all be created in the broadcast log stream. A broadcast log stream differs from a normal log stream in that the broadcast log stream will automatically deploy a replica on each OBServer node of the tenant to ensure that the replicated table can provide strong-consistency reads on any OBServer node in ideal conditions. You can execute the following SQL statement to view the broadcast log stream where the replicated tables of a tenant reside:
SELECT * FROM SYS.DBA_OB_LS WHERE flag LIKE "%DUPLICATE%";
A sample query result is as follows.
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN | DROP_SCN | SYNC_SCN | READABLE_SCN | FLAG | UNIT_LIST |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
| 1003 | NORMAL | z1;z2 | 0 | 0 | 1684982852976428261 | NULL | 1684983282912048623 | 1684983282912048623 | DUPLICATE | |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
1 rows in set
In this example, the log stream with the ID of 1003 is a broadcast log stream. All replicated tables of the current tenant are created in this log stream. For more information about broadcast log streams, see About replicas.
After a replicated table is created, you can perform insert and read/write operations on the replicated table as on a normal table. For read requests, if you connect to the database by using a proxy, read requests may be routed to any OBServer node. If you connect to the database directly, read requests will be executed on the OBServer node that you directly connect to as long as the local replica is readable. For more information about database connection methods, see Overview of connection methods.
Create a table by copying the data in an existing table
You can use the CREATE TABLE AS SELECT statement to copy the basic data types and data of a table, but not constraints, indexes, or NOT NULL attributes.
A sample statement is as follows:
CREATE TABLE t2_copy AS SELECT * FROM t2;
You cannot use the CREATE TABLE LIKE statement to copy table schemas.
Create a rowstore table
OceanBase Database allows you to create rowstore tables and convert rowstore tables into columnstore tables.
When the parameter default_table_store_format='row' (default value) is set, a rowstore table is created by default. When default_table_store_format is not set to row, you can specify the WITH COLUMN GROUP(all columns) option to create a rowstore table.
For information about how to convert a rowstore table into a columnstore table, see Modify a table. For information about how to create a columnstore index, see Create an index.
You can specify the WITH COLUMN GROUP(all columns) option to create a rowstore table.
Example:
CREATE TABLE tbl1_cg (col1 INT PRIMARY KEY, col2 VARCHAR(50)) WITH COLUMN GROUP(all columns);
Note
If you create a rowstore table by specifying the WITH COLUMN GROUP(all columns) option, the table remains in the rowstore format even after you execute the DROP COLUMN GROUP(all columns) command to drop this column group.
Create a columnstore table
OceanBase Database allows you to create a columnstore table, convert a rowstore table into a columnstore table, and create a columnstore index. You can use the WITH COLUMN GROUP option to explicitly specify to create a columnstore table or a hybrid row-column table. You can also set the default_table_store_format parameter to specify columnar storage or hybrid row-column storage as the default storage format.
For information about how to convert a rowstore table into a columnstore table, see Modify a table. For information about how to create a columnstore index, see Create an index.
You can specify the WITH COLUMN GROUP(all columns, each column) option to create a hybrid row-column table.
Example:
CREATE TABLE tbl1_cg (col1 NUMBER PRIMARY KEY, col2 VARCHAR2(50)) WITH COLUMN GROUP(all columns, each column);
You can specify the WITH COLUMN GROUP(each column) option to create a columnstore table.
Example:
CREATE TABLE tbl2_cg (col1 NUMBER PRIMARY KEY, col2 VARCHAR2(50)) WITH COLUMN GROUP(each column);
If you import a large amount of data to a columnstore table, you need to execute a major compaction to improve read performance and perform statistics collection to adjust the execution strategy.
Major compaction: After importing data in batches, we recommend that you perform a major compaction. This helps improve read performance, because a major compaction organizes fragmented data and makes it more contiguous on physical storage, thus reducing disk I/O for reads. After importing data, trigger a major compaction in the tenant to ensure that all data is compacted to the baseline layer. For more information, see
MAJOR and MINOR.Statistics collection: After the major compaction is done, we recommend that you perform statistics collection. This is crucial for the optimizer to generate effective query plans and execution strategies. Call GATHER_SCHEMA_STATS to collect statistics on all tables, and monitor the collection progress by querying the GV$OB_OPT_STAT_GATHER_MONITOR view.
Note that as the data amount of a columnstore table increases, the major compaction may take longer.
Specify the update model of a table
The syntax for specifying the update model of a table during table creation is to add the MERGE_ENGINE table option after the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name column_definition
MERGE_ENGINE = {delete_insert | partial_update}
WITH COLUMN GROUP([all columns,] each column);
Note
After you specify the MERGE_ENGINE parameter when you create a table, the configuration value of the parameter cannot be modified.
The MERGE_ENGINE parameter specifies the update model of the table. Valid values:
partial_update: the default value, indicating that the existing update mode is retained.delete_insert: indicates that theupdateoperation is converted into a combination ofdeleteandinsertoperations, and all columns are written.
If you do not specify the MERGE_ENGINE option, the value of this option is the same as the value of the default_table_merge_engine parameter.
Scenarios
We recommend that you use delete_insert for OLAP scenarios and partial_update for all other scenarios. For more information about the configuration scenarios, see Configuration best practices.
Example:
Create a pure columnstore table named
mer_tbl1with themerge on writeupdate model.CREATE TABLE mer_tbl1 (col1 NUMBER, col2 NUMBER) MERGE_ENGINE = delete_insert WITH COLUMN GROUP(each column);Create a hybrid row-column table named
mer_tbl2with themerge on writeupdate model.CREATE TABLE mer_tbl2 (col1 NUMBER, col2 NUMBER) merge_engine = delete_insert WITH COLUMN GROUP(all columns, each column);