You can use the CREATE TABLE statement to create a table.
This topic describes how to create a non-partitioned table. For information about how to create and use a partitioned table, see Create a partitioned table.
Create a non-partitioned table
Creating a non-partitioned table means creating a table with only one partition.
Here is an example of creating a non-partitioned table:
obclient>CREATE TABLE table_name1(w_id int
, w_ytd decimal(12,2)
, w_tax decimal(4,4)
, w_name varchar(10)
, w_street_1 varchar(20)
, w_street_2 varchar(20)
, w_city varchar(20)
, w_state char(2)
, w_zip char(9)
, unique(w_name, w_city)
, primary key(w_id)
);
Query OK, 0 rows affected (0.09 sec)
obclient>CREATE TABLE table_name2 (c_w_id int NOT NULL
, c_d_id int NOT null
, c_id int NOT null
, c_discount decimal(4, 4)
, c_credit char(2)
, c_last varchar(16)
, c_first varchar(16)
, c_middle char(2)
, c_balance decimal(12, 2)
, c_ytd_payment decimal(12, 2)
, c_payment_cnt int
, c_credit_lim decimal(12, 2)
, c_street_1 varchar(20)
, c_street_2 varchar(20)
, c_city varchar(20)
, c_state char(2)
, c_zip char(9)
, c_phone char(16)
, c_since date
, c_delivery_cnt int
, c_data varchar(500)
, index icust(c_last, c_d_id, c_w_id, c_first, c_id)
, FOREIGN KEY (c_w_id) REFERENCES table_name1(w_id)
, primary key (c_w_id, c_d_id, c_id)
);
Query OK, 0 rows affected
In the example, two tables are created, and some constraints are defined for the columns in the tables, including primary keys and foreign keys on different columns. For more information about primary keys, foreign keys, and other constraints, see Define the constraint type for a column.
When creating columns for a table, choose the appropriate data type. For more information about SQL data types, see Data types.
Note
To ensure performance and facilitate future maintenance, we recommend that you specify a primary key or a unique key when you create a table. If no suitable field can be used as the primary key, you can omit the PRIMARY KEY clause when you create the table. After the table is created, the system automatically specifies an auto-increment column as the hidden primary key. For more information about auto-increment columns, see Define an auto-increment column.
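As a sketch of this behavior (the table and column names here are illustrative, not from the original examples):

```sql
-- No business column is a suitable primary key, so the PRIMARY KEY
-- clause is omitted.
CREATE TABLE event_log (
    log_time DATETIME,
    message  VARCHAR(200)
);
-- OceanBase Database then adds a hidden auto-increment column as the
-- primary key; the hidden column does not appear in normal queries.
```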
Create a replicated table
A replicated table is a special type of table in OceanBase Database. This table can read the latest modifications of data from any "healthy" replica. For users who have lower write requirements and higher requirements for read latency and load balancing, a replicated table is a good choice.
After a user creates a replicated table, a replica of the table will be created on every OBServer node in the tenant. One of these replicas will be selected as the leader to accept write requests, while the remaining replicas can only accept read requests.
All replicas need to report their status to the leader, primarily their replay progress, which indicates the synchronization progress of data. Generally, the replay progress of a follower replica will lag slightly behind that of the leader. As long as the lag does not exceed a certain threshold, the leader considers the replica to be "healthy" and can quickly replay the modifications from the leader. Once the leader determines that a replica is "healthy" for a certain period, it grants the follower a lease. In simple terms, the leader trusts the follower to remain "healthy" during this period and to provide strongly consistent read services. During this "trust" period, the leader will check the replay progress of the follower before committing each transaction in the replicated table. After the follower replays the modifications of the transaction, the leader will inform the user that the transaction has been successfully committed. At this point, the user can read the modifications of the just-committed transaction from the follower.
The feature of replicated tables has been available in OceanBase Database V3.x. However, in OceanBase Database V4.x, due to significant changes in the architecture, the replicated tables in V4.x have been adapted to the new architecture of single-machine log streams. They have implemented a partition-based readable version number verification and a log stream-based lease granting mechanism to ensure the correctness of strongly consistent reads.
Additionally, the feature of replicated tables in OceanBase Database V4.x has improved the ability to switch leaders without killing transactions. In V3.x, when a leader switch is initiated by the user or the load balancer, uncommitted replicated table transactions would fail. However, in V4.x, these transactions can continue to execute after the leader switch. Compared to V3.x, V4.x's replicated tables also offer better write transaction performance and stronger disaster recovery capabilities. The impact of a replica failure on read operations is significantly reduced.
Limitations of replicated tables
Replicated tables have the following limitations:
Creation limitations: The sys tenant and meta tenant do not support creating replicated tables because they do not have broadcast log streams.
Write performance is affected by the number of nodes: Because replicated table writes must be synchronized to all replicas, the more nodes there are, the greater the impact on write performance.
- Workaround: Avoid transactions that both write to and read from a replicated table. Pure write or pure read transactions on replicated tables are fine.
Attribute conversion:
- Replicated tables and table groups are mutually exclusive. Modifying the table group attribute of a replicated table will result in an error. When converting a regular table to a replicated table, if the regular table belongs to a table group, the attribute change command will also result in an error.
- Converting a replicated table depends on load balancing and transfer, so make sure the relevant configuration items are enabled.
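Assuming the relevant switches are the tenant-level parameters `enable_rebalance` and `enable_transfer` (verify the names against your version's documentation), you can check them before converting:

```sql
-- Check that load balancing and transfer are enabled for the tenant.
SHOW PARAMETERS LIKE 'enable_rebalance';
SHOW PARAMETERS LIKE 'enable_transfer';
```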
Routing:
- If a transaction writes to a replicated table, subsequent queries on the replicated table may be randomly routed to a follower replica that is not yet readable. The OBServer node internally forwards such queries to the leader of the replicated table, which can affect query performance.
  - In ODP V4.3.3, the routing strategy has been adjusted: if a transaction has written to a replicated table, subsequent queries are routed to the leader of the replicated table.
- Queries that join a replicated table with a regular table are routed based on the replicated table (for join queries, ODP routes based on the first table). The query may therefore be routed to a non-leader node of the regular table, resulting in a remote plan.
- When a regular table is converted to a replicated table, ODP cannot detect the change and cannot route queries to the replicated table's replicas to balance the load.
- Within a transaction, if a replicated table has been modified, subsequent queries generate an execution plan that selects the local replica. Because of the modification, however, the system reports an error indicating that the replica is unreadable. The SQL retry then selects the leader replica, which misses the plan cache and degrades query performance.
Note
Within a transaction, if a replicated table has been modified, OceanBase Database prioritizes the leader replica of the replicated table for subsequent queries instead of directly selecting the local replica. This avoids replica selection errors and improves query efficiency.
Broadcast log streams:
- Each user tenant can have at most one broadcast log stream.
- Attribute conversion between broadcast log streams and regular log streams is not supported.
- Manual deletion of broadcast log streams is not supported. They are deleted along with the tenant.
Syntax for creating a replicated table
To create a replicated table, add the DUPLICATE_SCOPE option to the CREATE TABLE statement. Only user tenants can create replicated tables, and the sys tenant cannot create replicated tables. The SQL statement for creating a replicated table is as follows:
CREATE TABLE table_name column_definition DUPLICATE_SCOPE='none | cluster';
The DUPLICATE_SCOPE parameter specifies the attribute of the replicated table. Valid values are as follows:
- none: Indicates that the table is a regular table.
- cluster: Indicates that the table is a replicated table. The leader replicates transactions to all F and R replicas of the current tenant.
If you do not specify the DUPLICATE_SCOPE parameter when creating a table, the default value is none.
CREATE TABLE dup_t1 (c1 int, c2 int) DUPLICATE_SCOPE = 'cluster';
When the first replicated table is created in a tenant, the system will also create a special log stream called a broadcast log stream. Subsequent replicated tables will be created on this broadcast log stream. The difference between a broadcast log stream and a regular log stream is that a broadcast log stream automatically deploys a replica on every OBServer node within the tenant. This ensures that, under ideal conditions, a replicated table can provide strongly consistent reads from any OBServer node. You can use the following SQL statement to view the broadcast log stream where the replicated tables of a tenant are located:
SELECT * FROM oceanbase.DBA_OB_LS WHERE flag LIKE '%DUPLICATE%';
Here is an example of the query result.
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN | DROP_SCN | SYNC_SCN | READABLE_SCN | FLAG | UNIT_LIST |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
| 1003 | NORMAL | z1;z2 | 0 | 0 | 1683267390195713284 | NULL | 1683337744205408139 | 1683337744205408139 | DUPLICATE | |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
1 row in set
In the example, the log stream with LS_ID of 1003 is the broadcast log stream, and all replicated tables of the tenant are created on this log stream. For more information about broadcast log streams, see Replica introduction.
After a replicated table is created, you can perform insert and read/write operations on it, just like a regular table. The difference is that for read requests, if you connect to the database using a proxy, the read request may be routed to any OBServer node for execution. If you connect to the database directly, the system will execute the read request on the directly connected OBServer node as long as the local replica is readable. For more information about database connection methods, see Overview of connection methods.
Create a new table by replicating data from an existing table
Copy data from a table
You can use the CREATE TABLE AS SELECT statement to copy data from a table. However, the structure may not be completely consistent, and information such as constraints, indexes, default values, and partitions will be lost.
Here is an example statement:
obclient>CREATE TABLE t1_copy AS SELECT * FROM t1;
Query OK, 3 rows affected
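If you need both the data and the complete structure (constraints, indexes, default values), one common workaround is to copy the structure first and then load the data separately; a sketch (the table name t1_full is illustrative):

```sql
-- Copy the structure of t1, including indexes and default values.
CREATE TABLE t1_full LIKE t1;
-- Then copy the rows.
INSERT INTO t1_full SELECT * FROM t1;
```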
Copy the structure of a table
You can use the CREATE TABLE LIKE statement to copy the structure of a table, but you cannot copy the data.
Here is an example statement:
obclient>CREATE TABLE t1_like like t1;
Query OK, 0 rows affected
Create a rowstore table
OceanBase Database supports creating rowstore tables and converting rowstore tables to columnstore tables.
When the configuration parameter default_table_store_format = 'row' (the default value) is set, tables are created as rowstore tables by default. If default_table_store_format is not set to row, you can still create a rowstore table by specifying the WITH COLUMN GROUP(all columns) option.
For more information about converting rowstore tables to columnstore tables, see Change a table. For more information about creating columnstore indexes, see Create an index.
You can create a rowstore table by specifying the WITH COLUMN GROUP(all columns) option.
Here is an example:
CREATE TABLE tbl1_cg (col1 INT PRIMARY KEY, col2 VARCHAR(50)) WITH COLUMN GROUP(all columns);
Note
If you specify the WITH COLUMN GROUP(all columns) option, the table is created as a rowstore table. Even if you later execute the DROP COLUMN GROUP(all columns) statement to drop this column group, the table remains a rowstore table.
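A sketch of the behavior described in the note, using the tbl1_cg table from the example above:

```sql
-- tbl1_cg was created WITH COLUMN GROUP(all columns), so it is rowstore.
-- Dropping the column group does not convert it to columnstore;
-- the table remains a rowstore table.
ALTER TABLE tbl1_cg DROP COLUMN GROUP(all columns);
```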
Create a columnstore table
OceanBase Database supports creating columnstore tables, converting rowstore tables to columnstore tables, and creating columnstore indexes. By default, OceanBase Database creates rowstore tables. You can specify the WITH COLUMN GROUP option to explicitly create a columnstore table or a rowstore-columnstore redundant table.
For more information about converting rowstore tables to columnstore tables, see Change a table. For more information about creating columnstore indexes, see Create an index.
You can use the WITH COLUMN GROUP(all columns, each column) option to create a rowstore-columnstore redundant table.
Here is an example:
CREATE TABLE tbl1_cg (col1 INT PRIMARY KEY, col2 VARCHAR(50)) WITH COLUMN GROUP(all columns, each column);
You can use the WITH COLUMN GROUP(each column) option to create a columnstore table.
Here is an example:
CREATE TABLE tbl2_cg (col1 INT PRIMARY KEY, col2 VARCHAR(50)) WITH COLUMN GROUP(each column);
When you create and use a columnstore table and import a large amount of data, we recommend that you perform a major compaction to improve the read performance and perform statistics collection to adjust the execution strategy.
Major compaction: After you import data in batches, we recommend that you perform a major compaction. A major compaction organizes fragmented data so that it is more contiguous in physical storage, which reduces disk I/O during reads. After you import data, perform a major compaction in the tenant to ensure that all data is compacted to the baseline layer. For more information, see MAJOR and MINOR.
Statistics collection: After the major compaction is completed, we recommend that you collect statistics, which the optimizer needs to generate effective query plans and execution strategies. You can execute the GATHER_SCHEMA_STATS statement to collect statistics for all tables, and monitor the progress of statistics collection by querying the GV$OB_OPT_STAT_GATHER_MONITOR view.
Note that the speed of a major compaction may decrease as the amount of data in the columnstore table increases.
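The steps above can be sketched as follows in a MySQL mode tenant (the database name test is illustrative):

```sql
-- Trigger a major compaction in the current tenant.
ALTER SYSTEM MAJOR FREEZE;
-- Check the compaction status; it returns to IDLE when the compaction finishes.
SELECT STATUS FROM oceanbase.DBA_OB_MAJOR_COMPACTION;
-- Collect statistics for all tables in the test database.
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');
-- Monitor statistics collection progress.
SELECT * FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR;
```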
Create a heap-organized table
OceanBase Database supports both index-organized tables (IOTs) and heap-organized tables.
Limitations
The unique key local index or the primary key of a heap-organized table must include all partitioning columns.
When you create an index on a heap-organized table, the index name must be different from the name of the first column in the primary key.
OceanBase Database does not support the following DDL operations on heap-organized tables:
- Adding or dropping primary key constraints
- Changing the primary key to a non-primary key column
- Changing the type or length of a primary key column from a larger to a smaller size
- Adding a primary key column
- Modifying a primary key column to an auto-increment column
- Dropping a primary key column
- Splitting partitions
Syntax to specify the table organization
To specify the table organization when creating a table, add the ORGANIZATION table option after the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name column_definition ORGANIZATION [=] {INDEX | HEAP};
The ORGANIZATION parameter specifies the storage order of data rows in the table. Valid values are as follows:
- INDEX: indicates that the table is an index-organized table (IOT).
- HEAP: indicates that the table is a heap-organized table.
If you do not specify the ORGANIZATION option, its value is the same as the value of the default_table_organization parameter.
Here are some examples:
Specify the ORGANIZATION attribute as HEAP when creating a table.

CREATE TABLE ora_tbl1 (col1 INT, col2 INT) ORGANIZATION = HEAP;

Do not specify the ORGANIZATION option when creating a table.

After you change the value of the tenant-level parameter default_table_organization, you can create a table. For more information about the default_table_organization parameter, see default_table_organization.

Note

The default_table_organization parameter is supported only for MySQL mode user tenants in OceanBase Database. It is not supported for the sys tenant or Oracle mode user tenants.

Set the value of the default_table_organization parameter to HEAP.

ALTER SYSTEM SET default_table_organization = 'HEAP';

Create a table named ora_tbl2.

CREATE TABLE ora_tbl2 (col1 INT, col2 INT);

View the definition of the ora_tbl2 table.

SHOW CREATE TABLE ora_tbl2;

The returned result is as follows:

+----------+----------------------------------------------------------------------+
| Table    | Create Table                                                         |
+----------+----------------------------------------------------------------------+
| ora_tbl2 | CREATE TABLE `ora_tbl2` (
  `col1` int(11) DEFAULT NULL,
  `col2` int(11) DEFAULT NULL
) ORGANIZATION HEAP DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE ENABLE_MACRO_BLOCK_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+----------+----------------------------------------------------------------------+
1 row in set
Specify a clustered column when creating a table
To specify a clustered column when creating a table, add the CLUSTER BY table option after the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name (table_definition_list) CLUSTER BY (column_name_list);
column_name_list:
column_name [, column_name ...]
Parameter description:
column_name: the name of the clustered column used for sorting.
Notice
In a MySQL tenant of OceanBase Database, if you do not explicitly specify the table organization (such as ORGANIZATION = HEAP) but define the CLUSTER BY option, OceanBase Database will default to creating a heap-organized table.
Here is an example:
Create a table cb_tbl1 that is sorted by col2 and col4.
obclient> CREATE TABLE cb_tbl1 (
col1 INT PRIMARY KEY,
col2 INT,
col3 DECIMAL(10, 2),
col4 TIMESTAMP
)
CLUSTER BY (col2, col4);
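To confirm the organization that was actually chosen, you can inspect the table definition; in a MySQL tenant the output is expected to include the ORGANIZATION and CLUSTER BY clauses (verify against your version):

```sql
-- Inspect the generated table definition.
SHOW CREATE TABLE cb_tbl1;
```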
Create a temporary table
A temporary table is a special table at the session level, primarily used for temporarily storing intermediate data.
Note
Starting from OceanBase Database V4.3.5 BP4, you can create temporary tables.
Limitations
Temporary tables are subject to the same load balancing as regular tables and, by default, cannot be bound to a specific server. You can disable load balancing if needed.
In the serializable isolation level (Serializable), OceanBase Database does not allow write operations on temporary tables created within the same transaction.
If a temporary table with the same name as a regular table exists in the current session, it is not recommended to use the database-level statistics collection feature in the current session, as the results may not meet expectations.
Temporary tables are cleaned up by background logic. To avoid significant impact on business operations, the cleanup speed is limited. Therefore, in scenarios with a large number of temporary tables, it is recommended to manually delete them in a timely manner.
Temporary tables cannot be accessed through DBLinks.
To use temporary tables when you connect through ODP, complete the following configuration steps:
Execute the following command to modify the communication protocol configuration of OBProxy:

ALTER proxyconfig SET server_protocol = 'OceanBase 2.0';

Execute the following command to set the algorithm for generating the client session ID (Client Session ID) to version 2:

ALTER proxyconfig SET client_session_id_version = 2;

Execute the following command to set the ODP ID to 1. Different ODPs should have unique IDs to avoid conflicts in the generated Client Session IDs:

ALTER proxyconfig SET proxy_id = 1;
Example
Create a temporary table named tbl1:
CREATE TEMPORARY TABLE tbl1(col1 INT);
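Because a temporary table is session-level, it is visible only in the session that created it; a sketch of typical usage:

```sql
CREATE TEMPORARY TABLE tbl1 (col1 INT);
INSERT INTO tbl1 VALUES (1), (2);
SELECT * FROM tbl1;  -- visible only in this session
-- Drop the table manually when it is no longer needed, rather than
-- waiting for background cleanup (recommended when many temporary
-- tables exist).
DROP TABLE tbl1;
```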
Specify an update model for a table
To specify a table update model when creating a table, add the MERGE_ENGINE option to the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name column_definition
MERGE_ENGINE = {delete_insert | partial_update | append_only};
Note
After you specify the MERGE_ENGINE parameter when you create a table, its configuration value cannot be modified.
MERGE_ENGINE specifies the update model for the table, and the valid values are as follows:
- partial_update: specifies partial update. Each update stores only the modified columns (the delta) of the row. This uses less storage but requires merging multiple deltas during a query to retrieve the latest value. This mode is suitable for OLTP systems that perform frequent updates and have low query requirements.
- delete_insert: specifies full row update. Complete rows (delete old row + insert new row) are written each time data is updated, prioritizing query performance. This mode supports skip indexes on incremental data (MemTable/Delta SSTable). During queries, filters can be pushed down to incremental data, and if the filter results do not require updates to the baseline data, baseline and incremental data can be processed separately, reducing read amplification. This mode is suitable for OLAP scenarios with high incremental data ratios and frequent complex queries or batch processing.
- append_only: allows only INSERT operations and disallows all DML and DDL operations that change stored data. You can use the Compaction TTL feature in this mode to clean up old data. For more information about the usage limitations of this mode, see Model usage limitations.
Note
delete_insert and partial_update differ in write granularity: the former writes all columns in full for each update, while the latter only writes the modified columns. Full column writes allow delete_insert to support the Skip Index in Delta SSTables, improving filter pushdown performance in analytical queries.
If the MERGE_ENGINE option is not specified, it will take the value of the default_table_merge_engine configuration item.
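You can check the current value of this configuration item; a sketch, assuming it is visible through SHOW PARAMETERS in your version:

```sql
-- View the default update model applied when MERGE_ENGINE is omitted.
SHOW PARAMETERS LIKE 'default_table_merge_engine';
```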
Use cases
| Scenario | Suggested value | Description |
|---|---|---|
| OLAP, analytical queries, batch processing | delete_insert | When incremental data can be filtered, query performance improves; however, incremental data consumes more storage space. |
| OLTP, frequent updates, storage-sensitive | partial_update | More storage-efficient; suitable for frequent updates with lower query requirements. |
| Immutable, efficient append, simplified query | append_only | High-throughput scenarios (such as logs and IoT monitoring); highly efficient for range queries and aggregation calculations. |
For more information about the scenarios for configuration, see Best Practices.
Usage restrictions for update models
If the value of the MERGE_ENGINE table option is set to append_only, there are the following limitations:
The following DML statements are not supported:
- UPDATE
- DELETE
- MERGE INTO
- REPLACE

The INSERT [IGNORE] ... ON DUPLICATE KEY UPDATE syntax is not rejected at parse time, but it raises an error when an actual data modification occurs.

The following DDL operations are prohibited:
- TRUNCATE tables and partitions
- Exchange partitions
- Drop partitions
- Drop a column
- Reduce a column's value range
- Dynamic partitioning
- Drop obsolete columns
- Add a column with a DEFAULT value specified
- Add an auto-increment column
Examples
Create a table mer_tbl1 and set the update model to delete_insert.

obclient> CREATE TABLE mer_tbl1 (col1 INT, col2 INT) MERGE_ENGINE = delete_insert;

Create a columnstore table mer_tbl2 and set the update model to delete_insert for full row updates.

obclient> CREATE TABLE mer_tbl2 (col1 INT, col2 INT) MERGE_ENGINE = delete_insert WITH COLUMN GROUP(each column);

Create a table in append_only mode and perform INSERT, SELECT, UPDATE, and DELETE operations on it.

Create the mer_tbl3 table and set the update model to append_only.

obclient> CREATE TABLE mer_tbl3 (col1 INT PRIMARY KEY, col2 VARCHAR(100)) MERGE_ENGINE = append_only;

Insert data into the mer_tbl3 table.

obclient> INSERT INTO mer_tbl3 VALUES(1, 'oceanbase');

Query data from mer_tbl3.

obclient> SELECT * FROM mer_tbl3;

The returned result is as follows:

+------+-----------+
| col1 | col2      |
+------+-----------+
|    1 | oceanbase |
+------+-----------+
1 row in set

Attempt to update the mer_tbl3 table; an error is returned.

obclient> UPDATE mer_tbl3 SET col2 = 'database' WHERE col1 = 1;

The returned result is as follows:

ERROR 1235 (0A000): update append_only table is not supported

Attempt to delete data from the mer_tbl3 table; an error is returned.

obclient> DELETE FROM mer_tbl3 WHERE col1 = 1;

The returned result is as follows:

ERROR 1235 (0A000): delete from append_only table is not supported