This topic introduces how to create tables using SQL statements, as well as the prerequisites, table overview, requirements to follow, and some examples.
Overview
A table is a collection of two-dimensional arrays used to represent and store the relationships between data objects. By properly designing and using database tables, you can improve data reliability, consistency, and query performance, thereby effectively managing and utilizing data in the database.
For more information about OceanBase Database tables, see Table overview.
Prerequisites
Before you create a table, make sure that the following conditions are met:
You have deployed an OceanBase cluster and created a MySQL tenant. For more information about how to deploy an OceanBase cluster, see Deployment overview.
You have connected to a MySQL tenant of OceanBase Database. For more information about how to connect to the database, see Overview of connection methods.
You have created a database. For more information, see Create a database.
You have the
CREATEprivilege. To view the privileges of the current user, perform the relevant operation described in View user privileges. If you do not have the required privilege, contact the administrator to grant you the necessary privileges. For more information, see Directly grant privileges.
Create a table using the command line
Use the CREATE TABLE statement to create a table.
Note
You can execute the SHOW TABLES; statement to list the tables in the database.
Define table names
When you create a table, you must first name the table. Here are the rules for naming tables:
In MySQL mode of OceanBase Database, each table name must be unique in the database.
The length of a table name cannot exceed 64 characters.
We recommend that you name a table meaningfully. Avoid using uninformative names such as
t1ortable1. For more information, see Table naming conventions.
Example 1: Create a table that stores order information.
Notice
Since no column information is provided, the following SQL statement cannot be executed for the time being.
CREATE TABLE orders (...);
Define columns
In a database, a column records the values of a specific attribute of the data in a table. You can name a column based on the attribute it represents. In addition to the column name, a column also has a data type and the maximum length (precision) of the data type.
Here are the rules for defining columns of a table:
Choose an appropriate data type for the data to be stored in the column based on the characteristics of the data type.
For more information about the data types supported in MySQL mode of OceanBase Database, see Overview of data types.
For string data, we recommend that you use a variable-length string data type and specify the maximum length. Make sure that the specified maximum length is greater than the maximum number of characters to be stored to avoid truncating characters when the length exceeds the specified maximum length.
Determine whether to define a primary key column for the table based on the requirements in Primary key column.
Determine whether to add other constraints to the column based on the requirements in Other constraints.
If a column has the
NOT NULLconstraint, we recommend that you set a default value for the column. If the column type is a date or time type, you can set the default value to the current date and time of the database.
Define a primary key column
A primary key value rule is a rule defined on a key, which ensures that each data row in a table can be uniquely identified by a key value. You can define at most one PRIMARY KEY constraint on a database table. The value of the column or columns that constitute the constraint can uniquely identify a data row. In this way, each data row can be named by the primary key value.
To specify a column as a primary key column, add the PRIMARY KEY keyword after the column definition. If you want to define a primary key constraint on multiple columns, specify the constraint after the list of all column names in the CREATE TABLE statement.
You must pay attention to the following points when you define a primary key column:
We recommend that you define a primary key for each table, with at most one set of primary key columns per table.
In OceanBase Database, you are not required to specify a primary key for a table, but using a primary key ensures that each data row in the table can be uniquely identified and there are no duplicate data rows. If no existing field in the table is suitable to be the primary key, you can create the table without specifying a primary key. Then, the system will automatically generate an auto-increment column as the hidden primary key for the table. For more information about auto-increment columns, see Define an auto-increment column.
In addition, you can use OceanBase Database to add a primary key column to an existing table.
The values of the primary key columns are unique in the entire table.
You can define no more than 64 primary key columns, and the total length of primary key data cannot exceed 16 KB.
The values of the primary key columns cannot be
NULLor empty strings. You must specify values for the primary key columns.We recommend that you specify a name for the primary key constraint. For example, you can name the primary key constraint "PK_xxx".
For more information about primary key constraints, see Primary key constraint.
Example 2: Define a primary key constraint on multiple columns.
obclient> CREATE TABLE test(c1 INT, c2 INT, CONSTRAINT PK_c1_c2 PRIMARY KEY(c1, c2));
Query OK, 0 rows affected
obclient> desc test;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| c1 | int(11) | NO | PRI | NULL | |
| c2 | int(11) | NO | PRI | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set
In Example 2, the c1 and c2 columns are defined as the primary key columns, and the constraint name is PK_c1_c2. The values of the c1 and c2 columns cannot be NULL and must be unique.
After you define the primary key columns, you can drop the primary key. For more information about primary key constraints, see Define a constraint type for a column.
Define other column constraints
In addition to the PRIMARY KEY constraint, OceanBase Database supports the NOT NULL constraint, UNIQUE constraint, FOREIGN KEY constraint, and CHECK constraint. You can define constraints to simplify table queries, improve query performance, and ensure that the data is semantically valid.
The following table describes the constraints.
NOT NULLconstraint: The constraint does not allow the values of the specified column to beNULL.For a column that has a
NOT NULLconstraint, you must specify the value of this column in theINSERTstatement. Unless otherwise specified, the default value of this column must not beNULL.UNIQUE constraint: The constraint does not allow duplicate values in the specified column, but multiple
NULLvalues are allowed.FOREIGN KEY constraint: The values in the specified column must be included in the primary key column of another table.
When you create a FOREIGN KEY constraint without specifying a constraint name, the system automatically assigns a constraint name, which is in the format of
table name_OBFK_timestamp. For example,t1_OBFK_1627747200000000.By default, OceanBase Database enables the foreign key constraint check. You can enable or disable the check by setting the tenant variable
foreign_key_checks. For more information about this variable, see foreign_key_checks.CHECKconstraint: The constraint requires that the values in a specified column meet specific conditions.You can define one or more
CHECKconstraints for a single column to allow the column to contain only specific values, or you can define a table-levelCHECKconstraint to apply aCHECKconstraint to multiple columns. When you modify the name of a table, the name of anyCHECKconstraints on the table is not modified. When you drop a table, anyCHECKconstraints on the table are also dropped.When you create a
CHECKconstraint without specifying a constraint name, the system automatically assigns a constraint name, which is in the format oftable name_OBCHECK_timestamp. For example,t1_OBCHECK_1629350823880271.
To define a constraint for a single column, add a constraint keyword to the definition of the column. To define constraints for multiple columns, add the definition of the constraint after the list of all columns in the CREATE TABLE statement.
The following requirements apply when you define other column constraints:
We recommend that you define the
NOT NULLconstraint for columns that do not containNULLvalues.If you want to reference values in another table, define a FOREIGN KEY constraint.
A composite primary key cannot be used as a foreign key.
If you want to prevent duplicate values in a column, define a UNIQUE constraint.
We recommend that you specify the names of other constraints. For example, you can name a UNIQUE constraint "UNI_xxx" and a FOREIGN KEY constraint "FK_xxx".
Example 3: Create a table named tbl1 and set the col1 column to be not null.
obclient> CREATE TABLE tbl1(col1 INT NOT NULL,col2 INT);
Query OK, 0 rows affected
obclient> DESC tbl1;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1 | int(11) | NO | | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set
In Example 3, when you insert data into the col1 column, the value cannot be NULL.
Example 4: Create a table named tbl2 and set the col1 column to be unique.
obclient> CREATE TABLE tbl2(col1 INT UNIQUE,col2 INT);
Query OK, 0 rows affected
obclient> desc tbl2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1 | int(11) | YES | UNI | NULL | |
| col2 | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set
In Example 4, duplicate values are not allowed in the col1 column.
Example 5: Create a FOREIGN KEY constraint.
obclient> CREATE TABLE test(c1 INT, c2 INT, CONSTRAINT PK_c1 PRIMARY KEY(c1));
Query OK, 0 rows affected
obclient> CREATE TABLE tbl3(col1 INT PRIMARY KEY,col2 INT,CONSTRAINT FK_col2 FOREIGN KEY(col2) REFERENCES test(c1));
Query OK, 0 rows affected
obclient> SELECT * FROM information_schema.TABLE_CONSTRAINTS;
+--------------------+-------------------+-------------------------------+--------------+------------+-----------------+----------+
| CONSTRAINT_CATALOG | CONSTRAINT_SCHEMA | CONSTRAINT_NAME | TABLE_SCHEMA | TABLE_NAME | CONSTRAINT_TYPE | ENFORCED |
+--------------------+-------------------+-------------------------------+--------------+------------+-----------------+----------+
| def | xxx | PRIMARY | xxx | test | PRIMARY KEY | YES |
| def | xxx | PRIMARY | xxx | tbl3 | PRIMARY KEY | YES |
| def | xxx | FK_col2 | xxx | tbl3 | FOREIGN KEY | YES |
+--------------------+-------------------+-------------------------------+--------------+------------+-----------------+----------+
3 rows in set
In Example 5, the col2 column of the tbl3 table is associated with the c1 column, which is the primary key column of the test table. You can query the information_schema.TABLE_CONSTRAINTS view to view the constraint after the constraint is created.
Example 6: Create a table named tbl4 and set the values in the col1 column to be greater than 10.
obclient> CREATE TABLE tbl4(col1 INT CHECK(col1>10),col2 INT);
Query OK, 0 rows affected
obclient> INSERT INTO tbl4 VALUES(2,2);
ERROR 3819 (HY000): check constraint violated
obclient> INSERT INTO tbl4 VALUES(11,2);
Query OK, 1 row affected
In Example 6, the col1 column has a CHECK constraint. If you try to insert a value less than or equal to 10 into the col1 column, an error is returned.
For more information about how to define other column constraints, see Define a constraint for a column.
Define an auto-increment column
In OceanBase Database, if you want a numeric column to have non-repeated values that increase sequentially, you can define the column as AUTO_INCREMENT.
Every AUTO_INCREMENT column has three important attributes: start value, increment, and cache size. These attributes are controlled by the auto_increment_cache_size, auto_increment_increment, and auto_increment_offset tenant variables.
| Variable | Description |
|---|---|
| auto_increment_cache_size | Global variable that specifies the number of cached auto-increment values. The value range is [1, 100000000]. The default value is 1000000. |
| auto_increment_increment | Session variable that specifies the increment. The value range is [1, 65535]. The default value is 1. |
| auto_increment_offset | Session variable that specifies the start value of an AUTO_INCREMENT column. The value range is [1, 65535]. The default value is 1. |
You can modify the preceding three system variables as needed for your business. For more information about how to modify a system variable, see Overview of configuration items and system variables.
The following rules apply to auto-increment columns:
AUTO_INCREMENTis an attribute of a data column and applies only to integer columns.The data column that has the
AUTO_INCREMENTattribute must have theNOT NULLattribute.When you create a partitioned table, if the auto-increment column is used as the partitioning key, each value in the auto-increment column is globally unique but may not be incremental within a partition.
If you create an auto-increment column and then insert data by using the INSERT statement, if the system variable SQL_MODE is not set to NO_AUTO_VALUE_ON_ZERO, and if the value of the auto-increment column is 0, the system fills the column with the next value in the sequence. If the value that you specify is smaller than the current maximum value, it does not affect the calculation of the next value in the sequence. If the value that you specify is greater than the current maximum value, the sum of the inserted value and the cached value in the auto-increment column is used as the starting value for the next increment.
Note
If the value of the SQL_MODE system variable is NO_AUTO_VALUE_ON_ZERO, no AUTO_INCREMENT value is generated for a column to which 0 is inserted.
Example 7: Create a table that contains an auto-increment column.
obclient> CREATE TABLE personal_info(id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY, name varchar(50), gmt_create timestamp NOT NULL default current_timestamp);
Query OK, 0 rows affected
In Example 7, because the id column is defined as an auto-increment column, you do not need to specify the value of this column when you use the INSERT statement to insert data. The system automatically assigns values to this column, as shown in the following example.
obclient> INSERT INTO personal_info(name) VALUES('A'),('B'),('C');
Query OK, 3 rows affected
Records: 3 Duplicates: 0 Warnings: 0
obclient> SELECT * FROM personal_info;
+----+------+---------------------+
| id | name | gmt_create |
+----+------+---------------------+
| 1 | A | 2020-04-03 17:09:55 |
| 2 | B | 2020-04-03 17:09:55 |
| 3 | C | 2020-04-03 17:09:55 |
+----+------+---------------------+
3 rows in set
For more information about auto-increment columns, see Define an auto-increment column.
Choose a partitioning scheme
You must specify a partitioning scheme when you create a table. If the table contains a large amount of data, we recommend that you partition the table. When you create a partitioned table, you must choose an appropriate partitioning method based on the data to be stored in the table.
In the MySQL mode of OceanBase Database, the tenant-level parameter max_partition_num specifies the maximum number of partitions supported for a single table. The default value is 8,192.
In the MySQL mode of OceanBase Database, partitioned tables can be classified based on the partitioning strategy as follows:
RANGE partitioning and RANGE COLUMNS partitioning
LIST partitioning and LIST COLUMNS partitioning
HASH partitioning and KEY partitioning
Composite partitioning
Partitioned tables can also be classified based on the partitioning dimension as follows: the parent tables contain only one partitioning key and the child tables contain two partitioning keys. The two levels of partitioning can use different strategies. In OceanBase Database, the child tables can be template-based or non-template-based.
For more information about partitions, see Partition overview.
RANGE partitioning and RANGE COLUMNS partitioning
In RANGE partitioning and RANGE COLUMNS partitioning, partitions are divided based on the range of the partitioning key in each partition. These two methods are suitable for queries that require a range for the partitioning key. For example, you can partition data by using the time field or price range.
The differences between RANGE partitioning and RANGE COLUMNS partitioning are as follows:
In RANGE partitioning, the partitioning key must be of the integer type. If you want to partition data by using a date field, you must convert the field by using a function. For example, if you want to partition data by using the date field, you must convert the field by using the
YEAR()function. In RANGE COLUMNS partitioning, the partitioning key does not necessarily need to be of the integer type and can be of any type.In RANGE partitioning, you can use an expression for the partitioning key, but the partitioning key cannot be a column vector, such as
partition by range(c1, c2). In RANGE COLUMNS partitioning, you cannot use an expression for the partitioning key, but you can use multiple columns (column vectors).
You define each partition by using the VALUES LESS THAN(value) clause, where the value of value must be a continuous and non-overlapping integer that increases.
Example 8: Create a RANGE COLUMNS partitioned table.
obclient> CREATE TABLE tb1_rc(col1 INT,col2 INT)
PARTITION BY RANGE COLUMNS(col1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200),
PARTITION p2 VALUES LESS THAN(300)
);
Query OK, 0 rows affected
In Example 8, the partitioning key for RANGE COLUMNS partitioning can be of any type. Therefore, you can use the column name col1 as the partitioning key. The table tb1_rc will be partitioned based on the value range of 100, 200, 300. p0, p1, and p2 are the specified partition names, which can be defined as you want, provided that each partition has a unique name within the same table.
LIST partitioning and LIST COLUMNS partitioning
In RANGE partitioning, partitions are divided based on specific values, and the values in each partition do not overlap. The advantage of LIST partitioning is that it allows you to partition unordered and unrelated data.
If you want to partition data by using multiple columns or data of other types, you can use LIST COLUMNS partitioning. LIST COLUMNS partitioning is an extension of LIST partitioning. It supports multiple partitioning keys and INT data, DATE, and DATETIME types.
The differences between LIST partitioning and LIST COLUMNS partitioning are as follows:
In LIST partitioning, the partitioning key must be of the integer type, but this restriction does not apply to LIST COLUMNS partitioning. The partitioning key for LIST COLUMNS partitioning can be of any type.
LIST partitioning supports only one partitioning key. The partitioning key can be one column or an expression. In LIST COLUMNS partitioning, you cannot use an expression as the partitioning key, but you can use multiple columns (column vectors) as the partitioning key.
You define each partition by using the VALUES IN(value_list) clause.
Example 9: Create a LIST partitioned table.
obclient> CREATE TABLE tbl2_l (col1 INT,col2 DATE)
PARTITION BY LIST(col1)
(PARTITION p0 VALUES IN (100),
PARTITION p1 VALUES IN (200)
);
Query OK, 0 rows affected
In Example 9, the partitioning key for LIST partitioning must be of the integer type. Therefore, you can use the column col1 as the partitioning key. The table tbl2_l will be partitioned based on the value range of 100, 200. The partitions are named p0, p1, and so on, which are generated according to the naming rules.
HASH partitioning and KEY partitioning
To use HASH partitioning, you must specify the partitioning key and the number of partitions. The system calculates an integer by using the HASH partitioning expression, and then determines which partition a specific row belongs to based on the result and the number of partitions.
KEY partitioning is similar to HASH partitioning in that both determine which partition a specific row belongs to based on the result of dividing the partitioning key by the number of partitions. The difference is that the system applies an internal default HASH function to the partitioning key for KEY partitioning before dividing it by the number of partitions. Therefore, you cannot usually determine which partition a specific row belongs to by simply performing the calculation.
The differences between KEY partitioning and HASH partitioning are as follows:
In HASH partitioning, the partitioning key must be of the integer type, but the partitioning key for KEY partitioning can be of a character type.
In HASH partitioning, you can use an expression as the partitioning key, but you cannot use an expression as the partitioning key for KEY partitioning.
Example 10: Create a HASH partitioned table named tbl3_h.
obclient> CREATE TABLE tbl3_h(col1 INT,col2 VARCHAR(50))
PARTITION BY HASH(col1) PARTITIONS 2;
Query OK, 0 rows affected
In Example 10, the partitioning key for HASH partitioning must be of the integer type. Therefore, you can use the column col1 as the partitioning key and partition the table tbl3_h into two partitions. When you create the table, you do not need to specify the partition names. The system will name the partitions based on the naming rules. In this example, the partitions are named p0, p1, ..., pn.
Composite partitioning (secondary partitioning)
Composite partitioning is usually implemented by using two different partitioning strategies: one for partitioning and the other for subpartitioning. It is suitable for business tables that contain a large amount of data. It can combine the advantages of multiple partitioning strategies.
RANGE partitioning, RANGE COLUMNS partitioning, LIST partitioning, LIST COLUMNS partitioning, HASH partitioning, and KEY partitioning can be used for subpartitioning. In OceanBase Database, template-based and non-template-based child tables are supported.
The following examples will help you understand how to create a table with subpartitions.
Example 11: Create a template-based RANGE COLUMNS partitioned and RANGE partitioned table.
obclient> CREATE TABLE tb1_m_rcr(col1 INT,col2 INT)
PARTITION BY RANGE COLUMNS(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION mp0 VALUES LESS THAN(3),
SUBPARTITION mp1 VALUES LESS THAN(6),
SUBPARTITION mp2 VALUES LESS THAN(9)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200),
PARTITION p2 VALUES LESS THAN(300)
);
Query OK, 0 rows affected
obclient> SELECT table_name,partition_name,subpartition_name FROM information_schema.partitions;
+------------+----------------+-------------------+
| table_name | partition_name | subpartition_name |
+------------+----------------+-------------------+
| tb1_m_rcr | p0 | p0smp0 |
| tb1_m_rcr | p0 | p0smp1 |
| tb1_m_rcr | p0 | p0smp2 |
| tb1_m_rcr | p1 | p1smp0 |
| tb1_m_rcr | p1 | p1smp1 |
| tb1_m_rcr | p1 | p1smp2 |
| tb1_m_rcr | p2 | p2smp0 |
| tb1_m_rcr | p2 | p2smp1 |
| tb1_m_rcr | p2 | p2smp2 |
+------------+----------------+-------------------+
9 rows in set
In Example 11, the partitioning key for RANGE partitioning must be of the integer type. Therefore, you can use the column name col2 as the partitioning key for subpartitioning. You can use the SUBPARTITION TEMPLATE keyword to create a template-based child table. In a template-based child table, each partition is divided into subpartitions based on the template, that is, the subpartition definitions are the same for each partition. In this example, the data is first partitioned by using RANGE COLUMNS partitioning, and then each partition is divided into subpartitions by using RANGE partitioning.
In addition, when you create a template-based child table, you do not need to specify the partition names for subpartitions. The system will name them based on the naming rules. The naming rule for subpartitions is ($part_name)s($subpart_name). You can see that in this example, the subpartitions of the partition p0 are named p0smp0, p0smp1, and p0smp2.
Notice
For HASH/KEY partitioning, if you use the SUBPARTITIONS n method (for example, SUBPARTITIONS 5) for secondary partitioning, you do not need to use the SUBPARTITION TEMPLATE keyword when you create a template-based child table.
Example 12: Create a non-template-based LIST partitioned and LIST COLUMNS partitioned table.
obclient> CREATE TABLE tbl2_f_llc(col1 INT,col2 DATE)
PARTITION BY LIST(col1)
SUBPARTITION BY LIST COLUMNS(col2)
(PARTITION p0 VALUES IN(100)
(SUBPARTITION sp0 VALUES IN('2021/04/01'),
SUBPARTITION sp1 VALUES IN('2021/07/01'),
SUBPARTITION sp2 VALUES IN('2021/10/01'),
SUBPARTITION sp3 VALUES IN('2022/01/01')
),
PARTITION p1 VALUES IN(200)
(SUBPARTITION sp4 VALUES IN('2021/04/01'),
SUBPARTITION sp5 VALUES IN('2021/07/01'),
SUBPARTITION sp6 VALUES IN('2021/10/01'),
SUBPARTITION sp7 VALUES IN('2022/01/01')
)
);
Query OK, 0 rows affected
In Example 12, you need to define the subpartitions for each partition of the parent table. The definitions of subpartitions can be the same or different in each partition of the parent table.
Example 13: Create a non-template-based HASH partitioned and KEY partitioned table.
obclient> CREATE TABLE tbl3_f_hk (col1 INT,col2 VARCHAR(50))
PARTITION BY HASH(col1)
SUBPARTITION BY KEY(col2)
(PARTITION p1
(SUBPARTITION sp0
,SUBPARTITION sp1
,SUBPARTITION sp2
,SUBPARTITION sp3
),
PARTITION p2
(SUBPARTITION sp4
,SUBPARTITION sp5
,SUBPARTITION sp6
,SUBPARTITION sp7
)
);
Query OK, 0 rows affected
In Example 13, the partitioning key for KEY partitioning can be a character column. Therefore, you can use the column col2 as the partitioning key for subpartitioning. sp0 to sp7 are the specified subpartition names.
Create a replicated table
A replicated table is a special type of table in OceanBase Database. This table can read the latest data modifications from any "healthy" replica. Therefore, replicated tables are a good choice for scenarios with low write frequency and high read frequency.
For more information about replicated tables, see the Create a replicated table section in Create a table.
Notice
Only user tenants can create replicated tables. A replicated table cannot be created in the sys tenant.
The SQL syntax for creating a replicated table is as follows:
CREATE TABLE table_name column_definition DUPLICATE_SCOPE='none | cluster';
The parameters are described as follows:
table_name: the name of the table.column_definition: the column information of the table. For example, the column definitions and primary key definitions.DUPLICATE_SCOPE: the attribute of the replicated table. Valid values:noneandcluster. The meanings of the two values are described as follows:none: indicates that the table is a normal table.cluster: indicates that the table is a replicated table. The leader needs to replicate transactions to all fast replicas and read replicas of the current tenant.
Example 14: Execute the following SQL statement to create a replicated table named test_tbl14.
CREATE TABLE test_tbl14 (col1 INT,col2 INT) DUPLICATE_SCOPE= 'cluster';
What to do next
After you create a table, you may want to perform the following tasks to improve your use and management of the table:
After you create a table, you can use the
INSERTstatement to insert data into the table. For more information, see Insert data.To improve query performance, you can create indexes on the columns of the table. For more information, see Create an index.
References
- For more information about how to view table attributes, see View the definition of a table.
- For more information about how to drop a table, see Drop a table.
- For more information about how to modify a table schema, see Change a table.
- For more information about how to create a partitioned table, see Create a partitioned table.