Each "character" column (i.e., a column of type CHAR, VARCHAR, TEXT, or any synonym thereof) has a column character set and a column collation.
Specify the character set and collation for a column
The CREATE TABLE and ALTER TABLE statements allow you to specify the character set and collation for a column. The syntax is as follows:
CREATE TABLE table_name (
column_name {CHAR | VARCHAR | TEXT} (column_length)
[CHARACTER SET charset_name]
[COLLATE collation_name]
);
ALTER TABLE table_name MODIFY
col_name {CHAR | VARCHAR | TEXT} (column_length)
CHARACTER SET [=] charset_name
[COLLATE [=] collation_name]
Here is an example:
obclient> CREATE TABLE t (
col1 VARCHAR(5)
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci);
Query OK, 0 rows affected
obclient> ALTER TABLE t MODIFY
col1 VARCHAR(5)
CHARACTER SET gbk
COLLATE gbk_chinese_ci;
Query OK, 0 rows affected
Choose the character set and collation for a column
OceanBase Database selects the character set and collation for a column as follows:
If both
CHARACTER SET charset_nameandCOLLATE collation_nameare specified, the character setcharset_nameand collationcollation_nameare used.If
CHARACTER SET charset_nameis specified butCOLLATEis not, the character setcharset_nameand its default collation are used. To view the default collation for each character set, use theSHOW CHARACTER SETstatement.If
COLLATE collation_nameis specified butCHARACTER SETis not, the character set associated withcollation_nameand the specified collation are used.If neither
CHARACTER SETnorCOLLATEis specified, the character set and collation of the table are used.
If the character set and collation are not specified for a column in its definition, the character set and collation of the table are used as the default values.
Additionally, in OceanBase Database's MySQL mode, after setting the COLLATE for a column, you can directly use ORDER BY to sort the data according to the specified collation. You can also use COLLATE after ORDER BY to set the same character set with different collations.
obclient> CREATE TABLE t(a VARCHAR(10) COLLATE gb18030_2022_radical_ci);
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES('a'),('A'),('b'),('B');
Query OK, 4 rows affected
Records: 4 Duplicates: 0 Warnings: 0
obclient> SELECT a FROM t ORDER BY a;
+------+
| a |
+------+
| a |
| A |
| b |
| B |
+------+
4 rows in set
obclient> SELECT a FROM t ORDER BY a COLLATE gb18030_2022_chinese_cs;
+------+
| a |
+------+
| A |
| B |
| a |
| b |
+------+
4 rows in set
Conditions for converting column character sets
To convert a binary or non-binary string column to a specified character set, you can use ALTER TABLE. For the conversion to be successful, one of the following conditions must be met:
If the column has a binary data type (
BINARY,VARBINARY, orBLOB), all its values must be encoded using a single character set (i.e., the character set to which the column is being converted). If a binary column is used to store information from multiple character sets, OceanBase Database cannot determine which character set to use for each value, and the data cannot be correctly converted.If the column has a non-binary data type (
CHAR,VARCHAR, orTEXT), its content should be encoded according to the column's character set. If the column content is encoded using different character sets, you can first convert the column to a binary data type and then convert it to a non-binary column with the desired character set.
