The default character set of OceanBase Database is utf8mb4.
OceanBase Database supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
  Note: To support seamless migration, OceanBase Database treats UTF8 as a synonym for UTF8MB4 in its syntax. utf8mb3 is also a synonym for utf8mb4.
- latin1
- gb18030_2022
- ascii
- tis620
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
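The synonym rule in the note can be read as a name-resolution step that runs before a character set is looked up. The Python sketch below only illustrates that reading: `resolve_charset`, `SUPPORTED`, and `ALIASES` are hypothetical names, not OceanBase APIs, and case-insensitive matching is an assumption.

```python
# Character sets from the list above; utf8mb3 is handled as an alias below.
SUPPORTED = {
    "binary", "gbk", "gb18030", "utf16", "utf8mb4", "latin1",
    "gb18030_2022", "ascii", "tis620", "utf16le", "sjis", "dec8",
    "hkscs", "hkscs31", "big5",
}

# Hypothetical alias table derived from the note: UTF8 and utf8mb3
# both mean utf8mb4.
ALIASES = {"utf8": "utf8mb4", "utf8mb3": "utf8mb4"}


def resolve_charset(name: str) -> str:
    """Return the canonical character set for a name written in SQL.

    Case-insensitive matching is an assumption made for this sketch.
    """
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    if key not in SUPPORTED:
        raise ValueError(f"unsupported character set: {name!r}")
    return key


if __name__ == "__main__":
    for name in ("UTF8", "utf8mb3", "utf8mb4", "GBK"):
        print(f"{name} -> {resolve_charset(name)}")
```

For a schema migrated from MySQL, where utf8 is an alias for the 3-byte utf8mb3, this reading means a column declared as UTF8 resolves to utf8mb4 and can therefore hold 4-byte characters.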
OceanBase Database does not support implicit conversion between the gb18030 and gb18030_2022 character sets. However, you can use the CONVERT function to convert a string in the gb18030 character set to one in the gb18030_2022 character set, and vice versa. The conversion does not go through Unicode; instead, it preserves the original byte encoding. For example, the character '龴' is encoded as 0xFE59 both before and after the conversion.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The query result is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
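To see why it matters that the conversion bypasses Unicode, the hypothetical Python sketch below contrasts two strategies on the 0xFE59 example. The one-entry tables, the code points in them, and both function names are illustrative assumptions, not OceanBase internals or the real GB18030 mapping tables; replacing unconvertible characters with `?` is also an assumption.

```python
# Toy one-entry decode tables (bytes -> Unicode code point).
# ASSUMPTION: the values model the 0xFE59 example only; they are not
# the full GB18030 tables and are not taken from OceanBase.
GB18030_TOY = {b"\xfe\x59": "\ue81e"}       # older mapping: a Private Use Area code point (assumed)
GB18030_2022_TOY = {b"\xfe\x59": "\u9fb4"}  # 2022 mapping: a standard CJK code point (assumed)


def convert_via_unicode(data: bytes, src: dict, dst: dict) -> bytes:
    """Decode with the source table, then re-encode with the target table.

    A code point that the target table cannot encode becomes b"?".
    """
    char = src[data]
    reverse = {cp: raw for raw, cp in dst.items()}
    return reverse.get(char, b"?")


def convert_preserving_bytes(data: bytes) -> bytes:
    """Relabel the bytes with the new character set without changing them."""
    return data


if __name__ == "__main__":
    raw = b"\xfe\x59"
    print(convert_via_unicode(raw, GB18030_TOY, GB18030_2022_TOY).hex())  # 3f
    print(convert_preserving_bytes(raw).hex().upper())                    # FE59
```

In this toy model, a Unicode round trip loses the character because the two tables disagree about which code point 0xFE59 represents. Keeping the bytes sidesteps the disagreement, which is consistent with the FE59 result returned by the query above.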
View available character sets
You can execute the SHOW CHARACTER SET statement, or its synonym SHOW CHARSET, to view the available character sets.
obclient [(none)]> SHOW CHARSET;
The query result is as follows:
+--------------+-----------------------+-------------------------+--------+
| Charset      | Description           | Default collation       | Maxlen |
+--------------+-----------------------+-------------------------+--------+
| binary       | Binary pseudo charset | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode         | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset           | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode        | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset       | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European  | latin1_swedish_ci       |      1 |
| gb18030_2022 | GB18030-2022 charset  | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII              | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai           | tis620_thai_ci          |      1 |
| utf16le      | UTF-16LE Unicode      | utf16le_general_ci      |      4 |
| sjis         | SJIS                  | sjis_japanese_ci        |      2 |
| big5         | BIG5                  | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                 | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31  | hkscs31_bin             |      2 |
| dec8         | DEC West European     | dec8_swedish_ci         |      1 |
+--------------+-----------------------+-------------------------+--------+
15 rows in set (0.012 sec)
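The Maxlen column is the maximum number of bytes that a single character can occupy in each character set. One way to get a feel for it outside the database is to encode a few sample characters with Python's built-in codecs. The codec names (`cp1252`, `gbk`, `gb18030`, `utf-8`, `utf-16-le`) are Python's and only approximate the OceanBase character sets with similar names; that correspondence, and the `byte_length` helper, are assumptions made for this sketch.

```python
from typing import Optional

# Python codec names used as stand-ins for OceanBase character sets (assumption).
CODECS = {
    "latin1": "cp1252",      # SHOW CHARSET describes latin1 as cp1252
    "gbk": "gbk",
    "gb18030": "gb18030",
    "utf8mb4": "utf-8",
    "utf16le": "utf-16-le",  # little-endian, no byte order mark
}

# ASCII letter, Latin letter, CJK character, and a character outside the BMP.
SAMPLES = ["A", "é", "字", "\U0001F600"]


def byte_length(char: str, codec: str) -> Optional[int]:
    """Return the encoded size of one character, or None if the codec cannot encode it."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for charset, codec in CODECS.items():
        sizes = [byte_length(c, codec) for c in SAMPLES]
        print(f"{charset:8} {sizes}")
```

For each row, the largest size observed stays within the Maxlen value in the table (for example, 2 for gbk and 4 for gb18030 and utf8mb4). A handful of samples only illustrates the bound; it does not prove it.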
Specify a non-default character set
OceanBase Database allows a client to communicate with the server in a character set other than the default. For example, to use the gbk character set, execute the following statement after connecting to the server:
SET NAMES gbk;
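Note that the SET NAMES statement only tells the server which character set the client uses; it does not change the encoding in which the client actually enters and sends characters. For example, before you use SET NAMES to set the character set to gb18030_2022, make sure that the client itself uses gb18030_2022 encoding. Otherwise, garbled characters appear, as shown in the following example.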
/* The client uses the utf8mb4 character set. Create table t, whose default character set is utf8mb4. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert a string while the session character set is utf8mb4. */
obclient> INSERT INTO t VALUES ('character set');
Query OK, 1 row affected
/* Change the character set of the current session. The character set actually used by the client is not changed. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* The client still sends characters in utf8mb4 encoding. */
obclient> INSERT INTO t VALUES ('character_set');
Query OK, 1 row affected
/* Query table t. The first row is now returned in gb18030_2022 encoding, which the utf8mb4 client displays as garbled characters. */
obclient> SELECT * FROM t;
+----------+
| c |
+----------+
| �ַ��� |
| character |
+----------+
2 rows in set
/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query table t again. The first row is displayed correctly, but the second row, which was inserted after SET NAMES gb18030_2022, is garbled. */
obclient> SELECT * FROM t;
+--------------+
| c |
+--------------+
| Character set |
| View description |
+--------------+
2 rows in set
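The garbling in this example can be reproduced without a database. The Python sketch below is a simplified model under several assumptions: the client terminal always sends and displays UTF-8, column c stores Unicode text as utf8mb4 does, and a single session value is used both to decode incoming bytes and to encode results. In MySQL, SET NAMES sets character_set_client, character_set_connection, and character_set_results together; the sketch assumes OceanBase behaves the same way and collapses them into one value. Python's `gb18030` codec stands in for gb18030_2022 (the revisions differ only for a small set of characters, assumed not to matter here), the `Session` class is hypothetical, and the sample string '字符集' ("character set" in Chinese) is an arbitrary choice.

```python
from typing import List

TERMINAL = "utf-8"  # assumed: the client terminal always sends and displays UTF-8


class Session:
    """Hypothetical model of one connection and the utf8mb4 table t."""

    def __init__(self) -> None:
        self.names = "utf-8"        # current SET NAMES value, as a Python codec name
        self.rows: List[str] = []   # column c, stored as Unicode text

    def set_names(self, codec: str) -> None:
        self.names = codec

    def insert(self, typed: str) -> None:
        sent = typed.encode(TERMINAL)  # bytes the terminal really sends
        # The server trusts SET NAMES when interpreting those bytes.
        self.rows.append(sent.decode(self.names, errors="replace"))

    def select(self) -> List[str]:
        # The server encodes results in the session character set,
        # and the terminal then decodes them as UTF-8.
        return [
            row.encode(self.names, errors="replace").decode(TERMINAL, errors="replace")
            for row in self.rows
        ]


if __name__ == "__main__":
    s = Session()
    s.insert("字符集")        # session and terminal agree
    s.set_names("gb18030")   # stand-in for SET NAMES gb18030_2022
    s.insert("字符集")        # the terminal still sends UTF-8 bytes
    print(s.select())        # the first row is garbled
    s.set_names("utf-8")     # SET NAMES utf8mb4
    print(s.select())        # the first row is correct, the second row is garbled
```

In this model, the second row is damaged when it is inserted, not when it is read: the server interprets UTF-8 bytes as GB18030 and stores the result, so switching the session back to utf8mb4 cannot restore the original text.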