By default, OceanBase Database uses the utf8mb4 character set.
OceanBase Database supports the following character sets:
binary
gbk
gb18030
utf16
utf8mb4
latin1
gb18030_2022
Applicability: OceanBase Database Community Edition and OceanBase Connector/J do not support the utf8mb4_unicode_ci or utf16_unicode_ci collation.
Note:
To support seamless migration, OceanBase Database treats utf8 as a synonym for utf8mb4.

The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. However, you can use the CONVERT() function to explicitly convert a gb18030 string to gb18030_2022, or the other way around. The conversion does not decode the string to Unicode and re-encode it, so the original byte code is retained. In the following example, the code of '龴' remains 0xFE59 before and after the conversion.

obclient> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set

Use the following SHOW CHARACTER SET statement to view the available character sets:

obclient> SHOW CHARACTER SET;
+--------------+-----------------------+-------------------------+--------+
| Charset      | Description           | Default collation       | Maxlen |
+--------------+-----------------------+-------------------------+--------+
| binary       | Binary pseudo charset | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode         | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset           | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode        | utf16_general_ci        |      2 |
| gb18030      | GB18030 charset       | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European  | latin1_swedish_ci       |      1 |
| gb18030_2022 | GB18030-2022 charset  | gb18030_2022_chinese_ci |      4 |
+--------------+-----------------------+-------------------------+--------+
7 rows in set

OceanBase Database allows you to specify a character set other than the default one for communication between the client and the server. For example, to use the gbk character set, execute the following statement after you connect to the server:

obclient> SET NAMES gbk;
Query OK, 0 rows affected

Note that SET NAMES only declares the character set of the current session; it does not change the encoding in which the client actually sends the characters you type. For example, before you run SET NAMES gb18030_2022, make sure that the client itself already uses gb18030_2022 encoding. Otherwise, characters are garbled, as the following example shows. Because ASCII characters are encoded identically in utf8mb4 and gb18030_2022, the example uses the Chinese string '字符集' ("character set").

/* The client uses utf8mb4. Create table t, whose column uses the default utf8mb4 character set. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected

/* Insert the utf8mb4-encoded string '字符集'. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected

/* Change the character set of the current session to gb18030_2022, but keep utf8mb4 encoding on the client. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected

/* Insert the same utf8mb4-encoded string again. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected

/* The query output is garbled. */
obclient> SELECT * FROM t;
+----------+
| c        |
+----------+
| �ַ���     |
| 字符�    |
+----------+
2 rows in set

/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected

/* Query table t again. The first row is now displayed correctly, but the row inserted in the second batch is still garbled because it was stored incorrectly. */
obclient> SELECT * FROM t;
+--------------+
| c            |
+--------------+
| 字符集       |
| 瀛楃闆?      |
+--------------+
2 rows in set
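The garbling above comes from byte reinterpretation: the server decodes incoming bytes with the character set declared by SET NAMES, regardless of what the client really sent. The Python sketch below models only that byte-level flow. It assumes a terminal that always sends and displays UTF-8, a server that stores text as utf8mb4, and Python's built-in gb18030 codec as a stand-in for gb18030_2022 (assumed to behave the same for this example). The helpers server_store and terminal_show are hypothetical and are not OceanBase APIs; invalid bytes are replaced here, which may not match the server's exact replacement character.

```python
"""Byte-level sketch of why SET NAMES cannot fix a mismatched client encoding.

Assumptions (hypothetical model, not OceanBase APIs):
- the terminal always sends and displays UTF-8;
- the server decodes incoming bytes with the session character set;
- Python's "gb18030" codec stands in for gb18030_2022.
"""

TEXT = "字符集"  # non-ASCII on purpose: ASCII bytes are identical in both encodings


def server_store(client_bytes: bytes, session_charset: str) -> str:
    """Decode what the client sent using the charset declared by SET NAMES."""
    return client_bytes.decode(session_charset, errors="replace")


def terminal_show(stored: str, session_charset: str) -> str:
    """Encode the result in the session charset, then decode it as the UTF-8 terminal does."""
    raw = stored.encode(session_charset, errors="replace")
    return raw.decode("utf-8", errors="replace")


sent = TEXT.encode("utf-8")  # the terminal really sends UTF-8 in every step

row1 = server_store(sent, "utf-8")    # inserted under SET NAMES utf8mb4: stored correctly
row2 = server_store(sent, "gb18030")  # inserted under SET NAMES gb18030_2022: misread

# First SELECT, session still gb18030_2022: both rows come back as GB18030 bytes.
first_select = [terminal_show(row1, "gb18030"), terminal_show(row2, "gb18030")]
# Second SELECT, session back to utf8mb4: row 1 is fine, row 2 stays wrong.
second_select = [terminal_show(row1, "utf-8"), terminal_show(row2, "utf-8")]

if __name__ == "__main__":
    print(first_select)
    print(second_select)
```

In this model, row 1 is garbled only while the session declares gb18030_2022, whereas row 2 stays wrong after switching back to utf8mb4, because it was already stored as different characters. Making the client encoding match SET NAMES before inserting avoids the problem.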