By default, OceanBase Database uses the utf8mb4 character set.
OceanBase Database supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
  Note
  To support seamless migration, OceanBase Database treats utf8 as a synonym for utf8mb4. utf8mb3 is an alias of utf8mb4.
- latin1
- gb2312
- gb18030_2022
- ascii
- tis620
- ujis
- euckr
- eucjpms
- cp932
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
- cp850
- hp8
- macroman
- swe7
Note
In the OceanBase Database V4.3.5 series, the following character sets are supported only in V4.3.5 BP1 and later: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. However, you can explicitly convert a string between gb18030 and gb18030_2022 in either direction by using the CONVERT() function. This conversion does not involve Unicode encoding or decoding, so the original code is retained. In the following example, the code of '龴' is 0xFE59 both before and after the conversion.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The return result is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
View available character sets
Use the following SHOW CHARSET statement to view the available character sets.
obclient [(none)]> SHOW CHARSET;
The return result is as follows:
+--------------+---------------------------+-------------------------+--------+
| Charset      | Description               | Default collation       | Maxlen |
+--------------+---------------------------+-------------------------+--------+
| binary       | Binary pseudo charset     | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode             | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset               | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode            | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset           | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European      | latin1_swedish_ci       |      1 |
| gb2312       | GB2312 Simplified Chinese | gb2312_chinese_ci       |      2 |
| gb18030_2022 | GB18030-2022 charset      | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII                  | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai               | tis620_thai_ci          |      1 |
| ujis         | EUC-JP Japanese           | ujis_japanese_ci        |      3 |
| euckr        | EUC-KR Korean             | euckr_korean_ci         |      2 |
| eucjpms      | UJIS for Windows Japanese | eucjpms_japanese_ci     |      3 |
| cp932        | SJIS for Windows Japanese | cp932_japanese_ci       |      2 |
| utf16le      | UTF-16LE Unicode          | utf16le_general_ci      |      4 |
| sjis         | SJIS                      | sjis_japanese_ci        |      2 |
| big5         | BIG5                      | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                     | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31      | hkscs31_bin             |      2 |
| dec8         | DEC West European         | dec8_swedish_ci         |      1 |
| cp850        | DOS West European         | cp850_general_ci        |      1 |
| hp8          | HP West European          | hp8_english_ci          |      1 |
| macroman     | Mac West European         | macroman_general_ci     |      1 |
| swe7         | 7bit West European        | swe7_swedish_ci         |      1 |
+--------------+---------------------------+-------------------------+--------+
24 rows in set
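The Maxlen column gives the maximum number of bytes that a single character can occupy in that character set. As a rough illustration outside the database, the following Python sketch encodes a few sample characters and reports the resulting byte lengths. It assumes that Python's built-in utf-8, gbk, gb18030, and utf-16-le codecs are reasonable stand-ins for the utf8mb4, gbk, gb18030, and utf16le character sets. The sample characters and the helper name byte_lengths are chosen only for this example and are not part of OceanBase Database.

```python
# Illustration only: Python codecs stand in for the database character sets.
SAMPLES = ["A", "字", "😀"]
CODECS = {
    "utf8mb4": "utf-8",
    "gbk": "gbk",
    "gb18030": "gb18030",
    "utf16le": "utf-16-le",
}


def byte_lengths(ch):
    """Return {charset: encoded length in bytes}, or None if not encodable."""
    result = {}
    for charset, codec in CODECS.items():
        try:
            result[charset] = len(ch.encode(codec))
        except UnicodeEncodeError:
            # The character has no representation in this character set.
            result[charset] = None
    return result


for ch in SAMPLES:
    print(repr(ch), byte_lengths(ch))
```

In this sketch, no character needs more bytes than the Maxlen value listed for the corresponding character set, and a character outside a character set's repertoire (the emoji in gbk) cannot be encoded at all.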
Specify a non-default character set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
SET NAMES gbk;
Note that the SET NAMES statement only tells the server which character set the client uses. It does not change how the client actually encodes the characters that you enter. For example, before you execute SET NAMES gb18030_2022, make sure that the client already uses gb18030_2022 encoding. Otherwise, the data is garbled, as the following example shows.
/* Use the utf8mb4 character set on the client and create a table named t with the default utf8mb4 character set. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert utf8mb4-encoded characters. */
obclient> INSERT INTO t VALUES ('Character Set');
Query OK, 1 row affected
/* Change the character set of the current session, but use the original character set on the client. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* Insert utf8mb4-encoded characters again. */
obclient> INSERT INTO t VALUES ('Character Set');
Query OK, 1 row affected
/* The data in table t is garbled. */
obclient> SELECT * FROM t;
+------------+
| c |
+------------+
| ���� |
| Character�|
+------------+
2 rows in set
/* Change the character set of the current session to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query the data in table t. The characters inserted in the second batch are still garbled. */
obclient> SELECT * FROM t;
+--------------+
| c |
+--------------+
| Character Set|
| Exxrrxxor |
+--------------+
2 rows in set
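The garbling in the session above occurs because bytes produced in one encoding are interpreted in another. The following Python sketch reproduces that effect outside the database, as an illustration only. It assumes that Python's utf-8 and gb18030 codecs can approximate the utf8mb4 and gb18030_2022 character sets. The sample string and the function name misread are hypothetical and chosen only for this example.

```python
def misread(text, actual="utf-8", declared="gb18030"):
    """Encode text the way the client really does, then decode it as declared."""
    raw = text.encode(actual)
    # errors="replace" substitutes U+FFFD for byte sequences that are invalid
    # in the declared encoding instead of raising an exception.
    return raw.decode(declared, errors="replace")


sample = "字符集"  # non-ASCII text entered on a UTF-8 client

# Declared encoding differs from the actual one: the text comes back garbled.
print(misread(sample))

# Declared encoding matches the actual one: the text comes back intact.
print(misread(sample, declared="utf-8"))
```

As in the session, declaring the correct character set afterwards does not repair text that was already misinterpreted. Only data sent while the declared and actual encodings agree reads back correctly.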