By default, OceanBase Database uses the utf8mb4 character set.
OceanBase Database currently supports the following character sets: binary, gbk, gb18030, utf16, utf8mb4/utf8mb3, latin1, gb2312, gb18030_2022, ascii, tis620, ujis, euckr, eucjpms, cp932, utf16le, sjis, dec8, hkscs, hkscs31, big5, cp850, hp8, macroman, and swe7.
Considerations
- To support seamless migration, OceanBase Database treats UTF8 as an alias for UTF8MB4. utf8mb3 is also an alias for utf8mb4.
Note
In OceanBase Database V4.3.5, the following character sets are supported starting from V4.3.5 BP1: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
The current version of OceanBase Database does not support implicit conversion between the gb18030 and gb18030_2022 character sets. However, you can use the CONVERT function to explicitly convert a string between gb18030 and gb18030_2022. This conversion preserves the original encoding and does not go through Unicode. In the following example, the code of the '龴' character is 0xFE59 both before and after the conversion, in either direction.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The result set is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59 | FE59 |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
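To make the difference between an encoding-preserving conversion and an ordinary Unicode-based conversion concrete, the following Python sketch contrasts the two. It is only a model: Python ships no gb18030_2022 codec, so the byte-preserving path is represented by a hypothetical helper `relabel` that keeps the bytes unchanged, while the hypothetical `transcode` helper decodes to Unicode and re-encodes with Python's standard codecs.

```python
"""Contrast an encoding-preserving conversion with a Unicode round trip."""


def relabel(data: bytes) -> bytes:
    # Hypothetical model of the encoding-preserving conversion: the bytes are
    # kept as-is and only the character set label changes, so no Unicode
    # code point is involved.
    return bytes(data)


def transcode(data: bytes, src: str, dst: str) -> bytes:
    # Ordinary conversion: decode to Unicode with the source codec, then
    # encode with the target codec. The bytes usually change.
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    fe59 = bytes.fromhex("FE59")
    # Mirrors HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)): bytes unchanged.
    print(relabel(fe59).hex().upper())  # FE59
    # By contrast, converting the same character between UTF-8 and GB18030
    # through Unicode changes its bytes.
    print(transcode("字".encode("utf-8"), "utf-8", "gb18030").hex().upper())  # D7D6
```

The point of the contrast is that a Unicode round trip can only succeed when both character sets map the code to the same Unicode character, whereas keeping the bytes avoids that dependency entirely.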
View available character sets
Use the SHOW CHARSET statement to view the available character sets:
obclient [(none)]> SHOW CHARSET;
The result returned is as follows:
+--------------+---------------------------+-------------------------+--------+
| Charset | Description | Default collation | Maxlen |
+--------------+---------------------------+-------------------------+--------+
| binary | Binary pseudo charset | binary | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| gbk | GBK charset | gbk_chinese_ci | 2 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| gb18030 | GB18030 charset | gb18030_chinese_ci | 4 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| gb18030_2022 | GB18030-2022 charset | gb18030_2022_chinese_ci | 4 |
| ascii | US ASCII | ascii_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| sjis | SJIS | sjis_japanese_ci | 2 |
| big5 | BIG5 | big5_chinese_ci | 2 |
| hkscs | HKSCS | hkscs_bin | 2 |
| hkscs31 | HKSCS-ISO UNICODE 31 | hkscs31_bin | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| swe7 | 7bit West European | swe7_swedish_ci | 1 |
+--------------+---------------------------+-------------------------+--------+
24 rows in set
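The Maxlen column gives the maximum number of bytes needed to store one character in each character set, so the worst-case byte length of a string of N characters is N × Maxlen. The following Python sketch applies that rule to a few rows copied from the output above; the dictionary and the `max_bytes` helper are illustrative only, and the sketch deliberately ignores any length prefix or other storage overhead.

```python
# Maxlen values copied from a subset of the SHOW CHARSET output
# (maximum number of bytes per character).
MAXLEN = {
    "binary": 1,
    "utf8mb4": 4,
    "gbk": 2,
    "utf16": 4,
    "gb18030": 4,
    "latin1": 1,
    "ujis": 3,
}


def max_bytes(charset: str, n_chars: int) -> int:
    """Worst-case byte length of n_chars characters in the given charset."""
    try:
        return MAXLEN[charset] * n_chars
    except KeyError:
        raise ValueError(f"unknown charset: {charset}") from None


if __name__ == "__main__":
    # For example, the values of a VARCHAR(100) column.
    for cs in ("utf8mb4", "gbk", "latin1"):
        print(cs, max_bytes(cs, 100))  # 400, 200, 100 respectively
```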
Specify a non-default character set
OceanBase Database lets you specify a non-default character set for communicating with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
SET NAMES gbk;
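Because SET NAMES should name the encoding the client actually uses, a client program can derive the statement from its own encoding instead of hard-coding it. The following Python sketch does this with a small mapping from Python codec names to character set names listed by SHOW CHARSET. The mapping `PY_TO_OB` and the helper `set_names_for` are hypothetical and cover only a few character sets; check any pair you add against SHOW CHARSET.

```python
import codecs

# Hypothetical mapping from Python codec names (as normalized by
# codecs.lookup) to character set names listed by SHOW CHARSET.
PY_TO_OB = {
    "utf-8": "utf8mb4",
    "gbk": "gbk",
    "gb18030": "gb18030",
    "big5": "big5",
    "cp932": "cp932",
    "cp1252": "latin1",  # SHOW CHARSET describes latin1 as "cp1252 West European"
    "ascii": "ascii",
}


def set_names_for(client_encoding: str) -> str:
    """Build a SET NAMES statement matching the client's real encoding."""
    # codecs.lookup normalizes aliases such as "UTF8" or "windows-1252".
    name = codecs.lookup(client_encoding).name
    if name not in PY_TO_OB:
        raise ValueError(f"no known OceanBase character set for {name!r}")
    return f"SET NAMES {PY_TO_OB[name]};"


if __name__ == "__main__":
    print(set_names_for("UTF8"))  # SET NAMES utf8mb4;
```

In a real client you might pass the encoding your terminal or driver actually uses (for example, `sys.stdout.encoding`), which keeps the session setting and the bytes the client sends in agreement.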
Note that the SET NAMES statement only tells the server which character set the client uses; it does not change the encoding the client actually uses for input. For example, before you run SET NAMES gb18030_2022, make sure that the client itself (such as the terminal) already uses the gb18030_2022 encoding. Otherwise, garbled text appears.
/* The client uses the utf8mb4 (UTF-8) encoding. Create table t, which uses the default character set utf8mb4. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert a UTF-8-encoded string. 字符集 means "character set". */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Change the character set of the current session. The encoding actually used by the client is not changed. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* Insert the same string. The client still sends UTF-8 bytes, but the server now interprets them as gb18030_2022. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Query table t. The first row is returned in gb18030_2022, which the UTF-8 client displays as garbled text. */
obclient> SELECT * FROM t;
+---------------------+
| c                   |
+---------------------+
| �ַ���              |
| <text as entered>   |
+---------------------+
2 rows in set
/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query table t again. The first row is now displayed correctly, but the second row, which was converted incorrectly when it was inserted, is garbled. */
obclient> SELECT * FROM t;
+---------------------+
| c                   |
+---------------------+
| 字符集              |
| <garbled text>      |
+---------------------+
2 rows in set
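The garbled output in the example comes from bytes in one encoding being read as another. The following Python sketch reproduces both directions with Python's standard codecs; it assumes Python's `gb18030` codec is a fair stand-in for gb18030_2022 for these common characters (Python has no separate gb18030_2022 codec). Non-ASCII results are printed with `ascii()` so the script runs on any terminal.

```python
text = "字符集"  # the string inserted in the example ("character set")

# Case 1: the row is stored correctly, but with SET NAMES gb18030_2022 the
# server returns GB18030 bytes, which a UTF-8 terminal decodes as UTF-8.
shown_in_utf8_terminal = text.encode("gb18030").decode("utf-8", errors="replace")
# U+FFFD, U+05B7, U+FFFD, U+FFFD, U+FFFD, which renders as "�ַ���"
print(ascii(shown_in_utf8_terminal))

# Case 2: a UTF-8 client sends UTF-8 bytes while the session claims
# gb18030_2022, so the server interprets those bytes as GB18030.
misread = text.encode("utf-8").decode("gb18030", errors="replace")
print(misread != text)  # True
```

Case 1 matches the first row of the first query, which is garbled only on display; case 2 corresponds to the second INSERT, where the misinterpretation happens on input and the stored value itself is wrong.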
