The default character set of OceanBase Database is utf8mb4.
OceanBase Database currently supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
  Note
  For smooth migration, OceanBase Database treats UTF8 as a synonym of UTF8MB4 in syntax, and utf8mb3 is an alias for utf8mb4.
- latin1
- gb2312
- gb18030_2022
- ascii
- tis620
- ujis
- euckr
- eucjpms
- cp932
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
- cp850
- hp8
- macroman
- swe7
Note
Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
Implicit conversion between the gb18030 and gb18030_2022 character sets is not supported in OceanBase Database. However, the CONVERT function can be used to convert a gb18030 string to the gb18030_2022 character set. The conversion is based on the Unicode mapping. For example, the encoding of '龴' remains unchanged at 0xFE59 after the conversion.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The execution result is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
View the available character sets
Use the SHOW CHARACTER SET statement, or its shorthand SHOW CHARSET, to view the available character sets.
obclient [(none)]> SHOW CHARSET;
The execution result is as follows:
+--------------+---------------------------+-------------------------+--------+
| Charset      | Description               | Default collation       | Maxlen |
+--------------+---------------------------+-------------------------+--------+
| binary       | Binary pseudo charset     | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode             | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset               | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode            | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset           | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European      | latin1_swedish_ci       |      1 |
| gb2312       | GB2312 Simplified Chinese | gb2312_chinese_ci       |      2 |
| gb18030_2022 | GB18030-2022 charset      | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII                  | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai               | tis620_thai_ci          |      1 |
| ujis         | EUC-JP Japanese           | ujis_japanese_ci        |      3 |
| euckr        | EUC-KR Korean             | euckr_korean_ci         |      2 |
| eucjpms      | UJIS for Windows Japanese | eucjpms_japanese_ci     |      3 |
| cp932        | SJIS for Windows Japanese | cp932_japanese_ci       |      2 |
| utf16le      | UTF-16LE Unicode          | utf16le_general_ci      |      4 |
| sjis         | SJIS                      | sjis_japanese_ci        |      2 |
| big5         | BIG5                      | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                     | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31      | hkscs31_bin             |      2 |
| dec8         | DEC West European         | dec8_swedish_ci         |      1 |
| cp850        | DOS West European         | cp850_general_ci        |      1 |
| hp8          | HP West European          | hp8_english_ci          |      1 |
| macroman     | Mac West European         | macroman_general_ci     |      1 |
| swe7         | 7bit West European        | swe7_swedish_ci         |      1 |
+--------------+---------------------------+-------------------------+--------+
24 rows in set
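The Maxlen column gives the maximum number of bytes that one character can occupy in that character set. The following minimal Python sketch illustrates the idea for a few of the rows. It assumes that Python's built-in codecs (latin-1, gbk, utf-8, utf-16-be, gb18030) behave like the corresponding database character sets for the sample characters chosen here; the sample characters and the names samples and byte_lengths are illustrative and do not come from OceanBase Database.

```python
# Sketch only: Python codec names are assumed stand-ins for the database charsets.
# Each entry is (database charset, Python codec, a character needing the most bytes).
samples = [
    ("latin1", "latin-1", "é"),      # single-byte Western European character
    ("gbk", "gbk", "字"),            # common Chinese character
    ("utf8mb4", "utf-8", "😀"),      # supplementary-plane character
    ("utf16", "utf-16-be", "😀"),    # encoded as a surrogate pair
    ("gb18030", "gb18030", "😀"),    # four-byte GB18030 form
]

# Number of bytes each sample character occupies in its charset.
byte_lengths = {
    charset: len(ch.encode(codec)) for charset, codec, ch in samples
}

for charset, n in byte_lengths.items():
    print(f"{charset}: {n} byte(s)")
```

Under these assumptions, each printed length equals the Maxlen value listed for that character set in the table above.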
Specify a nondefault character set
OceanBase Database allows you to specify a character set other than the default for communication with the server. For example, to use the gbk character set, execute the following statement after connecting to the server:
SET NAMES gbk;
Note that the SET NAMES statement only tells the server which character set to expect from the client. It does not change the encoding of the characters that the client actually sends. For example, a client that executes SET NAMES gb18030_2022 must already be using gb18030_2022 for its input. Otherwise, the stored or displayed data may be garbled.
/* The client uses the utf8mb4 character set. Create table t, which uses utf8mb4 by default. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert a string that the client encodes in utf8mb4. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Change the character set of the current session. The encoding that the client actually uses does not change. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* Insert the same string again. The client still sends it in utf8mb4 encoding. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* The data in table t appears garbled. */
obclient> SELECT * FROM t;
+----------+
| c        |
+----------+
| �ַ���     |
| 字符集   |
+----------+
2 rows in set
/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query table t again. The string inserted the second time is still garbled. */
obclient> SELECT * FROM t;
+--------------+
| c            |
+--------------+
| 字符集       |
| 瀛楃闆�      |
+--------------+
2 rows in set
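The garbling in this example can be reproduced outside the database. The following minimal Python sketch assumes that the inserted literal is the three-character Chinese string 字符集 ("character set"), which is consistent with the garbled row �ַ��� in the first query result. It also assumes that Python's gb18030 codec matches gb18030_2022 for these characters. The variable names are illustrative only.

```python
# Sketch only: Python codecs stand in for the client and server encodings.
text = "字符集"  # assumed literal; any non-ASCII string shows the same effect

# Case 1: the server returns gb18030 bytes, but the terminal decodes them as UTF-8.
shown_as_utf8 = text.encode("gb18030").decode("utf-8", errors="replace")

# Case 2: a UTF-8 client sends bytes that the server was told (by SET NAMES)
# to read as gb18030, so the server stores the wrong characters.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("gb18030", errors="replace")

print(ascii(shown_as_utf8))  # '\ufffd\u05b7\ufffd\ufffd\ufffd'
print(misread != text)       # True
```

Case 1 corresponds to the row that is garbled only while the session character set is gb18030_2022. Case 2 corresponds to the row that stays garbled afterwards, because the wrong characters were written to the table and switching the session back to utf8mb4 cannot repair them.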