The default character set of OceanBase Database is utf8mb4.
OceanBase Database supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
  Considerations
  - To support seamless migration, OceanBase Database treats UTF8 as an alias of UTF8MB4 in syntax. utf8mb3 is an alias of utf8mb4.
- latin1
- gb2312
- gb18030_2022
- ascii
- tis620
- ujis
- euckr
- eucjpms
- cp932
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
- cp850
- hp8
- macroman
- swe7
Note
In OceanBase Database V4.3.5, the following character sets are supported starting from V4.3.5 BP1: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
The current version of OceanBase Database does not support implicit conversion between the gb18030 and gb18030_2022 character sets. However, you can use the CONVERT function to explicitly convert a string from one of these character sets to the other. This conversion does not go through Unicode. Instead, it retains the original encoding, so the byte code of each character stays the same. The following example shows that the code of '龴' is 0xFE59 both before and after the conversion, in either direction.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The result returned is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
View available character sets
To query available character sets, run the following SHOW CHARSET statement.
obclient [(none)]> SHOW CHARSET;
The following example shows a query result:
+--------------+---------------------------+-------------------------+--------+
| Charset      | Description               | Default collation       | Maxlen |
+--------------+---------------------------+-------------------------+--------+
| binary       | Binary pseudo charset     | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode             | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset               | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode            | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset           | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European      | latin1_swedish_ci       |      1 |
| gb2312       | GB2312 Simplified Chinese | gb2312_chinese_ci       |      2 |
| gb18030_2022 | GB18030-2022 charset      | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII                  | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai               | tis620_thai_ci          |      1 |
| ujis         | EUC-JP Japanese           | ujis_japanese_ci        |      3 |
| euckr        | EUC-KR Korean             | euckr_korean_ci         |      2 |
| eucjpms      | UJIS for Windows Japanese | eucjpms_japanese_ci     |      3 |
| cp932        | SJIS for Windows Japanese | cp932_japanese_ci       |      2 |
| utf16le      | UTF-16LE Unicode          | utf16le_general_ci      |      4 |
| sjis         | SJIS                      | sjis_japanese_ci        |      2 |
| big5         | BIG5                      | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                     | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31      | hkscs31_bin             |      2 |
| dec8         | DEC West European         | dec8_swedish_ci         |      1 |
| cp850        | DOS West European         | cp850_general_ci        |      1 |
| hp8          | HP West European          | hp8_english_ci          |      1 |
| macroman     | Mac West European         | macroman_general_ci     |      1 |
| swe7         | 7bit West European        | swe7_swedish_ci         |      1 |
+--------------+---------------------------+-------------------------+--------+
24 rows in set
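The Maxlen column gives the maximum number of bytes that a single character can occupy in the character set. As a rough client-side illustration, the following Python sketch encodes a few sample characters and counts the resulting bytes. The sample characters are arbitrary, and the Python codec names are assumed to approximate the similarly named server character sets; nothing here queries OceanBase Database.

```python
# Illustrative assumption: Python's built-in codecs stand in for the
# server-side character sets of similar names. Sample characters are arbitrary.
samples = {
    "utf-8": "😀",      # compare with utf8mb4 (Maxlen 4)
    "gbk": "字",        # compare with gbk (Maxlen 2)
    "gb18030": "😀",    # compare with gb18030 (Maxlen 4)
    "utf-16-le": "😀",  # compare with utf16le (Maxlen 4, surrogate pair)
    "latin-1": "é",     # compare with latin1 (Maxlen 1)
}

# Number of bytes each sample character occupies in its codec.
byte_lengths = {codec: len(ch.encode(codec)) for codec, ch in samples.items()}

for codec, length in byte_lengths.items():
    print(f"{codec:10} {length} byte(s)")
```

A character may need fewer bytes than Maxlen (for example, ASCII letters take one byte in utf8mb4), so Maxlen is an upper bound rather than a fixed width.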
Specify a non-default character set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after connecting to the server:
SET NAMES gbk;
Note that the SET NAMES statement does not change the encoding of the characters that the client sends. For example, set the session character set to gb18030_2022 only if the client already encodes its input in gb18030_2022. Otherwise, the server misinterprets the input bytes, and the data becomes garbled, as the following example shows.
/* The client uses the utf8mb4 character set. Create a table named t whose default character set is utf8mb4. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert a string encoded in utf8mb4. '字符集' means "character set". */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Change the character set of the current session. The encoding actually used by the client remains unchanged. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* Insert the same string, which the client still encodes in utf8mb4. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Query table t. */
obclient> SELECT * FROM t;
+----------+
| c        |
+----------+
| �ַ���    |
| 字符     |
+----------+
2 rows in set
/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query table t again. The string inserted the second time remains garbled. */
obclient> SELECT * FROM t;
+--------------+
| c            |
+--------------+
| 字符集       |
| 瀛楃闆       |
+--------------+
2 rows in set
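The mismatch between the bytes the client sends and the character set declared with SET NAMES can be reproduced on the client side without a database. The following Python sketch is an illustration under assumptions, not a description of OceanBase Database internals: it uses the sample string 字符集 ("character set"), takes Python's built-in gb18030 codec as a stand-in for gb18030_2022, and uses errors="replace" as a stand-in for however the server handles invalid bytes.

```python
text = "字符集"  # sample string meaning "character set"

# What a utf8mb4 client actually sends: UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# If the session is declared as GB18030, the same bytes are read as
# GB18030, which yields different characters (mojibake).
misread = utf8_bytes.decode("gb18030", errors="replace")

# The reverse case: GB18030-encoded results displayed by a UTF-8 terminal.
gb_bytes = text.encode("gb18030")
shown_on_utf8_terminal = gb_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex(), gb_bytes.hex())
print(misread != text, shown_on_utf8_terminal != text)
```

Both comparisons differ from the original string, which is why the declared session character set has to match the encoding that the client really uses.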