By default, OceanBase Database uses the utf8mb4 character set.
OceanBase Database supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
- latin1
- gb18030_2022
- ascii
- tis620
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
Note
- To support seamless migration, OceanBase Database treats UTF8 as a synonym of UTF8MB4. utf8mb3 is an alias for utf8mb4.
The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. You can, however, explicitly convert a gb18030 string to gb18030_2022, or the reverse, by using the CONVERT() function. This conversion does not go through Unicode encoding and decoding, so the original byte code is retained. In the following example, the code of '龴' remains 0xFE59 before and after the conversion in both directions.
obclient [(none)]> SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The return result is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set
View available character sets
You can use the SHOW CHARSET statement to view the available character sets.
obclient [(none)]> SHOW CHARSET;
The return result is as follows:
+--------------+-----------------------+-------------------------+--------+
| Charset      | Description           | Default collation       | Maxlen |
+--------------+-----------------------+-------------------------+--------+
| binary       | Binary pseudo charset | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode         | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset           | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode        | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset       | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European  | latin1_swedish_ci       |      1 |
| gb18030_2022 | GB18030-2022 charset  | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII              | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai           | tis620_thai_ci          |      1 |
| utf16le      | UTF-16LE Unicode      | utf16le_general_ci      |      4 |
| sjis         | SJIS                  | sjis_japanese_ci        |      2 |
| big5         | BIG5                  | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                 | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31  | hkscs31_bin             |      2 |
| dec8         | DEC West European     | dec8_swedish_ci         |      1 |
+--------------+-----------------------+-------------------------+--------+
15 rows in set (0.012 sec)
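The Maxlen column gives the maximum number of bytes that a single character can occupy in each character set. As a rough illustration outside the database, the following Python sketch measures encoded lengths with Python's standard codecs. The codec names used here (utf-8, gbk, utf-16-be, gb18030, cp1252) are assumed stand-ins for the corresponding OceanBase character sets and may not match their mappings exactly; the sample characters are chosen only for illustration.

```python
# Illustrative only: Python codec names stand in for OceanBase character sets.
samples = {
    "utf-8": "😀",      # stand-in for utf8mb4: a supplementary-plane character
    "gbk": "字",        # stand-in for gbk: a double-byte Chinese character
    "utf-16-be": "😀",  # stand-in for utf16: encoded as a surrogate pair
    "gb18030": "😀",    # stand-in for gb18030: encoded in the four-byte form
    "cp1252": "é",      # stand-in for latin1 (listed as cp1252 in the table)
}

# Number of bytes each sample character needs in its codec.
byte_lengths = {codec: len(ch.encode(codec)) for codec, ch in samples.items()}

for codec, n in byte_lengths.items():
    print(f"{codec}: {n} byte(s)")
```

Each sample is chosen to hit the widest encoding form of its codec, so the printed lengths can be compared with the Maxlen values in the table above.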
Specify a non-default character set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
SET NAMES gbk;
Note that the SET NAMES statement only declares to the server which character set the client uses. It does not change the encoding of the characters that the client actually sends. For example, before you execute SET NAMES gb18030_2022, make sure that the client itself uses gb18030_2022 encoding. Otherwise, garbled characters are stored or displayed, as in the following example:
/* Use the utf8mb4 character set on the client and create a table named t with the default character set utf8mb4. */
obclient> CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected
/* Insert utf8mb4-encoded characters. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* Change the character set of the current session, but keep the original character set (utf8mb4) on the client. */
obclient> SET NAMES gb18030_2022;
Query OK, 0 rows affected
/* Insert utf8mb4-encoded characters again. The server now interprets these bytes as gb18030_2022. */
obclient> INSERT INTO t VALUES ('字符集');
Query OK, 1 row affected
/* The first row is displayed as garbled text, because the result is returned in gb18030_2022 while the client decodes it as utf8mb4. */
obclient> SELECT * FROM t;
+----------+
| c        |
+----------+
| �ַ���     |
| 字符集   |
+----------+
2 rows in set
/* Change the character set of the current session back to utf8mb4. */
obclient> SET NAMES utf8mb4;
Query OK, 0 rows affected
/* Query the data in table t. The row inserted second is still garbled, because it was stored incorrectly. The exact glyphs shown may vary by terminal. */
obclient> SELECT * FROM t;
+--------------+
| c            |
+--------------+
| 字符集       |
| 瀛楃闆�      |
+--------------+
2 rows in set
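The garbling in the session above comes from bytes being decoded with a character set other than the one used to encode them. The following Python sketch reproduces that mismatch outside the database. It is a minimal illustration under stated assumptions: Python's built-in gb18030 codec stands in for gb18030_2022 (Python has no separate codec for the 2022 revision), and the sample string '字符集' (Chinese for "character set") is used only as test data.

```python
# Illustrative only: mimics a client/server character set mismatch with Python codecs.
text = "字符集"

# The client terminal actually produces UTF-8 bytes.
client_bytes = text.encode("utf-8")

# After SET NAMES gb18030_2022, the server treats incoming bytes as GB18030.
# Assumption: Python's "gb18030" codec is used as a stand-in for gb18030_2022.
misread = client_bytes.decode("gb18030", errors="replace")

# When the declared encoding matches the real one, the text survives intact.
round_trip = text.encode("gb18030").decode("gb18030")

print("misread equals original:", misread == text)
print("round trip equals original:", round_trip == text)
```

The sketch only shows why the stored value differs from what was typed. The exact garbled characters that OceanBase Database stores or displays can differ from what Python produces.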