Character sets and collations|V4.4.2|OceanBase Database| docs|Distributed Database

Character sets and collations

Last Updated：2026-04-16 05:12:17 Updated

OceanBase Database in Oracle-compatible mode supports the following character sets:

binary
utf8mb4

Note

For smooth migration, OceanBase Database treats UTF8 as a synonym for UTF8MB4 in syntax.
gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5

Note

The big5 character set is compatible with the big5 character set in MySQL.
cp850
hp8
macroman
swe7

Note

Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported in OceanBase Database V4.3.5: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.

Applicability

OceanBase Connector/J does not support the utf8mb4_unicode_ci and utf16_unicode_ci collations.

The following table lists the collations supported by OceanBase Database in Oracle-compatible mode.

Character set	Collation	Description
utf8mb4_general_ci	utf8mb4	Uses the general collation rule.
utf8mb4_bin	utf8mb4	Uses the binary collation rule.
utf8mb4_unicode_ci	utf8mb4	Uses the Unicode Collation Algorithm (UCA) collation rule.
binary	binary	Uses the binary collation rule.
gbk_chinese_ci	gbk	Uses the Chinese language collation rule.
gbk_bin	gbk	Uses the binary collation rule.
utf16_general_ci	utf16	Uses the general collation rule.
utf16_bin	utf16	Uses the binary collation rule.
utf16_unicode_ci	utf16	Uses the Unicode Collation Algorithm (UCA) collation rule.
gb18030_chinese_ci	gb18030	Uses the Chinese language collation rule.
gb18030_bin	gb18030	Uses the binary collation rule.
latin1_swedish_ci	latin1	Uses the Swedish/Finnish language collation rule.
latin1_german1_ci	latin1	Uses the Latin1 character set for the German language environment.
latin1_danish_ci	latin1	Uses the Latin1 character set for the Danish language environment.
latin1_german2_ci	latin1	Uses the Latin1 character set for the German language environment, suitable for applications requiring dictionary order comparisons.
latin1_general_ci	latin1	Uses the general collation rule, suitable for scenarios requiring case-insensitive comparisons with accents, such as in certain European language databases.
latin1_general_cs	latin1	Uses the case-sensitive general collation rule, supporting multiple languages (such as Western European languages).
latin1_spanish_ci	latin1	Uses the Spanish language collation rule.
latin1_bin	latin1	Uses the binary collation rule.
gb18030_2022_bin	gb18030_2022	Uses the binary collation rule. Default collation for this character set in Oracle-compatible mode.
gb18030_2022_chinese_ci	gb18030_2022	Uses the pinyin collation rule. Case-insensitive.
gb18030_2022_chinese_cs	gb18030_2022	Uses the pinyin collation rule. Case-sensitive.
gb18030_2022_radical_ci	gb18030_2022	Uses the radical-stroke collation rule. Case-insensitive.
gb18030_2022_radical_cs	gb18030_2022	Uses the radical-stroke collation rule. Case-sensitive.
gb18030_2022_stroke_ci	gb18030_2022	Uses the stroke collation rule. Case-insensitive.
gb18030_2022_stroke_cs	gb18030_2022	Uses the stroke collation rule. Case-sensitive.
ascii_bin	ascii	Uses the binary collation rule. It compares and sorts characters as binary data.
ascii_general_ci	ascii	Uses the general case-insensitive collation rule. It ignores case differences and treats uppercase and lowercase letters as the same.
tis620_bin	tis620	Uses the binary collation rule.
tis620_thai_ci	tis620	Uses the Thai language collation rule, case-insensitive.
gb2312_chinese_ci	gb2312	Uses the GB2312 character set with the Chinese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
gb2312_bin	gb2312	Uses the GB2312 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_japanese_ci	ujis	Uses the UJIS character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_bin	ujis	Uses the UJIS character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_korean_ci	euckr	Uses the EUCKR character set with the Korean language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_bin	euckr	Uses the EUCKR character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_japanese_ci	eucjpms	Uses the EUCJPMS character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_bin	eucjpms	Uses the EUCJPMS character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_japanese_ci	cp932	Uses the CP932 character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_bin	cp932	Uses the CP932 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_general_ci	cp850	Uses the CP850 character set with the general collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_bin	cp850	Uses the CP850 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_english_ci	hp8	Uses the HP8 character set with the English language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_bin	hp8	Uses the HP8 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_general_ci	macroman	Uses the MacRoman character set with the general collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_bin	macroman	Uses the MacRoman character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_swedish_ci	swe7	Uses the SWE7 character set with the Swedish language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_bin	swe7	Uses the SWE7 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.

In Oracle-compatible mode, you can view the database collation for the nls_sort variable from the NLS_DATABASE_PARAMETERS system view. The nls_nchar_characterset variable corresponds to the default character set for the NCHAR, NVARCHAR2, and NCLOB data types. The nls_characterset variable corresponds to the default character set for the CHAR, VARCHAR2, and CLOB data types.

obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER               | VALUE                        |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT         | DD-MON-RR                    |
| NLS_TIMESTAMP_FORMAT    | DD-MON-RR HH.MI.SSXFF AM     |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY           | AMERICA                      |
| NLS_SORT                | BINARY                       |
| NLS_COMP                | BINARY                       |
| NLS_CHARACTERSET        | AL32UTF8                     |
| NLS_NCHAR_CHARACTERSET  | AL16UTF16                    |
| NLS_DATE_LANGUAGE       | AMERICAN                     |
| NLS_LENGTH_SEMANTICS    | BYTE                         |
| NLS_NCHAR_CONV_EXCP     | FALSE                        |
| NLS_CALENDAR            | GREGORIAN                    |
| NLS_NUMERIC_CHARACTERS  | .,                           |
| NLS_CURRENCY            | $                            |
| NLS_ISO_CURRENCY        | AMERICA                      |
| NLS_DUAL_CURRENCY       | $                            |
+-------------------------+------------------------------+
16 rows in set

OceanBase Database allows you to specify a character set that is not the default character set for communications with the server. For example, to use the gbk character set, execute the following statement after connecting to the server:

obclient> SET NAMES gbk;
Query OK, 0 rows affected

By default, the bin collation is used for character sets in OceanBase Database's Oracle-compatible mode. However, other collations can be specified using the NLSSORT() function.

obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('[char1]'),('[char2]'),('[char3]'),('[char4]'),('[char5]'),('[char6]'),('ζ');
Query OK, 11 rows affected
Records: 11  Duplicates: 0  Warnings: 0

obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A    | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A    | 41                                       |
| B    | 42                                       |
| a    | 61                                       |
| b    | 62                                       |
| ζ    | A6C6                                     |
| Char1 | FFA01016                                 |
| Char2 | FFA01156                                 |
| Char3 | FFA0115E                                 |
| Char4 | FFA04E35                                 |
| Char5 | FFA052DB                                 |
| Char6 | FFA08A0F                                 |
+------+------------------------------------------+
11 rows in set

Conversion between gb18030 and gb18030_2022

Currently, OceanBase Database does not support an implicit conversion from gb18030 to gb18030_2022 or vice versa. However, a user can explicitly convert a gb18030 string to a gb18030_2022 string by using the CONVERT function. This conversion does not pass through Unicode and uses encoding retention. Here is an example:

obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t1 VALUES ('character');
Query OK, 1 row affected

obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6         | D7D6                                      |
+--------------+-------------------------------------------+
1 row in set

Since the character set is a tenant-level setting in Oracle-compatible mode of OceanBase Database, in a tenant using the gb18030_2022 character set, all CHAR, VARCHAR2, and CLOB fields in user tables use the gb18030_2022 character set and are sorted based on the gb18030_2022_bin collation. In Oracle-compatible mode, string constants are automatically converted to the tenant character set during parsing to ensure character set consistency for SQL statements.

/* In the gb18030_2022 tenant database, the character set of column c is gb18030_2022 and the collation is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected

obclient> INSERT INTO t1 values('Characters');
Query OK, 1 row affected

/* Set the client character set to gb18030 */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected

/* The string literal 'character' is converted to gb18030_2022 when the SQL statement is parsed, so no error is reported during normal execution. */
obclient > SELECT * FROM t1 WHERE c = 'character';
+------+
| C    |
+------+
| |
+------+
1 row in set

References

NLSSORT

nls_sort

nls_characterset

nls_nchar_characterset

Character sets and collations

Last Updated：2026-04-16 05:12:17 Updated

OceanBase Database in Oracle-compatible mode supports the following character sets:

binary
utf8mb4

Note

For smooth migration, OceanBase Database treats UTF8 as a synonym for UTF8MB4 in syntax.
gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5

Note

The big5 character set is compatible with the big5 character set in MySQL.
cp850
hp8
macroman
swe7

Note

Applicability

OceanBase Connector/J does not support the utf8mb4_unicode_ci and utf16_unicode_ci collations.

The following table lists the collations supported by OceanBase Database in Oracle-compatible mode.

Character set	Collation	Description
utf8mb4_general_ci	utf8mb4	Uses the general collation rule.
utf8mb4_bin	utf8mb4	Uses the binary collation rule.
utf8mb4_unicode_ci	utf8mb4	Uses the Unicode Collation Algorithm (UCA) collation rule.
binary	binary	Uses the binary collation rule.
gbk_chinese_ci	gbk	Uses the Chinese language collation rule.
gbk_bin	gbk	Uses the binary collation rule.
utf16_general_ci	utf16	Uses the general collation rule.
utf16_bin	utf16	Uses the binary collation rule.
utf16_unicode_ci	utf16	Uses the Unicode Collation Algorithm (UCA) collation rule.
gb18030_chinese_ci	gb18030	Uses the Chinese language collation rule.
gb18030_bin	gb18030	Uses the binary collation rule.
latin1_swedish_ci	latin1	Uses the Swedish/Finnish language collation rule.
latin1_german1_ci	latin1	Uses the Latin1 character set for the German language environment.
latin1_danish_ci	latin1	Uses the Latin1 character set for the Danish language environment.
latin1_german2_ci	latin1	Uses the Latin1 character set for the German language environment, suitable for applications requiring dictionary order comparisons.
latin1_general_ci	latin1	Uses the general collation rule, suitable for scenarios requiring case-insensitive comparisons with accents, such as in certain European language databases.
latin1_general_cs	latin1	Uses the case-sensitive general collation rule, supporting multiple languages (such as Western European languages).
latin1_spanish_ci	latin1	Uses the Spanish language collation rule.
latin1_bin	latin1	Uses the binary collation rule.
gb18030_2022_bin	gb18030_2022	Uses the binary collation rule. Default collation for this character set in Oracle-compatible mode.
gb18030_2022_chinese_ci	gb18030_2022	Uses the pinyin collation rule. Case-insensitive.
gb18030_2022_chinese_cs	gb18030_2022	Uses the pinyin collation rule. Case-sensitive.
gb18030_2022_radical_ci	gb18030_2022	Uses the radical-stroke collation rule. Case-insensitive.
gb18030_2022_radical_cs	gb18030_2022	Uses the radical-stroke collation rule. Case-sensitive.
gb18030_2022_stroke_ci	gb18030_2022	Uses the stroke collation rule. Case-insensitive.
gb18030_2022_stroke_cs	gb18030_2022	Uses the stroke collation rule. Case-sensitive.
ascii_bin	ascii	Uses the binary collation rule. It compares and sorts characters as binary data.
ascii_general_ci	ascii	Uses the general case-insensitive collation rule. It ignores case differences and treats uppercase and lowercase letters as the same.
tis620_bin	tis620	Uses the binary collation rule.
tis620_thai_ci	tis620	Uses the Thai language collation rule, case-insensitive.
gb2312_chinese_ci	gb2312	Uses the GB2312 character set with the Chinese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
gb2312_bin	gb2312	Uses the GB2312 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_japanese_ci	ujis	Uses the UJIS character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_bin	ujis	Uses the UJIS character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_korean_ci	euckr	Uses the EUCKR character set with the Korean language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_bin	euckr	Uses the EUCKR character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_japanese_ci	eucjpms	Uses the EUCJPMS character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_bin	eucjpms	Uses the EUCJPMS character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_japanese_ci	cp932	Uses the CP932 character set with the Japanese language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_bin	cp932	Uses the CP932 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_general_ci	cp850	Uses the CP850 character set with the general collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_bin	cp850	Uses the CP850 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_english_ci	hp8	Uses the HP8 character set with the English language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_bin	hp8	Uses the HP8 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_general_ci	macroman	Uses the MacRoman character set with the general collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_bin	macroman	Uses the MacRoman character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_swedish_ci	swe7	Uses the SWE7 character set with the Swedish language collation rule for case-insensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_bin	swe7	Uses the SWE7 character set with the binary collation rule for case-sensitive sorting. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.

obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER               | VALUE                        |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT         | DD-MON-RR                    |
| NLS_TIMESTAMP_FORMAT    | DD-MON-RR HH.MI.SSXFF AM     |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY           | AMERICA                      |
| NLS_SORT                | BINARY                       |
| NLS_COMP                | BINARY                       |
| NLS_CHARACTERSET        | AL32UTF8                     |
| NLS_NCHAR_CHARACTERSET  | AL16UTF16                    |
| NLS_DATE_LANGUAGE       | AMERICAN                     |
| NLS_LENGTH_SEMANTICS    | BYTE                         |
| NLS_NCHAR_CONV_EXCP     | FALSE                        |
| NLS_CALENDAR            | GREGORIAN                    |
| NLS_NUMERIC_CHARACTERS  | .,                           |
| NLS_CURRENCY            | $                            |
| NLS_ISO_CURRENCY        | AMERICA                      |
| NLS_DUAL_CURRENCY       | $                            |
+-------------------------+------------------------------+
16 rows in set

obclient> SET NAMES gbk;
Query OK, 0 rows affected

By default, the bin collation is used for character sets in OceanBase Database's Oracle-compatible mode. However, other collations can be specified using the NLSSORT() function.

obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('[char1]'),('[char2]'),('[char3]'),('[char4]'),('[char5]'),('[char6]'),('ζ');
Query OK, 11 rows affected
Records: 11  Duplicates: 0  Warnings: 0

obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A    | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A    | 41                                       |
| B    | 42                                       |
| a    | 61                                       |
| b    | 62                                       |
| ζ    | A6C6                                     |
| Char1 | FFA01016                                 |
| Char2 | FFA01156                                 |
| Char3 | FFA0115E                                 |
| Char4 | FFA04E35                                 |
| Char5 | FFA052DB                                 |
| Char6 | FFA08A0F                                 |
+------+------------------------------------------+
11 rows in set

Conversion between gb18030 and gb18030_2022

obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t1 VALUES ('character');
Query OK, 1 row affected

obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6         | D7D6                                      |
+--------------+-------------------------------------------+
1 row in set

/* In the gb18030_2022 tenant database, the character set of column c is gb18030_2022 and the collation is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected

obclient> INSERT INTO t1 values('Characters');
Query OK, 1 row affected

/* Set the client character set to gb18030 */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected

/* The string literal 'character' is converted to gb18030_2022 when the SQL statement is parsed, so no error is reported during normal execution. */
obclient > SELECT * FROM t1 WHERE c = 'character';
+------+
| C    |
+------+
| |
+------+
1 row in set

References

NLSSORT

nls_sort

nls_characterset

nls_nchar_characterset

OceanBase

Customer Stories

Documentation

Character sets and collations

Note

Note

Note

Applicability

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Conversion between gb18030 and gb18030_2022

References

Character sets and collations

Note

Note

Note

Applicability

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Conversion between gb18030 and gb18030_2022

References