Character sets and collations|V4.3.5| docs|Distributed Database

Character sets and collations

Last Updated：2026-02-03 08:48:37 Updated

OceanBase Database in Oracle mode supports the following character sets:

binary
utf8mb4

Note

To facilitate seamless migration, OceanBase Database treats UTF8 as a synonym for UTF8MB4 in syntax.
gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5

Note

The big5 character set is compatible with the big5 character set in MySQL.
cp850
hp8
macroman
swe7

Note

Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported in OceanBase Database V4.3.5: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.

Applicability

OceanBase Connector/J does not support the utf8mb4_unicode_ci and utf16_unicode_ci collations.

The following table lists the collations supported by OceanBase Database in Oracle mode.

Character set	Collation	Description
utf8mb4_general_ci	utf8mb4	Uses the general collation rule.
utf8mb4_bin	utf8mb4	Uses the binary collation rule.
utf8mb4_unicode_ci	utf8mb4	Uses the Unicode Collation Algorithm (UCA) collation rule.
binary	binary	Uses the binary collation rule.
gbk_chinese_ci	gbk	Uses the Chinese collation rule.
gbk_bin	gbk	Uses the binary collation rule.
utf16_general_ci	utf16	Uses the general collation rule.
utf16_bin	utf16	Uses the binary collation rule.
utf16_unicode_ci	utf16	Uses the Unicode Collation Algorithm (UCA) collation rule.
gb18030_chinese_ci	gb18030	Uses the Chinese collation rule.
gb18030_bin	gb18030	Uses the binary collation rule.
latin1_swedish_ci	latin1	Uses the Swedish/Finnish collation rule.
latin1_german1_ci	latin1	Uses the collation rule for the Latin1 character set in the German language environment.
latin1_danish_ci	latin1	Uses the collation rule for the Latin1 character set in the Danish language environment.
latin1_german2_ci	latin1	Uses the collation rule for the German language environment, suitable for applications requiring dictionary order comparisons.
latin1_general_ci	latin1	Uses the general collation rule for case-insensitive sorting with accent support, suitable for databases designed for certain European languages.
latin1_general_cs	latin1	Uses the general collation rule for case-sensitive sorting, supporting multiple languages (such as Western European languages).
latin1_spanish_ci	latin1	Uses the collation rule for the Spanish language environment.
latin1_bin	latin1	Uses the binary collation rule for the Latin1 character set.
gb18030_2022_bin	gb18030_2022	Uses the binary collation rule. Default collation for the gb18030_2022 character set in Oracle mode.
gb18030_2022_chinese_ci	gb18030_2022	Uses the pinyin collation rule. Case-insensitive.
gb18030_2022_chinese_cs	gb18030_2022	Uses the pinyin collation rule. Case-sensitive.
gb18030_2022_radical_ci	gb18030_2022	Uses the radical-stroke collation rule. Case-insensitive.
gb18030_2022_radical_cs	gb18030_2022	Uses the radical-stroke collation rule. Case-sensitive.
gb18030_2022_stroke_ci	gb18030_2022	Uses the stroke collation rule. Case-insensitive.
gb18030_2022_stroke_cs	gb18030_2022	Uses the stroke collation rule. Case-sensitive.
ascii_bin	ascii	Uses the binary collation rule. It compares and sorts characters as binary data.
ascii_general_ci	ascii	Uses the general collation rule for case-insensitive sorting. It ignores case differences and treats uppercase and lowercase letters as the same character.
tis620_bin	tis620	Uses the binary collation rule.
tis620_thai_ci	tis620	Uses the Thai collation rule, case-insensitive.
gb2312_chinese_ci	gb2312	Uses the GB2312 character set with case-insensitive sorting according to the Chinese collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
gb2312_bin	gb2312	Uses the GB2312 character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_japanese_ci	ujis	Uses the UJIS character set with case-insensitive sorting according to the Japanese collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
ujis_bin	ujis	Uses the UJIS character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_korean_ci	euckr	Uses the EUCKR character set with case-insensitive sorting according to the Korean collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
euckr_bin	euckr	Uses the EUCKR character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_japanese_ci	eucjpms	Uses the EUCJPMS character set with case-insensitive sorting according to the Japanese collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
eucjpms_bin	eucjpms	Uses the EUCJPMS character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_japanese_ci	cp932	Uses the CP932 character set with case-insensitive sorting according to the Japanese collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp932_bin	cp932	Uses the CP932 character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_general_ci	cp850	Uses the CP850 character set with case-insensitive sorting according to the general collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
cp850_bin	cp850	Uses the CP850 character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_english_ci	hp8	Uses the HP8 character set with case-insensitive sorting according to the English collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
hp8_bin	hp8	Uses the HP8 character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_general_ci	macroman	Uses the MacRoman character set with case-insensitive sorting according to the general collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
macroman_bin	macroman	Uses the MacRoman character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_swedish_ci	swe7	Uses the SWE7 character set with case-insensitive sorting according to the Swedish collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.
swe7_bin	swe7	Uses the SWE7 character set with case-sensitive sorting according to the binary collation rule. Note For OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1.

In Oracle mode, you can use the NLS_DATABASE_PARAMETERS view to view the system collation corresponding to the nls_sort variable, the default character set of the NCHAR, NVARCHAR2, and NCLOB data types corresponding to the nls_nchar_characterset variable, and the default character set of the CHAR, VARCHAR2, and CLOB data types corresponding to the nls_characterset variable in OceanBase Database.

obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER               | VALUE                        |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT         | DD-MON-RR                    |
| NLS_TIMESTAMP_FORMAT    | DD-MON-RR HH.MI.SSXFF AM     |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY           | AMERICA                      |
| NLS_SORT                | BINARY                       |
| NLS_COMP                | BINARY                       |
| NLS_CHARACTERSET        | AL32UTF8                     |
| NLS_NCHAR_CHARACTERSET  | AL16UTF16                    |
| NLS_DATE_LANGUAGE       | AMERICAN                     |
| NLS_LENGTH_SEMANTICS    | BYTE                         |
| NLS_NCHAR_CONV_EXCP     | FALSE                        |
| NLS_CALENDAR            | GREGORIAN                    |
| NLS_NUMERIC_CHARACTERS  | .,                           |
| NLS_CURRENCY            | $                            |
| NLS_ISO_CURRENCY        | AMERICA                      |
| NLS_DUAL_CURRENCY       | $                            |
+-------------------------+------------------------------+
16 rows in set

OceanBase Database allows you to specify a non-default character set for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:

obclient> SET NAMES gbk;
Query OK, 0 rows affected

The default collation method in OceanBase Database in Oracle mode is bin. However, you can use the NLSSORT() function to specify a different collation method.

obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('x'),('y'),('z'),('m'),('n'),('p'),('ζ');
Query OK, 11 rows affected
Records: 11  Duplicates: 0  Warnings: 0

obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A    | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A    | 41                                       |
| B    | 42                                       |
| a    | 61                                       |
| b    | 62                                       |
| ζ    | A6C6                                     |
| m    | 6D                                       |
| n    | 6E                                       |
| p    | 70                                       |
| x    | 78                                       |
| y    | 79                                       |
| z    | 7A                                       |
+------+------------------------------------------+
11 rows in set

Conversion between GB 18030 and GB 18030-2022

Implicit conversion from gb18030 to gb18030_2022 is not supported in the current OceanBase Database version. You can explicitly convert a gb18030 string to gb18030_2022 by using the CONVERT function. This conversion does not rely on Unicode but preserves the encoded characters, as shown in the example.

obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected

obclient> INSERT INTO t1 VALUES ('Character');
Query OK, 1 row affected

obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6         | D7D6                                      |
+--------------+-------------------------------------------+
1 row in set

The gb18030_2022 character set in OceanBase Database in Oracle mode is tenant-level. In a tenant with the gb18030_2022 character set, all user tables use the gb18030_2022 character set for the CHAR, VARCHAR2, and CLOB columns, and all of these columns are sorted using the gb18030_2022_bin collation in binary mode. In Oracle mode, string constants are converted to the tenant's character set during parsing to ensure SQL statement consistency.

/* gb18030_2022 tenant with a column c of character set gb18030_2022 and ordering gb18030_2022_bin */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected

obclient> INSERT INTO t1 values('character');
Query OK, 1 row affected

/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected

obclient> SELECT * FROM t1 WHERE c = 'character';
+------+
| C    |
+------+
| character |
+------+
1 row in set

References

NLSSORT

nls_sort

nls_characterset

nls_nchar_characterset