OceanBase Database in Oracle-compatible mode supports the following character sets:
binary
utf8mb4
Note
For smooth migration, OceanBase Database treats
UTF8as a synonym forUTF8MB4in syntax.gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5
Note
The
big5character set is compatible with thebig5character set in MySQL.cp850
hp8
macroman
swe7
Note
Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported in OceanBase Database V4.3.5: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
Applicability
OceanBase Connector/J does not support the utf8mb4_unicode_ci and utf16_unicode_ci collations.
The following table lists the collations supported by OceanBase Database in Oracle-compatible mode.
| Character set | Collation | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | Uses the general collation rule. |
| utf8mb4_bin | utf8mb4 | Uses the binary collation rule. |
| utf8mb4_unicode_ci | utf8mb4 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| binary | binary | Uses the binary collation rule. |
| gbk_chinese_ci | gbk | Uses the Chinese language collation rule. |
| gbk_bin | gbk | Uses the binary collation rule. |
| utf16_general_ci | utf16 | Uses the general collation rule. |
| utf16_bin | utf16 | Uses the binary collation rule. |
| utf16_unicode_ci | utf16 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| gb18030_chinese_ci | gb18030 | Uses the Chinese language collation rule. |
| gb18030_bin | gb18030 | Uses the binary collation rule. |
| latin1_swedish_ci | latin1 | Uses the Swedish/Finnish language collation rule. |
| latin1_german1_ci | latin1 | Uses the Latin1 character set for the German language environment. |
| latin1_danish_ci | latin1 | Uses the Latin1 character set for the Danish language environment. |
| latin1_german2_ci | latin1 | Uses the Latin1 character set for the German language environment, suitable for applications requiring dictionary order comparisons. |
| latin1_general_ci | latin1 | Uses the general collation rule, suitable for scenarios requiring case-insensitive comparisons with accents, such as in certain European language databases. |
| latin1_general_cs | latin1 | Uses the case-sensitive general collation rule, supporting multiple languages (such as Western European languages). |
| latin1_spanish_ci | latin1 | Uses the Spanish language collation rule. |
| latin1_bin | latin1 | Uses the binary collation rule. |
| gb18030_2022_bin | gb18030_2022 | Uses the binary collation rule. Default collation for this character set in Oracle-compatible mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | Uses the pinyin collation rule. Case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | Uses the pinyin collation rule. Case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | Uses the radical-stroke collation rule. Case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | Uses the radical-stroke collation rule. Case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | Uses the stroke collation rule. Case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | Uses the stroke collation rule. Case-sensitive. |
| ascii_bin | ascii | Uses the binary collation rule. It compares and sorts characters as binary data. |
| ascii_general_ci | ascii | Uses the general case-insensitive collation rule. It ignores case differences and treats uppercase and lowercase letters as the same. |
| tis620_bin | tis620 | Uses the binary collation rule. |
| tis620_thai_ci | tis620 | Uses the Thai language collation rule, case-insensitive. |
| gb2312_chinese_ci | gb2312 | Uses the GB2312 character set with the Chinese language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| gb2312_bin | gb2312 | Uses the GB2312 character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| ujis_japanese_ci | ujis | Uses the UJIS character set with the Japanese language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| ujis_bin | ujis | Uses the UJIS character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| euckr_korean_ci | euckr | Uses the EUCKR character set with the Korean language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| euckr_bin | euckr | Uses the EUCKR character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_japanese_ci | eucjpms | Uses the EUCJPMS character set with the Japanese language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_bin | eucjpms | Uses the EUCJPMS character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp932_japanese_ci | cp932 | Uses the CP932 character set with the Japanese language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp932_bin | cp932 | Uses the CP932 character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp850_general_ci | cp850 | Uses the CP850 character set with the general collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp850_bin | cp850 | Uses the CP850 character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| hp8_english_ci | hp8 | Uses the HP8 character set with the English language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| hp8_bin | hp8 | Uses the HP8 character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_general_ci | macroman | Uses the MacRoman character set with the general collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_bin | macroman | Uses the MacRoman character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| swe7_swedish_ci | swe7 | Uses the SWE7 character set with the Swedish language collation rule for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| swe7_bin | swe7 | Uses the SWE7 character set with the binary collation rule for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
In Oracle-compatible mode, you can view the database collation for the nls_sort variable from the NLS_DATABASE_PARAMETERS system view. The nls_nchar_characterset variable corresponds to the default character set for the NCHAR, NVARCHAR2, and NCLOB data types. The nls_characterset variable corresponds to the default character set for the CHAR, VARCHAR2, and CLOB data types.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a character set that is not the default character set for communications with the server. For example, to use the gbk character set, execute the following statement after connecting to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
By default, the bin collation is used for character sets in OceanBase Database's Oracle-compatible mode. However, other collations can be specified using the NLSSORT() function.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('[char1]'),('[char2]'),('[char3]'),('[char4]'),('[char5]'),('[char6]'),('ζ');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A | 41 |
| B | 42 |
| a | 61 |
| b | 62 |
| ζ | A6C6 |
| Char1 | FFA01016 |
| Char2 | FFA01156 |
| Char3 | FFA0115E |
| Char4 | FFA04E35 |
| Char5 | FFA052DB |
| Char6 | FFA08A0F |
+------+------------------------------------------+
11 rows in set
Conversion between gb18030 and gb18030_2022
Currently, OceanBase Database does not support an implicit conversion from gb18030 to gb18030_2022 or vice versa. However, a user can explicitly convert a gb18030 string to a gb18030_2022 string by using the CONVERT function. This conversion does not pass through Unicode and uses encoding retention. Here is an example:
obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES ('character');
Query OK, 1 row affected
obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6 | D7D6 |
+--------------+-------------------------------------------+
1 row in set
Since the character set is a tenant-level setting in Oracle-compatible mode of OceanBase Database, in a tenant using the gb18030_2022 character set, all CHAR, VARCHAR2, and CLOB fields in user tables use the gb18030_2022 character set and are sorted based on the gb18030_2022_bin collation. In Oracle-compatible mode, string constants are automatically converted to the tenant character set during parsing to ensure character set consistency for SQL statements.
/* In the gb18030_2022 tenant database, the character set of column c is gb18030_2022 and the collation is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('Characters');
Query OK, 1 row affected
/* Set the client character set to gb18030 */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* The string literal 'character' is converted to gb18030_2022 when the SQL statement is parsed, so no error is reported during normal execution. */
obclient > SELECT * FROM t1 WHERE c = 'character';
+------+
| C |
+------+
| |
+------+
1 row in set