OceanBase Database in Oracle mode supports the following character sets:
binary
utf8mb4
Note
To facilitate seamless migration, OceanBase Database treats
UTF8as a synonym forUTF8MB4in syntax.gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
eucjpms
utf16le
hkscs
hkscs31
dec8
macroman
Note
Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported: gb2312, eucjpms, and macroman.
Applicability
OceanBase Connector/J does not support utf8mb4_unicode_ci and utf16_unicode_ci.
OceanBase Database in Oracle mode supports the following collations. The following table lists the supported collations.
| Character set | Collation | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | Uses the general collation. |
| utf8mb4_bin | utf8mb4 | Uses the binary collation. |
| utf8mb4_unicode_ci | utf8mb4 | Uses the Unicode Collation Algorithm (UCA) collation. |
| binary | binary | Uses the binary collation. |
| gbk_chinese_ci | gbk | Uses the Chinese collation. |
| gbk_bin | gbk | Uses the binary collation. |
| utf16_general_ci | utf16 | Uses the general collation. |
| utf16_bin | utf16 | Uses the binary collation. |
| utf16_unicode_ci | utf16 | Uses the Unicode Collation Algorithm (UCA) collation. |
| gb18030_chinese_ci | gb18030 | Uses the Chinese collation. |
| gb18030_bin | gb18030 | Uses the binary collation. |
| latin1_swedish_ci | latin1 | Uses the Swedish/Finnish collation. |
| latin1_german1_ci | latin1 | Uses the Latin1 character set for the German language environment. |
| latin1_danish_ci | latin1 | Uses the Latin1 character set for the Danish language environment. |
| latin1_german2_ci | latin1 | Uses the Latin1 character set for the German language environment, suitable for applications requiring dictionary order comparisons. |
| latin1_general_ci | latin1 | Uses the general collation for case-insensitive sorting with accent support, suitable for databases of certain European languages. |
| latin1_general_cs | latin1 | Uses the general collation for case-sensitive sorting, supporting multiple languages (such as Western European languages). |
| latin1_spanish_ci | latin1 | Uses the Spanish collation for the Spanish language environment. |
| latin1_bin | latin1 | Uses the binary collation for the Latin1 character set. |
| gb18030_2022_bin | gb18030_2022 | Uses the binary collation. Default collation for the gb18030_2022 character set in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | Uses the pinyin collation. Case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | Uses the pinyin collation. Case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | Uses the radical-stroke collation. Case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | Uses the radical-stroke collation. Case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | Uses the stroke collation. Case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | Uses the stroke collation. Case-sensitive. |
| ascii_bin | ascii | Uses the binary collation. Compares and sorts characters as binary data. |
| ascii_general_ci | ascii | Uses the general collation for case-insensitive sorting. It ignores case differences and treats uppercase and lowercase letters as the same. |
| tis620_bin | tis620 | Uses the binary collation. |
| tis620_thai_ci | tis620 | Uses the Thai collation, case-insensitive. |
| gb2312_chinese_ci | gb2312 | Uses the GB2312 character set with the Chinese collation for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| gb2312_bin | gb2312 | Uses the GB2312 character set with the binary collation for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_japanese_ci | eucjpms | Uses the EUCJPMS character set with the Japanese collation for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_bin | eucjpms | Uses the EUCJPMS character set with the binary collation for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_general_ci | macroman | Uses the MacRoman character set with the general collation for case-insensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_bin | macroman | Uses the MacRoman character set with the binary collation for case-sensitive sorting.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
In Oracle mode of OceanBase Database, you can view the system collation corresponding to the nls_sort variable by querying the system view NLS_DATABASE_PARAMETERS. The nls_nchar_characterset variable corresponds to the default character set for NCHAR, NVARCHAR2, and NCLOB data types, while the nls_characterset variable corresponds to the default character set for CHAR, VARCHAR2, and CLOB data types.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a character set other than the default character set to communicate with the server. For example, to use the gbk character set, after you connect to the server, execute the following statement:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
In OceanBase Database in Oracle mode, the default sort method is bin, but you can specify other sort methods using the NLSSORT() function.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('闖'),('闯'),('门'),('醜'),('锌'),('陆'),('ζ');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A | 41 |
| B | 42 |
| a | 61 |
| b | 62 |
| ζ | A6C6 |
| FFA01016 | FFA01016 |
| FFA01156 | FFA01156 |
| Error | FFA0115E |
| | FFA04E35 |
| Column | FFA052DB |
| Zinc | FFA08A0F |
+------+------------------------------------------+
11 rows in set
Convert data in the GB18030 character set to the GB18030-2022 character set
The current version of OceanBase Database does not support implicit conversion from the gb18030 or gb18030_2022 character sets. However, a character set of a gb18030 string can be explicitly converted to the gb18030_2022 character set with the CONVERT function, which uses the method of code point preservation. The following example shows how to do this.
obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES ('Characters');
Query OK, 1 row affected
obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6 | D7D6 |
+--------------+-------------------------------------------+
1 row in set
In Oracle mode, the character set of OceanBase Database is at the tenant level. For a tenant that uses the gb18030_2022 character set, all CHAR, VARCHAR2, and CLOB columns in user tables are of the gb18032_2022 character set and follow the gb18030_2022_bin binary sort order. In Oracle mode, character constants are converted to the tenant character set during parsing to ensure SQL character set consistency.
/* gb18030_2022 tenant, column c is of the gb18030_2022 character set and uses the gb18030_2022_bin collation */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('word');
Query OK, 1 row affected
/* Setting the client character set to gb18030 */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* The `DELETE_SCHEMA_STATS` procedure xxx */
obclient > SELECT * FROM t1 WHERE c = 'Character';
+------+
| C |
+------+
| Word |
+------+
1 row in set
