The Oracle mode of OceanBase Database supports the following character sets:
gbkgb18030utf16utf8mb4latin1gb18030_2022
Note
To support seamless migration, OceanBase Database treats UTF8 as a synonym of UTF8MB4.
Applicability
OceanBase Connector/J does not support utf8mb4_unicode_ci and utf16_unicode_ci.
The following table describes the collations supported by the Oracle mode of OceanBase Database.
| Collation | Character set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | A general collation. |
| utf8mb4_bin | utf8mb4 | A binary collation. |
| utf8mb4_unicode_ci | utf8mb4 | A collation that is based on Unicode Collation Algorithm (UCA). |
| binary | binary | A binary collation. |
| gbk_chinese_ci | gbk | A collation for Chinese. |
| gbk_bin | gbk | A binary collation. |
| utf16_general_ci | utf16 | A general collation. |
| utf16_bin | utf16 | A binary collation. |
| utf16_unicode_ci | utf16 | A collation that is based on UCA. |
| gb18030_chinese_ci | gb18030 | A collation for Chinese. |
| gb18030_bin | gb18030 | A binary collation. |
| latin1_bin | latin1 | latin1 A binary collation. |
| gb18030_2022_bin | gb18030_2022 | A binary collation. This is the default collation for this character set in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | A collation that sorts data by Pinyin. The collation is case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | A collation that sorts data by Pinyin. The collation is case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | A collation that sorts data by radicals. The collation is case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | A collation that sorts data by radicals. The collation is case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | A collation that sorts data by strokes. The collation is case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | A collation that sorts data by strokes. The collation is case-sensitive. |
In Oracle mode of OceanBase Database, you can query the system view NLS_DATABASE_PARAMETERS for the system collation specified by the nls_sort variable, the default database character set for such data types as NCHAR, NVARCHAR2, and NCLOB specified by the nls_nchar_characterset variable, and the default database character set for such data types as CHAR, VARCHAR2, and CLOB specified by the nls_characterset variable.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
The default collation of each character set in Oracle mode is the corresponding bin collation. However, you can use the NLSSORT() function to specify another collation.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('h'),('H'),('i'),('I'),('j'),('J'),('k'),('K'),('l'),('L'),('m');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=BINARY_CI') FROM t ORDER BY NLSSORT(a, 'NLS_SORT=BINARY_CI');
+---+-----------------------------------+
| A | NLSSORT(A,'NLS_SORT=BINARY_CI') |
+---+-----------------------------------+
| H | 48 |
| h | 68 |
| I | 49 |
| i | 69 |
| J | 4A |
| j | 6A |
| K | 4B |
| k | 6B |
| L | 4C |
| l | 6C |
| m | 6D |
+---+-----------------------------------+
11 rows in set
Conversion between gb18030 and gb18030_2022
The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. However, you can explicitly convert the character set of a gb18030 string to gb18030_2022 by using the CONVERT() function. Such conversion does not involve Unicode encoding and decoding. Therefore, the original code is retained. Here is an example:
/* In the tenant whose character set is gb18030_2022, the character set of column c is gb18030_2022 and the collation of the column is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('Word');
Query OK, 1 row affected
/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* During SQL parsing, the string 'Word' is converted to gb18030_2022. No error is reported when the SQL statement is executed. */
obclient > SELECT * FROM t1 WHERE c = 'Word';
+------+
| C |
+------+
| Word |
+------+
1 row in set
In Oracle mode of OceanBase Database, the character set is applied at the tenant level. For a tenant using the gb18030_2022 character set, the CHAR, VARCHAR2, and CLOB fields of all user tables will have the gb18030_2022 character set and the gb18030_2022_bin collation. In Oracle mode, string constants are converted into the tenant character set during parsing to ensure consistent character set handling in SQL statements.
/* In the tenant whose character set is gb18030_2022, the character set of column c is gb18030_2022 and the collation of the column is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('Word');
Query OK, 1 row affected
/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* During SQL parsing, the string 'Word' is converted to gb18030_2022. No error is reported when the SQL statement is executed. */
obclient > SELECT * FROM t1 WHERE c = 'Word';
+------+
| C |
+------+
| Word |
+------+
1 row in set