The Oracle mode of OceanBase Database supports the following character sets:
binarygbkgb18030utf16utf8mb4latin1gb18030_2022
Note
To support seamless migration, OceanBase Database takes utf8 as a synonym for utf8mb4.
Applicability
OceanBase Connector/J does not support utf8mb4_unicode_ci or utf16_unicode_ci.
The following table describes the collations supported by the Oracle mode of OceanBase Database.
| Collation | Character set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | A general collation. |
| utf8mb4_bin | utf8mb4 | A binary collation. |
| utf8mb4_unicode_ci | utf8mb4 | A collation that is based on Unicode Collation Algorithm (UCA). |
| binary | binary | A binary collation. |
| gbk_chinese_ci | gbk | A collation for Chinese. |
| gbk_bin | gbk | A binary collation. |
| utf16_general_ci | utf16 | A general collation. |
| utf16_bin | utf16 | A binary collation. |
| utf16_unicode_ci | utf16 | A collation that is based on UCA. |
| gb18030_chinese_ci | gb18030 | A collation for Chinese. |
| gb18030_bin | gb18030 | A binary collation. |
| latin1_bin | latin1 | A binary collation. |
| gb18030_2022_bin | gb18030_2022 | A binary collation. This is the default collation for this character set in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | A Pinyin collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | A Pinyin collation for Chinese. The collation is case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | A radical stroke collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | A radical stroke collation for Chinese. The collation is case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | A stroke collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | A stroke collation for Chinese. The collation is case-sensitive. |
In Oracle mode, you can query the system view NLS_DATABASE_PARAMETERS for the system collation specified by the nls_sort variable, the default database character set for such data types as NCHAR, NVARCHAR2, and NCLOB specified by the nls_nchar_characterset variable, and the default database character set for such data types as CHAR, VARCHAR2, and CLOB specified by the nls_characterset variable.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
The default collation of each character set in Oracle mode is the corresponding bin collation. However, you can use the NLSSORT() function to specify another collation.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('h'),('H'),('i'),('I'),('j'),('J'),('k'),('K'),('l'),('L'),('m');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=BINARY_CI') FROM t ORDER BY NLSSORT(a, 'NLS_SORT=BINARY_CI');
+---+-----------------------------------+
| A | NLSSORT(A, 'NLS_SORT=BINARY_CI') |
+---+-----------------------------------+
| H | 48 |
| h | 68 |
| I | 49 |
| i | 69 |
| J | 4A |
| j | 6A |
| K | 4B |
| k | 6B |
| L | 4C |
| l | 6C |
| m | 6D |
+---+-----------------------------------+
11 rows in set
Conversion between gb18030 and gb18030_2022
The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. However, you can explicitly convert the character set of a gb18030 string to gb18030_2022 by using the CONVERT() function. Such conversion does not involve Unicode encoding and decoding. That is, the original code is retained.
In the Oracle mode of OceanBase Database, the character set is applied at the tenant level. For a tenant using the gb18030_2022 character set, the CHAR, VARCHAR2, and CLOB fields of all user tables will have the gb18030_2022 character set and the gb18030_2022_bin collation. In the Oracle mode, string constants are converted to the tenant character set during parsing to ensure consistent character set handling in SQL statements.
/* In the tenant whose character set is gb18030_2022, the character set of column c is gb18030_2022 and the collation is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES ('Word');
Query OK, 1 row affected
/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* The SQL statement will convert the string 'Word' to gb18030_2022 during parsing. The SQL will execute without errors. */
obclient > SELECT * FROM t1 WHERE c = 'Word';
+------+
| C |
+------+
| Word |
+------+
1 row in set