OceanBase Database in Oracle mode supports the following character sets:
binarygbkgb18030utf16utf8mb4latin1gb18030_2022
Note
To support seamless migration, OceanBase Database takes utf8 as a synonym of utf8mb4.
The following table describes the collations supported by OceanBase Database in Oracle mode.
| Collation | Character set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | A general collation. |
| utf8mb4_bin | utf8mb4 | A binary collation. |
| utf8mb4_unicode_ci | utf8mb4 | A collation that is based on Unicode Collation Algorithm (UCA). |
| binary | binary | A binary collation. |
| gbk_chinese_ci | gbk | A collation for Chinese. |
| gbk_bin | gbk | A binary collation. |
| utf16_general_ci | utf16 | A general collation. |
| utf16_bin | utf16 | A binary collation. |
| utf16_unicode_ci | utf16 | A collation that is based on Unicode Collation Algorithm (UCA). |
| gb18030_chinese_ci | gb18030 | A collation for Chinese. |
| gb18030_bin | gb18030 | A binary collation. |
| latin1_bin | latin1 | A binary collation. |
| gb18030_2022_bin | gb18030_2022 | A binary collation. This is the default collation of the character set in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | A Pinyin collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | A Pinyin collation for Chinese. The collation is case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | A radical stroke collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | A radical stroke collation for Chinese. The collation is case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | A stroke collation for Chinese. The collation is case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | A stroke collation for Chinese. The collation is case-sensitive. |
In Oracle mode, you can query the system view NLS_DATABASE_PARAMETERS for the system collation specified by the nls_sort variable, the default database character set for such data types as NCHAR, NVARCHAR2, and NCLOB specified by the nls_nchar_characterset variable, and the default database character set for such data types as CHAR, VARCHAR2, and CLOB specified by the nls_characterset variable.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
The default collation of each character set in Oracle mode is the corresponding bin collation. However, you can use the NLSSORT() function to specify another collation.
Conversion between gb18030 and gb18030_2022
The current version of OceanBase Database does not support implicit conversion between gb18030 and gb18030_2022. However, you can explicitly convert the character set of a gb18030 string to gb18030_2022 by using the CONVERT() function. Such conversion does not involve Unicode processing, and the original encoding is retained. Here is an example:
In Oracle mode of OceanBase Database, the character set takes effect at the tenant level. For a tenant that uses the gb18030_2022 character set, the CHAR, VARCHAR2, and CLOB fields of all user tables use the gb18030_2022 character set and the gb18030_2022_bin collation. In Oracle mode, string constants are converted to the tenant character set during parsing for character set consistency among SQL statements.