The Oracle mode of OceanBase Database supports the following character sets:
binary
utf8mb4
Note
To support seamless migration, OceanBase Database treats UTF8 as a synonym of UTF8MB4.
gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5
Note
The big5 character set is compatible with MySQL's big5 character set.
cp850
hp8
macroman
swe7
Note
Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
Applicability
OceanBase Connector/J does not support utf8mb4_unicode_ci and utf16_unicode_ci.
The following table describes the collations supported by the Oracle mode of OceanBase Database.
| Collation | Character set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | Uses general collation. |
| utf8mb4_bin | utf8mb4 | Uses binary collation. |
| utf8mb4_unicode_ci | utf8mb4 | Uses collation based on the Unicode Collation Algorithm (UCA). |
| binary | binary | Uses binary collation. |
| gbk_chinese_ci | gbk | Uses Chinese language collation. |
| gbk_bin | gbk | Uses binary collation. |
| utf16_general_ci | utf16 | Uses general collation. |
| utf16_bin | utf16 | Uses binary collation. |
| utf16_unicode_ci | utf16 | Uses collation based on the UCA. |
| gb18030_chinese_ci | gb18030 | Uses Chinese language collation. |
| gb18030_bin | gb18030 | Uses binary collation. |
| latin1_swedish_ci | latin1 | Uses Swedish/Finnish collation. |
| latin1_german1_ci | latin1 | Uses collation for the German language environment in the latin1 character set. |
| latin1_danish_ci | latin1 | Uses collation for the Danish language environment in the latin1 character set. |
| latin1_german2_ci | latin1 | Used for German environments, suitable for applications needing dictionary order character comparison. |
| latin1_general_ci | latin1 | Used for case-insensitive scenarios supporting diacritics, such as database designs for some European languages. |
| latin1_general_cs | latin1 | Used for case-sensitive general collation, supporting multiple languages (for example, Western European languages). |
| latin1_spanish_ci | latin1 | Used for collation in the Spanish language environment. |
| latin1_bin | latin1 | Uses binary collation for latin1. |
| gb18030_2022_bin | gb18030_2022 | Uses binary collation, which is the default collation in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | Uses pinyin collation, which is case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | Uses pinyin collation, which is case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | Uses radical-stroke collation, which is case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | Uses radical-stroke collation, which is case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | Uses stroke collation, which is case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | Uses stroke collation, which is case-sensitive. |
| ascii_bin | ascii | Uses binary collation, comparing characters as binary data. |
| ascii_general_ci | ascii | Uses case-insensitive alphabetical collation, treating uppercase and lowercase letters as the same. |
| tis620_bin | tis620 | Uses binary collation. |
| tis620_thai_ci | tis620 | Uses Thai collation, which is case-insensitive. |
| gb2312_chinese_ci | gb2312 | Uses the GB2312 character set and performs case-insensitive sorting according to Chinese collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| gb2312_bin | gb2312 | Uses the GB2312 character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| ujis_japanese_ci | ujis | Uses the UJIS character set and performs case-insensitive sorting according to Japanese collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| ujis_bin | ujis | Uses the UJIS character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| euckr_korean_ci | euckr | Uses the EUCKR character set and performs case-insensitive sorting according to Korean collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| euckr_bin | euckr | Uses the EUCKR character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| eucjpms_japanese_ci | eucjpms | Uses the EUCJPMS character set and performs case-insensitive sorting according to Japanese collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| eucjpms_bin | eucjpms | Uses the EUCJPMS character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| cp932_japanese_ci | cp932 | Uses the CP932 character set and performs case-insensitive sorting according to Japanese collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| cp932_bin | cp932 | Uses the CP932 character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| cp850_general_ci | cp850 | Uses the CP850 character set and performs case-insensitive sorting according to general collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| cp850_bin | cp850 | Uses the CP850 character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| hp8_english_ci | hp8 | Uses the HP8 character set and performs case-insensitive sorting according to English collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| hp8_bin | hp8 | Uses the HP8 character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| macroman_general_ci | macroman | Uses the MacRoman character set and performs case-insensitive sorting according to general collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| macroman_bin | macroman | Uses the MacRoman character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| swe7_swedish_ci | swe7 | Uses the SWE7 character set and performs case-insensitive sorting according to Swedish collation. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
| swe7_bin | swe7 | Uses the SWE7 character set and performs case-sensitive sorting in binary order. Note: For OceanBase Database V4.3.5, this collation is supported since V4.3.5 BP1. |
In Oracle mode of OceanBase Database, you can query the system view NLS_DATABASE_PARAMETERS for three settings: the system collation, specified by the nls_sort variable; the default database character set for the CHAR, VARCHAR2, and CLOB data types, specified by the nls_characterset variable; and the default database character set for the NCHAR, NVARCHAR2, and NCLOB data types, specified by the nls_nchar_characterset variable.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
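To look at only the collation and character set settings described above, the view can be filtered by parameter name. The following is a minimal sketch. It assumes the PARAMETER and VALUE column names shown in the preceding output, and that parameter names are stored in uppercase as they appear there.

```sql
-- Return only the collation and character set settings.
-- Assumption: parameter names are stored in uppercase, as in the output above.
SELECT PARAMETER, VALUE
  FROM NLS_DATABASE_PARAMETERS
 WHERE PARAMETER IN ('NLS_SORT', 'NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```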
OceanBase Database allows you to specify a character set other than the default one for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
The default collation of each character set in Oracle mode is the corresponding bin collation. However, you can use the NLSSORT() function to specify another collation.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('h'),('H'),('i'),('I'),('j'),('J'),('k'),('K'),('l'),('L'),('m');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=BINARY_CI') FROM t ORDER BY NLSSORT(a, 'NLS_SORT=BINARY_CI');
+---+-----------------------------------+
| A | NLSSORT(A,'NLS_SORT=BINARY_CI') |
+---+-----------------------------------+
| H | 48 |
| h | 68 |
| I | 49 |
| i | 69 |
| J | 4A |
| j | 6A |
| K | 4B |
| k | 6B |
| L | 4C |
| l | 6C |
| m | 6D |
+---+-----------------------------------+
11 rows in set
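For comparison, sorting the same table without NLSSORT() falls back to the column's default bin collation, which compares encoded values. This is a minimal sketch that reuses the table t created above. Under a binary collation, the uppercase ASCII letters (0x48 to 0x4C here) are expected to sort before the lowercase ones (0x68 to 0x6D).

```sql
-- Default behavior: ORDER BY uses the column's bin collation,
-- so rows are ordered by encoded value rather than alphabetically.
SELECT a FROM t ORDER BY a;
```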
Conversion between gb18030 and gb18030_2022
Earlier versions of OceanBase Database do not support implicit conversion between gb18030 and gb18030_2022. However, you can use the CONVERT function to explicitly convert a gb18030 string to gb18030_2022. This conversion retains the encoded values and does not go through Unicode. Here is an example:
obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('Word');
Query OK, 1 row affected
obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+------+
| C |
+------+
| Word |
+------+
1 row in set
In Oracle mode of OceanBase Database, the character set is set at the tenant level. In a tenant that uses the gb18030_2022 character set, the CHAR, VARCHAR2, and CLOB columns of all user tables use the gb18030_2022 character set and are sorted in gb18030_2022_bin order. In Oracle mode, a string constant is converted to the tenant character set after it is parsed, so that the entire SQL statement uses a single character set.
/* In the tenant whose character set is gb18030_2022, the character set of column c is gb18030_2022 and the collation of the column is gb18030_2022_bin. */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('Word');
Query OK, 1 row affected
/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
/* During SQL parsing, the string 'Word' is converted to gb18030_2022. No error is reported when the SQL statement is executed. */
obclient> SELECT * FROM t1 WHERE c = 'Word';
+------+
| C |
+------+
| Word |
+------+
1 row in set