OceanBase Database in Oracle mode supports the following character sets:
binary
utf8mb4
Note
To facilitate seamless migration, OceanBase Database treats
UTF8as a synonym forUTF8MB4in syntax.gbk
utf16
gb18030
latin1
gb2312
gb18030_2022
ascii
tis620
ujis
euckr
eucjpms
cp932
utf16le
sjis
hkscs
hkscs31
dec8
big5
Note
The
big5character set is compatible with thebig5character set in MySQL.cp850
hp8
macroman
swe7
Note
Starting from OceanBase Database V4.3.5 BP1, the following character sets are supported in OceanBase Database V4.3.5: gb2312, ujis, euckr, eucjpms, cp932, cp850, hp8, macroman, and swe7.
Applicability
OceanBase Connector/J does not support the utf8mb4_unicode_ci and utf16_unicode_ci collations.
The following table lists the collations supported by OceanBase Database in Oracle mode.
| Character set | Collation | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | Uses the general collation rule. |
| utf8mb4_bin | utf8mb4 | Uses the binary collation rule. |
| utf8mb4_unicode_ci | utf8mb4 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| binary | binary | Uses the binary collation rule. |
| gbk_chinese_ci | gbk | Uses the Chinese collation rule. |
| gbk_bin | gbk | Uses the binary collation rule. |
| utf16_general_ci | utf16 | Uses the general collation rule. |
| utf16_bin | utf16 | Uses the binary collation rule. |
| utf16_unicode_ci | utf16 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| gb18030_chinese_ci | gb18030 | Uses the Chinese collation rule. |
| gb18030_bin | gb18030 | Uses the binary collation rule. |
| latin1_swedish_ci | latin1 | Uses the Swedish/Finnish collation rule. |
| latin1_german1_ci | latin1 | Uses the collation rule for the Latin1 character set in the German language environment. |
| latin1_danish_ci | latin1 | Uses the collation rule for the Latin1 character set in the Danish language environment. |
| latin1_german2_ci | latin1 | Uses the collation rule for the German language environment, suitable for applications requiring dictionary order comparisons. |
| latin1_general_ci | latin1 | Uses the general collation rule for case-insensitive sorting with accent support, suitable for databases designed for certain European languages. |
| latin1_general_cs | latin1 | Uses the general collation rule for case-sensitive sorting, supporting multiple languages (such as Western European languages). |
| latin1_spanish_ci | latin1 | Uses the collation rule for the Spanish language environment. |
| latin1_bin | latin1 | Uses the binary collation rule for the Latin1 character set. |
| gb18030_2022_bin | gb18030_2022 | Uses the binary collation rule. Default collation for the gb18030_2022 character set in Oracle mode. |
| gb18030_2022_chinese_ci | gb18030_2022 | Uses the pinyin collation rule. Case-insensitive. |
| gb18030_2022_chinese_cs | gb18030_2022 | Uses the pinyin collation rule. Case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | Uses the radical-stroke collation rule. Case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | Uses the radical-stroke collation rule. Case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | Uses the stroke collation rule. Case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | Uses the stroke collation rule. Case-sensitive. |
| ascii_bin | ascii | Uses the binary collation rule. It compares and sorts characters as binary data. |
| ascii_general_ci | ascii | Uses the general collation rule for case-insensitive sorting. It ignores case differences and treats uppercase and lowercase letters as the same character. |
| tis620_bin | tis620 | Uses the binary collation rule. |
| tis620_thai_ci | tis620 | Uses the Thai collation rule, case-insensitive. |
| gb2312_chinese_ci | gb2312 | Uses the GB2312 character set with case-insensitive sorting according to the Chinese collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| gb2312_bin | gb2312 | Uses the GB2312 character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| ujis_japanese_ci | ujis | Uses the UJIS character set with case-insensitive sorting according to the Japanese collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| ujis_bin | ujis | Uses the UJIS character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| euckr_korean_ci | euckr | Uses the EUCKR character set with case-insensitive sorting according to the Korean collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| euckr_bin | euckr | Uses the EUCKR character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_japanese_ci | eucjpms | Uses the EUCJPMS character set with case-insensitive sorting according to the Japanese collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| eucjpms_bin | eucjpms | Uses the EUCJPMS character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp932_japanese_ci | cp932 | Uses the CP932 character set with case-insensitive sorting according to the Japanese collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp932_bin | cp932 | Uses the CP932 character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp850_general_ci | cp850 | Uses the CP850 character set with case-insensitive sorting according to the general collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| cp850_bin | cp850 | Uses the CP850 character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| hp8_english_ci | hp8 | Uses the HP8 character set with case-insensitive sorting according to the English collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| hp8_bin | hp8 | Uses the HP8 character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_general_ci | macroman | Uses the MacRoman character set with case-insensitive sorting according to the general collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| macroman_bin | macroman | Uses the MacRoman character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| swe7_swedish_ci | swe7 | Uses the SWE7 character set with case-insensitive sorting according to the Swedish collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
| swe7_bin | swe7 | Uses the SWE7 character set with case-sensitive sorting according to the binary collation rule.
NoteFor OceanBase Database V4.3.5, this collation is supported starting from V4.3.5 BP1. |
In Oracle mode, you can use the NLS_DATABASE_PARAMETERS view to view the system collation corresponding to the nls_sort variable, the default character set of the NCHAR, NVARCHAR2, and NCLOB data types corresponding to the nls_nchar_characterset variable, and the default character set of the CHAR, VARCHAR2, and CLOB data types corresponding to the nls_characterset variable in OceanBase Database.
obclient> SELECT * FROM NLS_DATABASE_PARAMETERS;
+-------------------------+------------------------------+
| PARAMETER | VALUE |
+-------------------------+------------------------------+
| NLS_DATE_FORMAT | DD-MON-RR |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR |
| NLS_TERRITORY | AMERICA |
| NLS_SORT | BINARY |
| NLS_COMP | BINARY |
| NLS_CHARACTERSET | AL32UTF8 |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_DATE_LANGUAGE | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
| NLS_CALENDAR | GREGORIAN |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CURRENCY | $ |
| NLS_ISO_CURRENCY | AMERICA |
| NLS_DUAL_CURRENCY | $ |
+-------------------------+------------------------------+
16 rows in set
OceanBase Database allows you to specify a non-default character set for communication with the server. For example, to use the gbk character set, execute the following statement after you connect to the server:
obclient> SET NAMES gbk;
Query OK, 0 rows affected
The default collation method in OceanBase Database in Oracle mode is bin. However, you can use the NLSSORT() function to specify a different collation method.
obclient> CREATE TABLE t(a VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t VALUES ('a'),('A'),('b'),('B'),('x'),('y'),('z'),('m'),('n'),('p'),('ζ');
Query OK, 11 rows affected
Records: 11 Duplicates: 0 Warnings: 0
obclient> SELECT a,NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' ) FROM t ORDER BY NLSSORT(a, 'NLS_SORT=SCHINESE_PINYIN2_M' );
+------+------------------------------------------+
| A | NLSSORT(A,'NLS_SORT=SCHINESE_PINYIN2_M') |
+------+------------------------------------------+
| A | 41 |
| B | 42 |
| a | 61 |
| b | 62 |
| ζ | A6C6 |
| m | 6D |
| n | 6E |
| p | 70 |
| x | 78 |
| y | 79 |
| z | 7A |
+------+------------------------------------------+
11 rows in set
Conversion between GB 18030 and GB 18030-2022
Implicit conversion from gb18030 to gb18030_2022 is not supported in the current OceanBase Database version. You can explicitly convert a gb18030 string to gb18030_2022 by using the CONVERT function. This conversion does not rely on Unicode but preserves the encoded characters, as shown in the example.
obclient> CREATE TABLE t1 (c1 VARCHAR(10));
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES ('Character');
Query OK, 1 row affected
obclient> SELECT RAWTOHEX(c1), RAWTOHEX(CONVERT(c1, 'ZHS32GB18030_2022')) FROM t1;
+--------------+-------------------------------------------+
| RAWTOHEX(C1) | RAWTOHEX(CONVERT(C1,'ZHS32GB18030_2022')) |
+--------------+-------------------------------------------+
| D7D6 | D7D6 |
+--------------+-------------------------------------------+
1 row in set
The gb18030_2022 character set in OceanBase Database in Oracle mode is tenant-level. In a tenant with the gb18030_2022 character set, all user tables use the gb18030_2022 character set for the CHAR, VARCHAR2, and CLOB columns, and all of these columns are sorted using the gb18030_2022_bin collation in binary mode. In Oracle mode, string constants are converted to the tenant's character set during parsing to ensure SQL statement consistency.
/* gb18030_2022 tenant with a column c of character set gb18030_2022 and ordering gb18030_2022_bin */
obclient> CREATE TABLE t1(c VARCHAR(100));
Query OK, 0 rows affected
obclient> INSERT INTO t1 values('character');
Query OK, 1 row affected
/* Set the client character set to gb18030. */
obclient> SET NAMES gb18030;
Query OK, 0 rows affected
obclient> SELECT * FROM t1 WHERE c = 'character';
+------+
| C |
+------+
| character |
+------+
1 row in set