The default collation of OceanBase Database is utf8mb4_general_ci.
OceanBase Database supports the collations listed in the following table.
| Collation | Character set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 | Uses the general collation rule. |
| utf8mb4_bin | utf8mb4 | Uses the binary collation rule. |
| utf8mb4_unicode_ci | utf8mb4 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| utf8mb4_unicode_520_ci | utf8mb4 | Uses the Unicode Collation Algorithm (UCA) collation rule based on Unicode 5.2.0. Case-insensitive. |
| utf8mb4_croatian_ci | utf8mb4 | Uses the Croatian collation rule. utf8mb4_croatian_ci is compatible with utf8_croatian_ci. |
| utf8mb4_czech_ci | utf8mb4 | Uses the Czech collation rule. |
| utf8mb4_0900_ai_ci | utf8mb4 | Uses the collation rule of Unicode 9.0.0. Accent-insensitive and case-insensitive: uppercase and lowercase letters are treated as the same character. |
| binary | binary | Uses the binary collation rule. |
| gbk_chinese_ci | gbk | Uses the Chinese collation rule. |
| gbk_bin | gbk | Uses the binary collation rule. |
| utf16_general_ci | utf16 | Uses the general collation rule. |
| utf16_bin | utf16 | Uses the binary collation rule. |
| utf16_unicode_ci | utf16 | Uses the Unicode Collation Algorithm (UCA) collation rule. |
| utf8mb4_german2_ci | utf8mb4 | Uses the German collation rule. |
| gb18030_chinese_ci | gb18030 | Uses the Chinese collation rule. |
| gb18030_bin | gb18030 | Uses the binary collation rule. |
| latin1_swedish_ci | latin1 | Uses the Swedish/Finnish collation rule. |
| latin1_german1_ci | latin1 | The collation rule for the Latin1 character set in the German language environment. Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_danish_ci | latin1 | The collation rule for the Latin1 character set in the Danish language environment. Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_german2_ci | latin1 | The collation rule for the German language environment, suitable for applications that require character comparison based on dictionary order. Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_general_ci | latin1 | The collation rule for scenarios that require case-insensitive comparison and support for accents, such as in the design of certain European language databases. Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_general_cs | latin1 | The case-sensitive general collation rule, supporting multiple languages (such as Western European languages). Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_spanish_ci | latin1 | The collation rule for the Spanish language environment. Note: For OceanBase Database V4.2.5, this collation is supported starting from V4.2.5 BP1. |
| latin1_bin | latin1 | The Latin1 character set uses the binary collation rule. |
| gb18030_2022_bin | gb18030_2022 | Uses the binary collation rule. |
| gb18030_2022_chinese_ci | gb18030_2022 | Uses the pinyin collation rule. Case-insensitive. The default collation for this character set in MySQL mode. |
| gb18030_2022_chinese_cs | gb18030_2022 | Uses the pinyin collation rule. Case-sensitive. |
| gb18030_2022_radical_ci | gb18030_2022 | Uses the radical-stroke collation rule. Case-insensitive. |
| gb18030_2022_radical_cs | gb18030_2022 | Uses the radical-stroke collation rule. Case-sensitive. |
| gb18030_2022_stroke_ci | gb18030_2022 | Uses the stroke collation rule. Case-insensitive. |
| gb18030_2022_stroke_cs | gb18030_2022 | Uses the stroke collation rule. Case-sensitive. |
| ascii_bin | ascii | Uses the binary collation rule. Characters are compared and sorted by their binary values. |
| ascii_general_ci | ascii | Uses the general collation rule. Case-insensitive: uppercase and lowercase letters are treated as the same character. |
| tis620_bin | tis620 | Uses the binary collation rule. |
| tis620_thai_ci | tis620 | Uses the Thai collation rule, case-insensitive. |
| sjis_japanese_ci | sjis | Uses the Japanese collation rule. |
| dec8_swedish_ci | dec8 | Uses the Swedish collation rule. |
Note
- For any Unicode character set, operations performed using the xxx_general_ci collation are faster than those performed using the xxx_unicode_ci collation.
- Character set collations cannot be modified.
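To make the effect of the collation suffixes concrete, the following is a minimal sketch that compares the same two strings under a case-insensitive collation and a binary collation. It assumes a MySQL-mode tenant and uses the _utf8mb4 introducer so that both literals are utf8mb4 regardless of the connection character set. The aliases ci_equal and bin_equal are illustrative names, not part of OceanBase Database.

```sql
-- Assumption: MySQL mode; ci_equal and bin_equal are illustrative aliases.
-- The explicit COLLATE clause decides which collation the comparison uses.
SELECT _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_general_ci AS ci_equal,
       _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_bin        AS bin_equal;
```

Under these assumptions, the first comparison is expected to treat the two strings as equal, while the second should treat them as different because utf8mb4_bin compares binary values.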
By default, the SHOW COLLATION statement displays all available collations.
obclient [(none)]> SHOW COLLATION;
The result set is as follows:
+----------------------------+--------------+------+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------------+--------------+------+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| binary | binary | 63 | Yes | Yes | 1 |
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 |
| gbk_bin | gbk | 87 | | Yes | 1 |
| utf16_general_ci | utf16 | 54 | Yes | Yes | 1 |
| utf16_bin | utf16 | 55 | | Yes | 1 |
| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 2 |
| gb18030_bin | gb18030 | 249 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| gb18030_2022_bin | gb18030_2022 | 216 | | Yes | 1 |
| gb18030_2022_chinese_ci | gb18030_2022 | 217 | Yes | Yes | 1 |
| gb18030_2022_chinese_cs | gb18030_2022 | 218 | | Yes | 1 |
| gb18030_2022_radical_ci | gb18030_2022 | 219 | | Yes | 1 |
| gb18030_2022_radical_cs | gb18030_2022 | 220 | | Yes | 1 |
| gb18030_2022_stroke_ci | gb18030_2022 | 221 | | Yes | 1 |
| gb18030_2022_stroke_cs | gb18030_2022 | 222 | | Yes | 1 |
| ascii_general_ci | ascii | 11 | Yes | Yes | 1 |
| ascii_bin | ascii | 65 | | Yes | 1 |
| tis620_thai_ci | tis620 | 18 | Yes | Yes | 1 |
| tis620_bin | tis620 | 89 | | Yes | 1 |
| utf16le_general_ci | utf16le | 56 | Yes | Yes | 1 |
| utf16le_bin | utf16le | 62 | | Yes | 1 |
| sjis_japanese_ci | sjis | 13 | Yes | Yes | 1 |
| sjis_bin | sjis | 88 | | Yes | 1 |
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| big5_bin | big5 | 84 | | Yes | 1 |
| hkscs_bin | hkscs | 152 | Yes | Yes | 1 |
| hkscs31_bin | hkscs31 | 153 | Yes | Yes | 1 |
| utf16_unicode_ci | utf16 | 101 | | Yes | 8 |
| utf16_icelandic_ci | utf16 | 102 | | Yes | 8 |
| utf16_latvian_ci | utf16 | 103 | | Yes | 8 |
| utf16_romanian_ci | utf16 | 104 | | Yes | 8 |
| utf16_slovenian_ci | utf16 | 105 | | Yes | 8 |
| utf16_polish_ci | utf16 | 106 | | Yes | 8 |
| utf16_estonian_ci | utf16 | 107 | | Yes | 8 |
| utf16_spanish_ci | utf16 | 108 | | Yes | 8 |
| utf16_swedish_ci | utf16 | 109 | | Yes | 8 |
| utf16_turkish_ci | utf16 | 110 | | Yes | 8 |
| utf16_czech_ci | utf16 | 111 | | Yes | 8 |
| utf16_danish_ci | utf16 | 112 | | Yes | 8 |
| utf16_lithuanian_ci | utf16 | 113 | | Yes | 8 |
| utf16_slovak_ci | utf16 | 114 | | Yes | 8 |
| utf16_spanish2_ci | utf16 | 115 | | Yes | 8 |
| utf16_roman_ci | utf16 | 116 | | Yes | 8 |
| utf16_persian_ci | utf16 | 117 | | Yes | 8 |
| utf16_esperanto_ci | utf16 | 118 | | Yes | 8 |
| utf16_hungarian_ci | utf16 | 119 | | Yes | 8 |
| utf16_sinhala_ci | utf16 | 120 | | Yes | 8 |
| utf16_german2_ci | utf16 | 121 | | Yes | 8 |
| utf16_croatian_ci | utf16 | 122 | | Yes | 8 |
| utf16_unicode_520_ci | utf16 | 123 | | Yes | 8 |
| utf16_vietnamese_ci | utf16 | 124 | | Yes | 8 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
| dec8_swedish_ci | dec8 | 3 | Yes | Yes | 8 |
| dec8_bin | dec8 | 69 | | Yes | 8 |
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 0 |
| utf8mb4_de_pb_0900_ai_ci | utf8mb4 | 256 | | Yes | 0 |
| utf8mb4_is_0900_ai_ci | utf8mb4 | 257 | | Yes | 0 |
| utf8mb4_lv_0900_ai_ci | utf8mb4 | 258 | | Yes | 0 |
| utf8mb4_ro_0900_ai_ci | utf8mb4 | 259 | | Yes | 0 |
| utf8mb4_sl_0900_ai_ci | utf8mb4 | 260 | | Yes | 0 |
| utf8mb4_pl_0900_ai_ci | utf8mb4 | 261 | | Yes | 0 |
| utf8mb4_et_0900_ai_ci | utf8mb4 | 262 | | Yes | 0 |
| utf8mb4_es_0900_ai_ci | utf8mb4 | 263 | | Yes | 0 |
| utf8mb4_sv_0900_ai_ci | utf8mb4 | 264 | | Yes | 0 |
| utf8mb4_tr_0900_ai_ci | utf8mb4 | 265 | | Yes | 0 |
| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 |
| utf8mb4_da_0900_ai_ci | utf8mb4 | 267 | | Yes | 0 |
| utf8mb4_lt_0900_ai_ci | utf8mb4 | 268 | | Yes | 0 |
| utf8mb4_sk_0900_ai_ci | utf8mb4 | 269 | | Yes | 0 |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 | | Yes | 0 |
| utf8mb4_la_0900_ai_ci | utf8mb4 | 271 | | Yes | 0 |
| utf8mb4_eo_0900_ai_ci | utf8mb4 | 273 | | Yes | 0 |
| utf8mb4_hu_0900_ai_ci | utf8mb4 | 274 | | Yes | 0 |
| utf8mb4_hr_0900_ai_ci | utf8mb4 | 275 | | Yes | 0 |
| utf8mb4_vi_0900_ai_ci | utf8mb4 | 277 | | Yes | 0 |
| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 |
| utf8mb4_de_pb_0900_as_cs | utf8mb4 | 279 | | Yes | 0 |
| utf8mb4_is_0900_as_cs | utf8mb4 | 280 | | Yes | 0 |
| utf8mb4_lv_0900_as_cs | utf8mb4 | 281 | | Yes | 0 |
| utf8mb4_ro_0900_as_cs | utf8mb4 | 282 | | Yes | 0 |
| utf8mb4_sl_0900_as_cs | utf8mb4 | 283 | | Yes | 0 |
| utf8mb4_pl_0900_as_cs | utf8mb4 | 284 | | Yes | 0 |
| utf8mb4_et_0900_as_cs | utf8mb4 | 285 | | Yes | 0 |
| utf8mb4_es_0900_as_cs | utf8mb4 | 286 | | Yes | 0 |
| utf8mb4_sv_0900_as_cs | utf8mb4 | 287 | | Yes | 0 |
| utf8mb4_tr_0900_as_cs | utf8mb4 | 288 | | Yes | 0 |
| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 |
| utf8mb4_da_0900_as_cs | utf8mb4 | 290 | | Yes | 0 |
| utf8mb4_lt_0900_as_cs | utf8mb4 | 291 | | Yes | 0 |
| utf8mb4_sk_0900_as_cs | utf8mb4 | 292 | | Yes | 0 |
| utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 | | Yes | 0 |
| utf8mb4_la_0900_as_cs | utf8mb4 | 294 | | Yes | 0 |
| utf8mb4_eo_0900_as_cs | utf8mb4 | 296 | | Yes | 0 |
| utf8mb4_hu_0900_as_cs | utf8mb4 | 297 | | Yes | 0 |
| utf8mb4_hr_0900_as_cs | utf8mb4 | 298 | | Yes | 0 |
| utf8mb4_vi_0900_as_cs | utf8mb4 | 300 | | Yes | 0 |
| utf8mb4_ja_0900_as_cs | utf8mb4 | 303 | | Yes | 0 |
| utf8mb4_ja_0900_as_cs_ks | utf8mb4 | 304 | | Yes | 24 |
| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 |
| utf8mb4_ru_0900_ai_ci | utf8mb4 | 306 | | Yes | 0 |
| utf8mb4_ru_0900_as_cs | utf8mb4 | 307 | | Yes | 0 |
| utf8mb4_zh_0900_as_cs | utf8mb4 | 308 | | Yes | 0 |
| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 |
| utf8mb4_nb_0900_ai_ci | utf8mb4 | 310 | | Yes | 0 |
| utf8mb4_nb_0900_as_cs | utf8mb4 | 311 | | Yes | 0 |
| utf8mb4_nn_0900_ai_ci | utf8mb4 | 312 | | Yes | 0 |
| utf8mb4_nn_0900_as_cs | utf8mb4 | 313 | | Yes | 0 |
| utf8mb4_sr_latn_0900_ai_ci | utf8mb4 | 314 | | Yes | 0 |
| utf8mb4_sr_latn_0900_as_cs | utf8mb4 | 315 | | Yes | 0 |
| utf8mb4_bs_0900_ai_ci | utf8mb4 | 316 | | Yes | 0 |
| utf8mb4_bs_0900_as_cs | utf8mb4 | 317 | | Yes | 0 |
| utf8mb4_bg_0900_ai_ci | utf8mb4 | 318 | | Yes | 0 |
| utf8mb4_bg_0900_as_cs | utf8mb4 | 319 | | Yes | 0 |
| utf8mb4_gl_0900_ai_ci | utf8mb4 | 320 | | Yes | 0 |
| utf8mb4_gl_0900_as_cs | utf8mb4 | 321 | | Yes | 0 |
| utf8mb4_mn_cyrl_0900_ai_ci | utf8mb4 | 322 | | Yes | 0 |
| utf8mb4_mn_cyrl_0900_as_cs | utf8mb4 | 323 | | Yes | 0 |
+----------------------------+--------------+------+---------+----------+---------+
149 rows in set
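Because the full list is long, it is often more practical to narrow the output. The sketch below assumes that SHOW COLLATION accepts the LIKE and WHERE clauses in the same way as MySQL. Verify this against the SHOW statement reference for your OceanBase Database version.

```sql
-- Assumption: the LIKE and WHERE filters behave as in MySQL.
-- List only the collations whose names start with "latin1".
SHOW COLLATION LIKE 'latin1%';

-- List only the collations that belong to the gbk character set.
SHOW COLLATION WHERE Charset = 'gbk';
```

Based on the full result set above, the second statement would be expected to return only gbk_chinese_ci and gbk_bin.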
Character set collations have the following general characteristics:
- Two different character sets cannot have the same collation.
- Each character set has a default collation. The SHOW CHARACTER SET statement indicates the default collation for each character set. The SHOW COLLATION statement has a Default column that indicates whether a collation is the default for its character set (Yes if it is, otherwise empty).
- The name of a collation starts with the name of the character set with which it is associated, and is usually followed by one or more suffixes that indicate other collation characteristics.
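The characteristics above can be checked with a short sketch. It assumes a MySQL-mode tenant in which the gbk character set is available. The table name t_collation_demo and the column names c1 and c2 are hypothetical and exist only for this illustration.

```sql
-- Assumption: MySQL mode; t_collation_demo, c1, and c2 are hypothetical names.
-- Show the default collation of the gbk character set.
SHOW CHARACTER SET LIKE 'gbk';

-- c1 names only the character set, so it should take that set's default collation.
-- c2 names a collation explicitly; its gbk_ prefix matches the character set.
CREATE TABLE t_collation_demo (
  c1 VARCHAR(10) CHARACTER SET gbk,
  c2 VARCHAR(10) CHARACTER SET gbk COLLATE gbk_bin
);

-- The Collation column of the output shows the collation in effect for each column.
SHOW FULL COLUMNS FROM t_collation_demo;
```

Given the SHOW COLLATION output above, c1 would be expected to report gbk_chinese_ci (the collation marked Yes in the Default column for gbk) and c2 to report gbk_bin.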