This topic describes the specifications for selecting character sets in OceanBase Database.
You can set character sets at the tenant, database, table, field, and session levels. OceanBase Database supports character sets such as utf8mb4, gbk, gb18030, binary, and utf16.
Note
- To support seamless migration, OceanBase Database treats
UTF8as a synonym ofUTF8MB4. - You cannot modify the database character set.
In the following examples, the gbk character set is used:
Set the character set when you create a tenant
Add
"charset=gbk"in theCREATE TENANTstatement to set the character set.create tenant oracle replica_num = 1, resource_pool_list =('pool1'), charset = gbk set ob_tcp_invited_nodes = '%', ob_compatibility_mode = 'oracle', parallel_servers_target = 10, ob_sql_work_area_percentage = 20, secure_file_priv = "";Select gbk as the character set when you create a tenant in OceanBase Cloud Platform (OCP).
Notice
- In Oracle mode, the character set is specified at the tenant level. In a tenant whose character set is set to gbk, the gbk character set is used for the char, varchar2, and clob fields in all user tables. However, the utf8 character set is still used for the char, varchar2, and other fields in system tables.
- In Oracle mode, the character set of a tenant cannot be modified. In other words, you cannot use the
ALTERstatement to modify the character set of a tenant, database, table, or column.
Set the client (link) character set
The client (link) character set is a character set used for the interaction between the client and the server.
The client sends SQL strings to the server for execution. The server then returns the execution results to the client. In this process, the server must know the character set used by the client to correctly parse and execute the SQL strings and return the results. In different environments, the client can be OceanBase Command-Line Client (OBClient), Java Database Connectivity (JDBC), or Oracle Call Interface (OCI). The client character set is also called the link character set.
The tenant character set and the client character set are independent of each other.
A tenant with the gbk character set can be accessed by a client with the gbk or utf8 character set.
If the client character set is gbk, the server parses and executes the received SQL statements based on gbk.
If the client character set is utf8, the server parses and executes the received SQL statements based on utf8.
Configuration method:
Permanent configuration
set global character_set_client = gbk; set global character_set_connection = gbk; set global character_set_results = gbk;character_set_client: the client character set.character_set_connection: the connection character set. In an Oracle tenant, set this parameter to the same value as that ofcharacter_set_client.character_set_results: the character set of the results returned by the server to the client.Generally, the character set of the strings that the client sends to the server and the character set of the strings that the server returns to the client are the same. Therefore, in Oracle mode, the values of these three variables must be the same. In MySQL mode, these three variables are provided for flexible configuration. In general scenarios, you can set the three variables to the client character set.
Temporary configuration (valid only for the current session):
Method 1:
set character_set_client = gbk; set character_set_connection = gbk; set character_set_results = gbk;Method 2:
set names gbk;
Set the client character set
When you use the JDBC driver to connect to OceanBase Database, add
characterEncoding=gbkto the URL to set up a GBK link.String url = "jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true";When you use OBClient to connect to OceanBase Database, we recommend that you use the
zh_CN.GB18030superset ofzh_CN.GBKfor the bash environment variables of the GBK link.Modify the bash environment variables.
export LANG=zh_CN.GB18030 export LC_ALL=zh_CN.GB18030Modify the character set configuration of the terminal to set the character set of the current window to gbk. Follow the instructions on the terminal.
Notice
You must set the client and driver accordingly. Otherwise, garbled characters may be displayed.