This topic describes the character set selection specifications in OceanBase Database.
You can set the character set at the tenant level, database level, table level, field level, or session level. Currently, OceanBase Database supports the utf8mb4, gbk, gb 18030, binary, and utf16 character sets.
Note
- For seamless migration, OceanBase Database considers
UTF8andUTF8MB4to be synonyms in syntax. - You cannot change the character set of a database.
For example, the following describes how to set the gbk character set:
Set the character set when creating a tenant.
You can add the
"charset=gbk"parameter to the create tenant statement.create tenant oracle replica_num = 1, resource_pool_list =('pool1'), charset = gbk set ob_tcp_invited_nodes = '%', ob_compatibility_mode = 'oracle', parallel_servers_target = 10, ob_sql_work_area_percentage = 20, secure_file_priv = "";You can select the gbk character set when creating a tenant in the OCP console.
Notice
- In Oracle mode, the character set is tenant-level. In a gbk tenant, all char, varchar2, and clob columns in user tables use the gbk character set, while the char and varchar2 columns in system tables retain the utf8 character set.
- You cannot modify the character set of an Oracle tenant.
Set the character set for a client (connection).
The client (connection) character set is the character set configured for interactions between the client and the server.
The client sends SQL statements to the server for execution and receives the execution results from the server. The server needs to know the client's character set to correctly parse, execute, and return results. The client can be OBClient, JDBC, or OCI in different environments. As a result, the client character set is sometimes also referred to as the link character set.
The tenant character set and the client character set are independent of each other.
A tenant using the gbk character set can be connected to by clients using the gbk character set or the UTF8 character set.
If the client character set is gbk, the server will parse and execute the received SQL statement using the gbk character set.
If the client character set is UTF8, the server will parse and execute the received SQL statement using the UTF8 character set.
Configuration methods
Permanent modification
set global character_set_client = gbk; set global character_set_connection = gbk; set global character_set_results = gbk;character_set_client: the client character set.
character_set_connection: the connection character set. In Oracle mode, it is recommended to set this parameter to the same value as character_set_client.
character_set_results: the character set of the results returned from the server to the client.
Generally, the character sets for strings sent from the client to the server and from the server to the client are the same. Therefore, in Oracle mode, it is recommended to set the three parameters to the same value; in MySQL mode, the three parameters can be configured flexibly. Generally, it is sufficient to set the three parameters to the client character set.
Temporary modification (effective only for the current session)
Method 1:
set character_set_client = gbk; set character_set_connection = gbk; set character_set_results = gbk;Method 2:
set names gbk;
Set the client character set
When you use JDBC to connect to OceanBase Database, you can add the
characterEncoding=gbkparameter to the URL.String url = "jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true";When you use OBClient to connect to the database, we recommend that you use the
zh_CN.GB18030character set, which is a superset ofzh_CN.GBK, for the bash environment variable.Modify the bash environment variable
export LANG=zh_CN.GB18030 export LC_ALL=zh_CN.GB18030Modify the encoding setting of your terminal and set the current window to the gbk encoding. Please operate according to the instructions on the terminal interface.
Notice
Except for configuring the observer process (database) to use the gbk character set, you must also configure the client and driver to use the gbk character set. Otherwise, garbled characters may appear.