This topic describes the character set selection specifications in OceanBase Database.
You can set the character set at the tenant level, database level, table level, field level, or session level. Currently, OceanBase Database supports the utf8mb4, gbk, gb18030, binary, and utf16 character sets.
Note
- For seamless data migration, OceanBase Database considers UTF8 and UTF8MB4 to be synonyms in syntax.
- You cannot change the character set of a database.
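To make the difference between the supported character sets more concrete, the following sketch compares how many bytes the same text occupies in each encoding. It uses Python's built-in codecs rather than OceanBase Database itself, and the mapping from character set names to Python codec names (in particular `utf-16-be` for utf16) is an assumption made for illustration. The binary character set is omitted because it stores raw bytes without an encoding.

```python
# Python codec names standing in for the character set names in this topic.
# This mapping is an assumption for illustration, not an OceanBase API.
CODECS = {
    "utf8mb4": "utf-8",
    "gbk": "gbk",
    "gb18030": "gb18030",
    "utf16": "utf-16-be",
}


def encoded_lengths(text):
    """Return the encoded byte length of text for each character set.

    A value of None means the text cannot be represented in that character set.
    """
    lengths = {}
    for charset, codec in CODECS.items():
        try:
            lengths[charset] = len(text.encode(codec))
        except UnicodeEncodeError:
            lengths[charset] = None
    return lengths
```

For example, `encoded_lengths("\u4e2d")` (a common Chinese character) reports 3 bytes for utf8mb4 and 2 bytes for gbk, gb18030, and utf16, whereas an emoji such as `"\U0001F600"` cannot be encoded in gbk at all.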
For example, the following describes how to set the gbk character set:
Set the character set when creating a tenant
You can add the "charset=gbk" parameter to the create tenant statement.

create tenant oracle
    replica_num = 1,
    resource_pool_list = ('pool1'),
    charset = gbk
    set ob_tcp_invited_nodes = '%',
        ob_compatibility_mode = 'oracle',
        parallel_servers_target = 10,
        ob_sql_work_area_percentage = 20,
        secure_file_priv = "";

You can select the gbk character set when creating a tenant in the OCP console.
Notice
- In Oracle mode, the character set is tenant-level. In a gbk tenant, all char, varchar2, and clob columns of user tables use the gbk character set, while the char and varchar2 columns of system tables retain the utf8 character set.
- You cannot modify the character set of an Oracle tenant.
Set the character set for a client (connection)
The character set for a client (connection) is the character set configured for interactions between the client (such as OBClient, JDBC, and OCI) and the server.
The client sends SQL statements to the server for execution and receives the execution results from the server. The server needs to know the character set used by the client to correctly parse, execute, and return the results. Therefore, sometimes the character set for a client is also called the link character set.
The tenant character set and the client character set are independently configured.
Clients that use either the gbk character set or the utf8 character set can connect to a tenant that uses the gbk character set.
If the client character set is gbk, the server will parse and execute the received SQL statement using the gbk character set.
If the client character set is utf8, the server will parse and execute the received SQL statement using the utf8 character set.
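The following sketch illustrates why the declared client character set must match the encoding the client actually uses. It is a simulation with Python's built-in codecs, not OceanBase code: the hypothetical `interpret` function encodes text the way a client would and then decodes the bytes the way a server would if it trusted the declared character set.

```python
def interpret(text, actual_charset, declared_charset):
    """Simulate a server reading client bytes under the declared character set.

    actual_charset is the encoding the client really used to produce the bytes;
    declared_charset is what the server was told to expect. Undecodable bytes
    are replaced with U+FFFD instead of raising an error.
    """
    raw = text.encode(actual_charset)
    return raw.decode(declared_charset, errors="replace")


def is_garbled(text, actual_charset, declared_charset):
    """Return True if the text does not survive the round trip unchanged."""
    return interpret(text, actual_charset, declared_charset) != text
```

With matching settings, `is_garbled("\u4e2d\u6587", "gbk", "gbk")` is `False`. If the client sends gbk bytes while the declared character set is utf8, the same call with `"utf-8"` as the declared character set returns `True`. Pure ASCII text such as `select 1` is unaffected in both cases, which is why mismatches often go unnoticed until non-ASCII data appears.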
Configuration methods
Permanent modification
set global character_set_client = gbk;
set global character_set_connection = gbk;
set global character_set_results = gbk;

character_set_client: the client character set.
character_set_connection: the connection character set. In Oracle mode, it is recommended to set this parameter to the same value as character_set_client.
character_set_results: the character set of the results returned from the server to the client.
Strings sent from the client to the server and results returned from the server to the client generally use the same character set. Therefore, in Oracle mode, we recommend that you set the three parameters to the same value. In MySQL mode, you can configure the three parameters independently, but setting all three to the client character set is usually sufficient.
Temporary modification (effective only for the current session)
Method 1:
set character_set_client = gbk;
set character_set_connection = gbk;
set character_set_results = gbk;

Method 2:
set names gbk;
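If these statements are generated by a deployment script, a small helper can keep the three variables in sync. The sketch below is one possible way to do that. The function name `charset_statements` and the `permanent` flag are hypothetical, and the name check only compares against the character sets listed in this topic. It does not verify that the server accepts every one of them as a client character set.

```python
# Character set names mentioned in this topic (utf8 is a synonym of utf8mb4).
KNOWN_CHARSETS = {"utf8", "utf8mb4", "gbk", "gb18030", "binary", "utf16"}

# The three variables that together describe the client (connection) character set.
CHARSET_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def charset_statements(charset, permanent=False):
    """Build the SET statements that assign one character set to all three variables.

    permanent=True emits "set global ..." statements (permanent modification);
    otherwise the statements affect only the current session.
    """
    name = charset.lower()
    if name not in KNOWN_CHARSETS:
        raise ValueError(f"unknown character set: {charset!r}")
    scope = "global " if permanent else ""
    return [f"set {scope}{variable} = {name};" for variable in CHARSET_VARIABLES]
```

Calling `charset_statements("gbk")` yields the three session-level statements of Method 1, and `charset_statements("gbk", permanent=True)` yields the `set global` variants.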
Set the client character set
When you use JDBC to connect to OceanBase Database, you can add the characterEncoding=gbk parameter to the URL.

String url = "jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true";

When you use OBClient to connect to the database, we recommend that you use the superset zh_CN.GB18030 of the GBK character set for the bash environment variable.

Modify the bash environment variable

export LANG=zh_CN.GB18030
export LC_ALL=zh_CN.GB18030

Modify the encoding setting of your terminal and set the current window to the gbk encoding. Perform operations based on the instructions in the terminal interface.
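Before starting OBClient, it can help to confirm which encoding the shell environment actually advertises. The following sketch is an illustrative helper, not part of OBClient: it reads the locale variables in the usual POSIX precedence order (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and extracts the codeset part of the first one that is set. The function name `locale_charset` is hypothetical.

```python
import os


def locale_charset(env=None):
    """Return the codeset named by the locale environment variables, or None.

    The first non-empty variable among LC_ALL, LC_CTYPE, and LANG decides.
    A value such as "zh_CN.GB18030@modifier" yields "GB18030"; a value without
    a codeset part (for example "C") yields None.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            _, dot, rest = value.partition(".")
            if not dot:
                return None
            return rest.split("@")[0].upper()
    return None
```

After the two `export` commands above, `locale_charset()` is expected to return `"GB18030"`. Any other result suggests that the variables were not exported in the shell that launches the client.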
Notice
Apart from configuring the GBK character set for the observer process (database), you must also configure the GBK character set for the client and driver. Otherwise, garbled characters may appear due to configuration errors.