This topic describes the character set selection specifications in OceanBase Database.
You can set the character set at the tenant level, database level, table level, column level, or session level. Currently, OceanBase Database supports utf8mb4, gbk, gb18030, binary, and utf16 character sets.
Note
- For seamless data migration, OceanBase Database treats UTF8 and UTF8MB4 as synonyms in syntax.
- Once a database is created, its character set cannot be changed.
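Because UTF8 is accepted as a synonym for UTF8MB4, a character set written as utf8 resolves to utf8mb4. By that rule, it also covers characters that need four bytes in UTF-8, such as emoji. The Python sketch below models only this name resolution. The helper canonical_charset and its synonym table are hypothetical, and the case-insensitive matching is an assumption.

```python
# Hypothetical synonym table mirroring the note above: UTF8 resolves to UTF8MB4.
SYNONYMS = {"utf8": "utf8mb4"}


def canonical_charset(name: str) -> str:
    """Normalize a character set name as written in DDL (assumed case-insensitive)."""
    lowered = name.lower()
    return SYNONYMS.get(lowered, lowered)


print(canonical_charset("UTF8"))     # utf8mb4
print(canonical_charset("UTF8MB4"))  # utf8mb4
print(len("😀".encode("utf-8")))     # 4: this character needs the 4-byte UTF-8 form
```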
The following steps use the gbk character set as an example.
Set the character set when creating a tenant.
You can add the charset = gbk parameter to the create tenant statement:
create tenant oracle replica_num = 1, resource_pool_list = ('pool1'), charset = gbk
  set ob_tcp_invited_nodes = '%', ob_compatibility_mode = 'oracle', parallel_servers_target = 10, ob_sql_work_area_percentage = 20, secure_file_priv = "";
You can also select the gbk character set when creating a tenant in the OCP console.
Notice
- In Oracle mode, the character set is tenant-level. In a gbk tenant, all char, varchar2, and clob columns use the gbk character set, while the char and varchar2 columns in system tables still use the utf8 character set.
- The character set of an Oracle mode tenant cannot be modified.
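In a gbk tenant, every char, varchar2, and clob column uses gbk, so those columns can hold only text that gbk can represent. The Python sketch below checks whether a value is representable, using Python's built-in codecs. The helper name fits_charset is hypothetical. This topic does not say how OceanBase handles an unrepresentable character (an error or a substitution), and the sketch does not model that.

```python
def fits_charset(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded in `codec`."""
    try:
        text.encode(codec)  # strict mode raises on the first unencodable character
    except UnicodeEncodeError:
        return False
    return True


print(fits_charset("数据库", "gbk"))   # True
print(fits_charset("😀", "gbk"))       # False: GBK has no emoji
print(fits_charset("😀", "gb18030"))   # True: GB 18030 covers all of Unicode
```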
Set the client (link) character set.
The client (link) character set is the character set configured for interactions between the client and the server.
The client sends SQL strings to the server for execution and receives the execution results from the server. To parse and execute the statements and return the results correctly, the server must know the client's character set. Depending on the environment, the client can be OBClient, JDBC, or OCI. Because this character set governs the link between the client and the server, it is sometimes also called the link character set.
The tenant character set and the client character set are independent of each other.
Clients that use either the gbk or the UTF8 character set can connect to a gbk tenant.
If the client character set is gbk, the server parses and executes the received SQL statements using gbk.
If the client character set is UTF8, the server parses and executes the received SQL statements using UTF8.
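You can see why the declared client character set matters by decoding the bytes a gbk client sends under two different assumptions. In this Python sketch, server_parse is a hypothetical stand-in for the server's decoding step and is not an OceanBase API. errors="replace" makes a mismatch appear as replacement characters instead of raising an exception.

```python
def server_parse(payload: bytes, character_set_client: str) -> str:
    """Hypothetical model: decode incoming SQL bytes with the declared client charset."""
    return payload.decode(character_set_client, errors="replace")


sql = "SELECT '数据库' FROM dual"
payload = sql.encode("gbk")  # bytes a gbk client sends to the server

print(server_parse(payload, "gbk") == sql)    # True: declared and actual charsets agree
print(server_parse(payload, "utf-8") == sql)  # False: gbk bytes misread as UTF-8
```

The second case corresponds to a gbk client whose session declares UTF8. The statement the server sees no longer matches what the user typed.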
Configuration methods
Permanent modification
set global character_set_client = gbk;
set global character_set_connection = gbk;
set global character_set_results = gbk;
- character_set_client: the client character set.
- character_set_connection: the connection character set. In Oracle mode, we recommend that you set this variable to the same value as character_set_client.
- character_set_results: the character set of the results returned from the server to the client.
In most cases, strings sent from the client to the server and results returned to the client use the same character set. Therefore, in Oracle mode, we recommend that you set all three variables to the same value. In MySQL mode, you can configure them independently, but setting all three to the client character set is usually sufficient.
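You can trace the roles of the three variables through a simplified round trip for a query that echoes a string literal. The Python sketch below is a model under stated assumptions, not the server's actual conversion logic. It decodes the request with character_set_client, converts the text to character_set_connection, and encodes the result with character_set_results. The client then decodes the result with its own encoding. The function name echo_round_trip is hypothetical.

```python
def echo_round_trip(text: str, client_encoding: str, character_set_client: str,
                    character_set_connection: str, character_set_results: str) -> str:
    """Simplified model of a SELECT that returns the literal it received."""
    request = text.encode(client_encoding)  # bytes sent by the client
    parsed = request.decode(character_set_client, errors="replace")
    # Conversion to the connection charset; strict mode raises if a character is unrepresentable.
    converted = parsed.encode(character_set_connection).decode(character_set_connection)
    response = converted.encode(character_set_results)  # bytes returned to the client
    return response.decode(client_encoding, errors="replace")  # what the client displays


# All three variables match the gbk client: the text survives the round trip.
print(echo_round_trip("数据库", "gbk", "gbk", "gbk", "gbk") == "数据库")    # True
# character_set_results left as UTF-8: the gbk client displays garbled text.
print(echo_round_trip("数据库", "gbk", "gbk", "gbk", "utf-8") == "数据库")  # False
```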
Temporary modification (takes effect only for the current session)
Method 1:
set character_set_client = gbk;
set character_set_connection = gbk;
set character_set_results = gbk;
Method 2:
set names gbk;
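Method 2 is an alternative to Method 1. As in MySQL, set names assigns one character set to character_set_client, character_set_connection, and character_set_results. The hypothetical Python helper below expands a set names statement into the three equivalent session statements. It only builds strings and does not execute anything against the database.

```python
SESSION_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def expand_set_names(charset: str) -> list:
    """Return the per-variable statements that `set names <charset>;` stands for."""
    return [f"set {var} = {charset};" for var in SESSION_VARIABLES]


for stmt in expand_set_names("gbk"):
    print(stmt)
# set character_set_client = gbk;
# set character_set_connection = gbk;
# set character_set_results = gbk;
```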
Configure the client driver and terminal environment.
When you use JDBC to connect to OceanBase Database, you can usually add the characterEncoding=gbk parameter to the URL:
String url = "jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true";
When you use OBClient to connect to the database, we recommend that you set the bash locale to zh_CN.GB18030. GB 18030 is a superset of GBK.
Modify the bash environment variables:
export LANG=zh_CN.GB18030
export LC_ALL=zh_CN.GB18030
Modify the encoding setting of your terminal and set the current window to gbk encoding. The exact steps depend on your terminal, so follow the instructions in its interface.
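The recommendation to use zh_CN.GB18030 relies on GB 18030 being a superset of GBK. A GB 18030 locale reads bytes produced in gbk as the same text, and GB 18030 can also represent characters that GBK lacks. The Python sketch below checks both properties for a few sample strings with Python's built-in codecs. It is a spot check, not an exhaustive proof over all code points, and the helper name is hypothetical.

```python
SAMPLES = ["数据库", "OceanBase 字符集", "中文测试"]


def gbk_bytes_read_same_in_gb18030(text: str) -> bool:
    """Check that text encoded as GBK decodes to the same text as GB 18030."""
    return text.encode("gbk").decode("gb18030") == text


for text in SAMPLES:
    print(text, gbk_bytes_read_same_in_gb18030(text))  # True for each sample

# A CJK Unified Ideographs Extension B character: not in GBK, but GB 18030 encodes it.
rare = "\U00020000"
print(len(rare.encode("gb18030")))  # 4
```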
Notice
In addition to configuring the database (the observer process) to use the gbk character set, you must also configure the client and the driver. If the environment is configured incorrectly, garbled characters may be displayed.
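A mismatch anywhere in the chain can produce garbled text, so it can help to check the client-side settings together. The Python sketch below is a hypothetical pre-flight check, not an OceanBase tool. It reads characterEncoding from a JDBC URL and the codeset from a locale string such as zh_CN.GB18030, then reports whether each can carry gbk data. Accepting gb18030 relies on it being a superset of GBK. The function names are illustrative, and you still need to check the terminal window encoding by hand.

```python
from urllib.parse import parse_qs

# Encodings assumed able to carry gbk data (GB 18030 is a superset of GBK).
GBK_COMPATIBLE = {"gbk", "gb18030"}


def jdbc_encoding(url: str) -> str:
    """Extract the characterEncoding parameter from a JDBC URL ('' if absent)."""
    query = url.partition("?")[2]
    return parse_qs(query).get("characterEncoding", [""])[0].lower()


def locale_codeset(locale: str) -> str:
    """Extract the codeset from a locale such as 'zh_CN.GB18030' ('' if absent)."""
    codeset = locale.partition(".")[2]
    return codeset.split("@")[0].lower()


def check_gbk_client(jdbc_url: str, lang: str) -> dict:
    """Report whether the driver and the shell locale can both carry gbk text."""
    return {
        "jdbc": jdbc_encoding(jdbc_url) in GBK_COMPATIBLE,
        "locale": locale_codeset(lang) in GBK_COMPATIBLE,
    }


url = ("jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true"
       "&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true")
print(check_gbk_client(url, "zh_CN.GB18030"))  # {'jdbc': True, 'locale': True}
print(check_gbk_client(url, "en_US.UTF-8"))    # {'jdbc': True, 'locale': False}
```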