This topic describes the character set specifications in OceanBase Database.
You can set the character set at the tenant level, database level, table level, field level, or session level. OceanBase Database supports the utf8mb4, gbk, gb 18030, binary, and utf16 character sets.
Note
- For seamless data migration, OceanBase Database considers
UTF8andUTF8MB4to be synonyms in syntax. - The database character set cannot be modified.
For example, the following describes how to set the gbk character set:
Set the character set when you create a tenant.
You can add the
"charset=gbk"parameter in the create tenant statement.create tenant oracle replica_num = 1, resource_pool_list =('pool1'), charset = gbk set ob_tcp_invited_nodes = '%', ob_compatibility_mode = 'oracle', parallel_servers_target = 10, ob_sql_work_area_percentage = 20, secure_file_priv = "";You can select the gbk character set when you create a tenant in the OCP console.
Notice
- In Oracle mode, the tenant character set is a tenant-level parameter. In a gbk tenant, all char, varchar2, and clob columns of user tables use the gbk character set, while char and varchar2 columns of system tables retain the utf8 character set.
- The tenant character set is an immutable parameter in Oracle mode. You cannot use the
alterstatement to change the character set of tenants, databases, tables, or columns.
Set the client (link) character set.
The client (link) character set is the character set configured for interactions between the client and the server.
The client sends SQL strings to the server for execution and receives execution results from the server. The server needs to know the client's character set to correctly parse, execute, and return results. The client can be OBClient, JDBC, or OCI in different environments. As a result, the client character set is also sometimes called the link character set.
The tenant character set and the client character set are independent of each other.
A tenant using the gbk character set can be connected to by clients using the gbk character set or the UTF8 character set.
If the client character set is gbk, the server will parse and execute received SQL statements based on the gbk character set.
If the client character set is UTF8, the server will parse and execute received SQL statements based on the UTF8 character set.
Configuration methods
Permanent modification
set global character_set_client = gbk; set global character_set_connection = gbk; set global character_set_results = gbk;character_set_client: the client character set.
character_set_connection: the connection character set. In Oracle mode, it is recommended to set this parameter to the same value as character_set_client.
character_set_results: the character set of the results returned from the server to the client.
Generally, the character sets for strings sent from the client to the server and from the server to the client are the same. Therefore, in Oracle mode, it is recommended to set these three parameters to the same value; in MySQL mode, you can flexibly configure these three parameters. Generally, it is sufficient to set these three parameters to the client character set.
Temporary modification (effective only for the current session)
Method 1:
set character_set_client = gbk; set character_set_connection = gbk; set character_set_results = gbk;Method 2:
set names gbk;
Set the client character set.
When you use JDBC to connect to OceanBase Database, you can usually add the
characterEncoding=gbkparameter to the URL to set the gbk character set for the link.String url = "jdbc:oceanbase://xxx.xxx.xxx.xxx:xxxx?useSSL=false&useUnicode=true&characterEncoding=gbk&connectTimeout=30000&rewriteBatchedStatements=true";When you use OBClient to connect to the database, we recommend that you use the superset
zh_CN.GB18030of the GBK character set in the bash environment.To modify the bash environment variables, run the following command:
export LANG=zh_CN.GB18030 export LC_ALL=zh_CN.GB18030To modify the encoding settings of your terminal, set the current window to the gbk encoding. Please operate according to the instructions on the terminal interface.
Notice
Except for configuring the observer process (database) to use the GBK character set, you need to configure the client and driver to use the GBK character set as well. If the environment is configured incorrectly, garbled characters may be displayed.