After a tenant is created in the primary cluster, each standby cluster logically synchronizes the tenant-creation DDL statement and automatically creates the corresponding tenant. However, tenant creation in a standby cluster can get stuck. This topic describes how to identify the problem, its possible causes, and the solutions.
Identification methods
You can use the following methods to determine whether tenant creation in a standby cluster is stuck:
Query the __ALL_TENANT table in both clusters and compare the tenants in the primary cluster with those in the standby cluster. If some tenants are missing from the standby cluster, the standby cluster is either still creating them or stuck.
Check the SYNCHRONIZATION_STATUS column of the V$OB_STANDBY_STATUS view in the primary cluster. If the value is SYS SCHEMA NOT SYNC, the schema of the system tenant has not been synchronized, and tenant creation in the standby cluster may be stuck. To compare the schema versions of the system tenant, execute the following statement on both the primary cluster and the standby cluster to query the V$OB_CLUSTER_STATS view:
obclient> SELECT REFRESHED_SCHEMA_VERSION FROM oceanbase.V$OB_CLUSTER_STATS WHERE TENANT_ID = 1;
If the schema version of the system tenant in the standby cluster is lower than that in the primary cluster, the standby cluster may be stuck.
In the primary cluster, query the DDL operations that are about to be synchronized to the system tenant of the standby cluster. If the next operation is a tenant-creation DDL, the standby cluster is synchronizing that statement. If the synchronization does not report success after a long period of time, tenant creation in the standby cluster is probably stuck.
Execute the following statement in the primary cluster to check the synchronization status, where [standby_sys_schema_version] is the REFRESHED_SCHEMA_VERSION value of the system tenant obtained from the standby cluster:
obclient> SELECT SCHEMA_VERSION, TENANT_ID, DDL_STMT_STR FROM oceanbase.__ALL_DDL_OPERATION WHERE SCHEMA_VERSION > [standby_sys_schema_version] ORDER BY SCHEMA_VERSION LIMIT 10;
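Once the query results are in hand, the three checks above reduce to simple comparisons. The following Python sketch assumes you have already run the statements in both clusters and copied the results into plain values; it does not connect to OceanBase. The helper names (missing_tenants, sys_schema_lagging, next_pending_ddl, is_create_tenant, diagnose) are hypothetical, and recognizing a tenant-creation DDL by a CREATE TENANT prefix in DDL_STMT_STR is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DdlRow:
    """One row returned by the __ALL_DDL_OPERATION query on the primary cluster."""
    schema_version: int
    tenant_id: int
    ddl_stmt_str: str


def missing_tenants(primary_ids, standby_ids):
    """Tenant IDs in the primary cluster's __ALL_TENANT that the standby cluster lacks."""
    return sorted(set(primary_ids) - set(standby_ids))


def sys_schema_lagging(primary_version: int, standby_version: int) -> bool:
    """True if the standby's system-tenant REFRESHED_SCHEMA_VERSION is behind the primary's."""
    return standby_version < primary_version


def next_pending_ddl(rows) -> Optional[DdlRow]:
    """The pending DDL with the lowest schema version, i.e. the next one to be synchronized."""
    return min(rows, key=lambda r: r.schema_version, default=None)


def is_create_tenant(row: Optional[DdlRow]) -> bool:
    # Assumption: a tenant-creation DDL is recognized by the text of DDL_STMT_STR.
    return row is not None and row.ddl_stmt_str.lstrip().upper().startswith("CREATE TENANT")


def diagnose(primary_ids, standby_ids, primary_version, standby_version, pending_rows):
    """Combine the three identification checks into one report."""
    return {
        "missing_tenants": missing_tenants(primary_ids, standby_ids),
        "sys_schema_lagging": sys_schema_lagging(primary_version, standby_version),
        "syncing_create_tenant": is_create_tenant(next_pending_ddl(pending_rows)),
    }


# Hypothetical values standing in for real query results.
report = diagnose(
    primary_ids=[1, 1001, 1002],
    standby_ids=[1, 1001],
    primary_version=120,
    standby_version=100,
    pending_rows=[DdlRow(110, 1, "CREATE TENANT t1002 ..."), DdlRow(115, 1, "ALTER ...")],
)
print(report)
# {'missing_tenants': [1002], 'sys_schema_lagging': True, 'syncing_create_tenant': True}
```

The topic defines "stuck" only as a lack of progress over a long period, so a single report is not conclusive. Rerunning the queries over time and seeing the same missing tenant and the same pending DDL is the signal to investigate the possible causes.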
Possible causes
Tenant creation in a standby cluster usually gets stuck because the resource pool for the tenant fails to be created. Perform the following steps to check whether this is the cause. The examples use a tenant whose tenant_id is 1001.
Run the following statement to check whether a resource configuration named __unit_config_1001 exists:
obclient> SELECT * FROM oceanbase.__ALL_UNIT_CONFIG WHERE NAME = '__unit_config_1001';
If the resource configuration does not exist, tenant creation has not started yet. Otherwise, go to the next step.
Run the following statement to check whether a resource pool named __resource_pool_1001 exists:
obclient> SELECT * FROM oceanbase.__ALL_RESOURCE_POOL WHERE NAME = '__resource_pool_1001';
If it exists, the resource pool has been created, and the problem is not caused by a resource pool creation failure. Otherwise, the resource pool has not been created. Go to the next step.
Run the following statement to check whether a create_resource_pool RootService event exists:
obclient> SELECT * FROM oceanbase.__ALL_ROOTSERVICE_EVENT_HISTORY WHERE EVENT LIKE '%create_resource_pool%' AND VALUE2 LIKE '%__resource_pool_1001%' ORDER BY GMT_CREATE DESC LIMIT 1;
If such an event exists while the resource pool does not, resource pool creation has failed.
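The three steps above form a short decision sequence. The following Python sketch builds the three statements for a given tenant ID from the __unit_config_&lt;tenant_id&gt; and __resource_pool_&lt;tenant_id&gt; naming convention used in this topic, and maps the outcomes you observe to a diagnosis. It does not execute SQL; the function names and result labels are hypothetical, and the "undetermined" branch covers a combination that this topic does not address.

```python
def unit_config_name(tenant_id: int) -> str:
    # Resource configuration that the standby cluster creates automatically.
    return f"__unit_config_{tenant_id}"


def resource_pool_name(tenant_id: int) -> str:
    # Resource pool that the standby cluster creates automatically.
    return f"__resource_pool_{tenant_id}"


def diagnostic_queries(tenant_id: int) -> list:
    """Return the three diagnostic statements in the order they should be run."""
    unit, pool = unit_config_name(tenant_id), resource_pool_name(tenant_id)
    return [
        f"SELECT * FROM oceanbase.__ALL_UNIT_CONFIG WHERE NAME = '{unit}';",
        f"SELECT * FROM oceanbase.__ALL_RESOURCE_POOL WHERE NAME = '{pool}';",
        "SELECT * FROM oceanbase.__ALL_ROOTSERVICE_EVENT_HISTORY "
        f"WHERE EVENT LIKE '%create_resource_pool%' AND VALUE2 LIKE '%{pool}%' "
        "ORDER BY GMT_CREATE DESC LIMIT 1;",
    ]


def classify(unit_config_exists: bool, pool_exists: bool, pool_event_exists: bool) -> str:
    """Map the three query outcomes to a hypothetical diagnosis label."""
    if not unit_config_exists:
        return "creation_not_started"
    if pool_exists:
        return "pool_created_look_elsewhere"
    if pool_event_exists:
        return "pool_creation_failed"
    # No resource pool and no create_resource_pool event: not covered by this topic.
    return "undetermined"


for sql in diagnostic_queries(1001):
    print(sql)
```

For tenant 1001, a configuration that exists, no resource pool, and a matching create_resource_pool event give classify(True, False, True) == "pool_creation_failed".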
A resource pool may fail to be created for the tenant for the following reasons:
OBServer nodes in the standby cluster are down.
The active OBServer nodes in the standby cluster cannot provide sufficient resources.
Solutions
Check whether any OBServer node in the standby cluster has failed. A failed OBServer node can leave too few server resources to allocate resource units, and error -4656 may be reported during resource pool creation. If an OBServer node has failed, recover it as soon as possible so that sufficient OBServer nodes are available.
If sufficient OBServer nodes are active, calculate the available resources, including CPU and memory, and reduce the tenant's resource configuration in the standby cluster so that resource units can be allocated in the zones. The standby cluster automatically retries creating the tenant resource pool until it succeeds.
Execute the following statement in the standby cluster to adjust the tenant's resource configuration:
obclient> ALTER RESOURCE UNIT __unit_config_1001 MIN_MEMORY='xx', MAX_MEMORY='xx';
In the statement, __unit_config_1001 is the resource configuration of tenant 1001, and 'xx' stands for the new memory values.
Note
When you reduce the tenant resource configuration in the standby cluster, we recommend that you also reduce it in the primary cluster. Keeping the configurations of the primary and standby clusters consistent reduces the O&M complexity caused by heterogeneous configurations.
Alternatively, reduce the resource configuration in the primary cluster first, and then delete the resource unit configuration that was automatically created in the standby cluster, for example, __unit_config_1001. The standby cluster then automatically retries and creates a new __unit_config_1001 that matches the tenant resource configuration you set in the primary cluster.
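To estimate how far the resource configuration must be reduced, you can compare the unit's minimum CPU and memory with the free capacity of the active OBServer nodes in each zone. The following Python sketch uses a deliberately simplified model: a unit fits on a server if the server is active and its unassigned CPU and memory are at least the unit's minimums. The Server fields and helper names are hypothetical, the numbers are assumed to come from your own inspection of the standby cluster, and the actual OceanBase allocator may apply further constraints.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical snapshot of one OBServer node in the standby cluster."""
    zone: str
    active: bool
    cpu_total: float
    cpu_assigned: float
    mem_total_gb: float
    mem_assigned_gb: float

    def free_cpu(self) -> float:
        return self.cpu_total - self.cpu_assigned

    def free_mem_gb(self) -> float:
        return self.mem_total_gb - self.mem_assigned_gb


def fits(server: Server, min_cpu: float, min_memory_gb: float) -> bool:
    # Simplified rule: a node that is down contributes nothing; otherwise both minimums must fit.
    return (server.active
            and server.free_cpu() >= min_cpu
            and server.free_mem_gb() >= min_memory_gb)


def zones_without_room(servers, zones, min_cpu, min_memory_gb):
    """Zones in which no active server can host a unit with the given minimums."""
    return sorted(
        z for z in zones
        if not any(s.zone == z and fits(s, min_cpu, min_memory_gb) for s in servers)
    )


def largest_memory_that_fits(servers, zone, min_cpu):
    """Upper bound for MIN_MEMORY in `zone` under this model: most free memory on an eligible node."""
    candidates = [
        s.free_mem_gb() for s in servers
        if s.zone == zone and s.active and s.free_cpu() >= min_cpu
    ]
    return max(candidates, default=0.0)


# Hypothetical example: zone2 has one node down and one node with only 4 GB free.
servers = [
    Server("zone1", True, 16, 8, 64, 54),
    Server("zone2", False, 16, 0, 64, 0),
    Server("zone2", True, 16, 12, 64, 60),
]
print(zones_without_room(servers, ["zone1", "zone2"], min_cpu=2, min_memory_gb=8))  # ['zone2']
print(largest_memory_that_fits(servers, "zone2", min_cpu=2))  # 4
```

In this example, a unit that needs 8 GB cannot be placed in zone2 because one node is down and the other has only 4 GB free. Under the simplified model, either recovering the failed node or lowering MIN_MEMORY to at most 4 GB would let the standby cluster's automatic retry succeed.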