Tenant creation for a standby cluster got stuck

2023-08-18 09:26:34  Updated

After a tenant is created in the primary cluster, standby clusters logically synchronize the data definition language (DDL) statement that created the tenant and automatically create the same tenant. However, tenant creation in a standby cluster may get stuck. This topic describes how to identify the issue, its possible causes, and the solutions.

Identification methods

You can determine whether the tenant creation in a standby cluster is stuck by using the following methods:

  • Query the __ALL_TENANT table, and compare tenants in the primary cluster with those in the standby cluster.

    If some tenants are missing from the standby cluster, the standby cluster is either still creating those tenants or is stuck.

  • Check the SYNCHRONIZATION_STATUS column in the V$OB_STANDBY_STATUS view of the primary cluster.

    If the value is SYS SCHEMA NOT SYNC, the schemas of the system tenants are not synchronized, and the tenant creation process in the standby cluster may get stuck.

  • To compare the schema versions of system tenants, query the V$OB_CLUSTER_STATS view by running the following command on the primary and standby clusters:

    obclient> SELECT REFRESHED_SCHEMA_VERSION FROM V$OB_CLUSTER_STATS WHERE TENANT_ID = 1;
    

    If the schema version of the system tenant in the standby cluster is earlier than that in the primary cluster, the standby cluster may be stuck.

  • In the primary cluster, query the DDL update that is about to be synchronized to the system tenant of the standby cluster.

    If it is a DDL operation for creating a tenant, the standby cluster is synchronizing the tenant creation DDL statement. However, if the synchronization runs for a long time without reporting success, tenant creation in the standby cluster is likely stuck.

    You can run the following SQL statement to check the synchronization status. Replace [standby_sys_schema_version] with the schema version of the system tenant that you queried from the standby cluster:

    obclient> SELECT SCHEMA_VERSION, TENANT_ID, DDL_STMT_STR
    FROM __ALL_DDL_OPERATION
    WHERE SCHEMA_VERSION > [standby_sys_schema_version]
    LIMIT 10;
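
The first two checks in the list above are described without a statement. The following is a minimal sketch of how they might be run from the sys tenant. The TENANT_ID and TENANT_NAME column names of __ALL_TENANT are assumptions that do not appear in this topic; the view, column, and status value in the second statement are taken from the text above. The third statement is a variant of the __ALL_DDL_OPERATION query with an ORDER BY added so that the oldest pending DDL operation is listed first. The schema version 1692321994000000 is a made-up placeholder, and the user variable assumes that the session supports user-defined variables; otherwise, paste the value inline.

```sql
-- Check 1: run on BOTH the primary and the standby cluster, then compare the lists.
-- Assumption: __ALL_TENANT exposes TENANT_ID and TENANT_NAME columns.
SELECT TENANT_ID, TENANT_NAME
FROM __ALL_TENANT
ORDER BY TENANT_ID;

-- Check 2: run on the primary cluster.
-- Returned rows are in the state that this topic associates with a possibly stuck standby.
SELECT *
FROM V$OB_STANDBY_STATUS
WHERE SYNCHRONIZATION_STATUS = 'SYS SCHEMA NOT SYNC';

-- Check 4: run on the primary cluster.
-- Hypothetical value: use the REFRESHED_SCHEMA_VERSION read from the standby cluster.
SET @standby_sys_schema_version = 1692321994000000;

SELECT SCHEMA_VERSION, TENANT_ID, DDL_STMT_STR
FROM __ALL_DDL_OPERATION
WHERE SCHEMA_VERSION > @standby_sys_schema_version
ORDER BY SCHEMA_VERSION   -- oldest pending operation first
LIMIT 10;
```

If the first row of the last result is a tenant creation statement and stays the first row across repeated runs, the standby cluster is probably not making progress on it.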
    

Possible causes

Tenant creation in a standby cluster usually gets stuck because the resource pool for the tenant could not be created. Perform the following steps to check whether this is the cause. A tenant whose tenant_id is 1001 is used as an example.

  1. Run the following statement to check whether a resource configuration named __unit_config_1001 exists:

    obclient> SELECT * FROM __ALL_UNIT_CONFIG WHERE NAME LIKE '__unit_config_1001';
    

    If the resource configuration does not exist, tenant creation has not started. Otherwise, move on to the next step.

  2. Run the following statement to check whether a resource pool named __resource_pool_1001 exists:

    obclient> SELECT * FROM __ALL_RESOURCE_POOL WHERE NAME LIKE '__resource_pool_1001';
    

    If the resource pool exists, it was created successfully and is not the cause of the issue. Otherwise, the resource pool has not been created. Move on to the next step.

  3. Run the following statement to check whether a 'create_resource_pool' RootService event exists:

    obclient> SELECT * FROM __ALL_ROOTSERVICE_EVENT_HISTORY
    WHERE EVENT LIKE '%create_resource_pool%' AND VALUE2 LIKE '%__resource_pool_1001%'
    ORDER BY GMT_CREATE DESC
    LIMIT 1;
    

    If such an event exists while the resource pool itself does not, the creation of the resource pool failed.
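
The three checks above can also be collapsed into a single statement. The following is a sketch only, using the table and column names from the steps above; the output column aliases are made up for illustration. Names are compared with = instead of LIKE because an underscore in a LIKE pattern matches any single character.

```sql
-- Run in the sys tenant of the standby cluster; tenant 1001 is the example tenant.
SELECT
  (SELECT COUNT(*) FROM __ALL_UNIT_CONFIG
    WHERE NAME = '__unit_config_1001')            AS unit_config_count,
  (SELECT COUNT(*) FROM __ALL_RESOURCE_POOL
    WHERE NAME = '__resource_pool_1001')          AS resource_pool_count,
  (SELECT COUNT(*) FROM __ALL_ROOTSERVICE_EVENT_HISTORY
    WHERE EVENT LIKE '%create_resource_pool%'
      AND VALUE2 LIKE '%__resource_pool_1001%')   AS create_pool_event_count;
```

A unit_config_count of 0 means that tenant creation has not started. A nonzero resource_pool_count means that the resource pool exists and is not the cause. A nonzero unit_config_count together with a resource_pool_count of 0 and a nonzero create_pool_event_count points to a failed resource pool creation.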

The failure to create a resource pool for the tenant may be due to the following causes:

  • OBServers in the standby cluster are down.

  • Active OBServers in the standby cluster cannot provide sufficient resources.
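
To tell the first cause apart from the second, you can list the servers of the standby cluster and their status. This is a hedged sketch: the __ALL_SERVER table and its ZONE, SVR_IP, SVR_PORT, STATUS, and START_SERVICE_TIME columns are assumptions that do not appear in this topic, so verify them against your OceanBase version.

```sql
-- Run in the sys tenant of the standby cluster.
-- Assumption: __ALL_SERVER has the columns selected below.
SELECT ZONE, SVR_IP, SVR_PORT, STATUS, START_SERVICE_TIME
FROM __ALL_SERVER
ORDER BY ZONE, SVR_IP;
```

If every server is reported as active and in service, the second cause, insufficient resources on the active servers, is the more likely one.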

Solutions

  1. Check whether server failure has occurred in the standby cluster. Server failure may result in insufficient server resources for allocating resource units, and the -4656 error may be reported during the resource pool creation.

    If server failure has occurred, recover the faulty servers as soon as possible, to ensure that sufficient servers are available.

  2. If sufficient OBServers are active, calculate the available resources, including CPU and memory, and then reduce the tenant resource configuration in the standby cluster so that resource units can be allocated in the zone. The standby cluster automatically retries creating the tenant resource pool until the creation succeeds.

    You can run the following command to adjust the tenant resource configuration in the standby cluster:

    obclient> ALTER RESOURCE UNIT __unit_config_1001 MIN_MEMORY='xx', MAX_MEMORY='xx';
    

    In the statement, __unit_config_1001 specifies the resource configuration for tenant 1001.

    Note

    • We recommend that you also reduce the resource configuration in the primary cluster when you reduce the tenant resource configuration in the standby cluster. Keeping the two configurations consistent reduces the O&M complexity that heterogeneous configurations cause.

    • You can also reduce the resource configuration for the primary cluster, and then delete the unit resource configuration that is automatically created in the standby cluster, for example, __unit_config_1001. Then, the standby cluster automatically retries and creates a new resource configuration __unit_config_1001 that is consistent with the tenant resource configuration you set for the primary cluster.
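
The resource calculation and adjustment in step 2 might look like the following sketch. The __ALL_VIRTUAL_SERVER_STAT table and its CPU_TOTAL, CPU_ASSIGNED, MEM_TOTAL, and MEM_ASSIGNED columns are assumptions that do not appear in this topic, as is the unit of bytes for the memory columns. The value '4G' is a hypothetical example, not a recommendation. DROP RESOURCE UNIT is shown as one possible way to delete the automatically created configuration mentioned in the note.

```sql
-- Run in the sys tenant of the standby cluster.
-- Assumption: per-server totals and assigned amounts are exposed as selected below.
SELECT ZONE, SVR_IP, SVR_PORT,
       CPU_TOTAL - CPU_ASSIGNED                                   AS cpu_free,
       ROUND((MEM_TOTAL - MEM_ASSIGNED) / 1024 / 1024 / 1024, 1)  AS mem_free_gb
FROM __ALL_VIRTUAL_SERVER_STAT
ORDER BY ZONE, SVR_IP;

-- Hypothetical values: pick a size that fits into the free memory found above.
ALTER RESOURCE UNIT __unit_config_1001 MIN_MEMORY = '4G', MAX_MEMORY = '4G';

-- Alternative from the note: after reducing the configuration in the primary cluster,
-- delete the automatically created configuration so that the standby cluster recreates it.
-- DROP RESOURCE UNIT __unit_config_1001;
```

After the adjustment, rerun the query against __ALL_RESOURCE_POOL from the Possible causes section to confirm that the automatic retry created __resource_pool_1001.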
