OceanBase Database is deployed on cost-effective common servers. Data is written to more than half of the OBServer nodes (three or more) based on the Paxos protocol. Therefore, no data is lost when a minority of OBServer nodes fail, ensuring a recovery point objective (RPO) of 0. If the leader fails, a new leader is automatically elected from the followers to continue to provide services within a short time without manual intervention, ensuring a recovery time objective (RTO) of less than 8s. This design achieves a balance between strong consistency and high availability. OceanBase Database supports flexible deployment modes, including three IDCs in the same city, three IDCs across two regions, and five IDCs across three regions, to provide lossless city-wide disaster recovery and geo-disaster recovery.
RootService
RootService manages the OBServer nodes in an OceanBase cluster. The OBServer nodes report their process status to RootService every 2s by using heartbeat packets. RootService monitors the heartbeat packets of the OBServer nodes to obtain the status of the current observer processes.
Parameters for managing the heartbeat status of OBServer nodes are described as follows:
lease_timeThis parameter specifies the heartbeat lease duration. The default value is 10s. For more information about this parameter, see lease_time.
If RootService does not receive a heartbeat packet from an OBServer node within the period specified by
lease_time, RootService considers the observer process temporarily disconnected and marks its heartbeat status aslease_expired.server_permanent_offline_timeThis parameter specifies the time threshold for heartbeat missing. When the heartbeat of an OBServer node is missing for the specified time threshold, the OBServer node is considered permanently offline. Data replicas on a permanently offline OBServer node must be automatically supplemented. The default value is 3600s. For more information about this parameter, see server_permanent_offline_time.
If RootService does not receive a heartbeat packet from an OBServer node within the period specified by
server_permanent_offline_time, RootService considers the observer process disconnected and marks its heartbeat status aspermanent_offline.
Note
You can execute the SHOW PARAMETERS statement to query for the values of the preceding parameters. For example: SHOW PARAMETERS LIKE 'server_permanent_offline_time';.
RootService can obtain status information of an OBServer node in one of the following scenarios based on heartbeat packets:
The OBServer node properly sends heartbeat packets, and the disk status is normal in the heartbeat packets. In this case, RootService considers that the OBServer node is working properly.
The OBServer node properly sends heartbeat packets, but the disk status is abnormal in the heartbeat packets. In this case, RootService considers that the observer process still exists, but the disk of the OBServer node is faulty. In addition, RootService switches all leaders on this OBServer node to another OBServer node.
No heartbeat packet is received from the OBServer node within the period specified by
lease_time, and the heartbeat status of the OBServer node is marked aslease_expired. In this case, RootService sets the status of the OBServer node to inactive and does not take other measures.No heartbeat packet is received from the OBServer node within the period specified by
server_permanent_offline_time, and the heartbeat status of the OBServer node is marked aspermanent_offline. In this case, RootService removes the data replicas on this OBServer node from the Paxos member group, and adds the removed data replicas to another active OBServer node to ensure that the Paxos member group of data replicas is complete.
Isolate a node
ODP monitors abnormal states of OBServer nodes, for example, the stopped state or INACTIVE state, to avoid routing application traffic to abnormal OBServer nodes. An OBServer node in the stopped state is still in the ACTIVE state, but the value of stop_time is greater than 0.
If an OBServer node is unstable, it may fail to provide services from time to time and may have various abnormal conditions, such as slow response. If a client connects to this OBServer node, the client may experience unstable database services. OceanBase Database provides the STOP SERVER command for isolating OBServer nodes. An isolated OBServer node does not provide external services, and ODP does not route requests to the OBServer node. Therefore, the OBServer node can be diagnosed, replaced, or maintained without affecting the business. This mechanism facilitates O&M and emergency response. The STOP SERVER command checks whether all the remaining replicas meet the majority requirement and whether logs are synchronized. If the conditions are not met, the STOP SERVER command returns a failure. If the conditions are met, the STOP SERVER command returns success after all the leaders on the failed OBServer node are switched to normal OBServer nodes. After the STOP SERVER command returns success, you can perform any maintenance operations, for example, kill observer processes. For more information about the STOP SERVER command, see Isolate an OBServer node.
You can isolate an OBServer node when a single IDC fails:
If the OBServer node is still connected, for example, the status of the OBServer node is unstable due to packet loss of the network interface card, you must isolate the OBServer node; otherwise, frequent leader switchover affects applications.
If the OBServer node is disconnected, for example, the OBServer node fails, the OBServer node may not be isolated but is marked as
INACTIVEin theoceanbase.DBA_OB_SERVERSview and no longer receives application traffic.
Election is triggered when an OBServer node is isolated or the leader fails. The priority mechanism of the election service also takes the specified primary zone and the status of OBServer nodes into consideration, to ensure that the optimal replica is elected as the leader.
Manual leader switchover
You can also manually switch the leader/follower roles of log stream replicas.
Procedure:
Log on to the
systenant of the cluster as therootuser.Note that you must specify the corresponding fields in the following sample code based on your actual database configurations.
obclient -h10.xx.xx.xx -P2883 -uroot@sys -p***** -AFor more information about how to connect to a database, see Overview (MySQL mode) or Overview (Oracle mode).
Query for the zones, OBServer nodes, and log stream replicas of tenants.
obcllient [(none)]> SELECT a.TENANT_ID,a.LS_ID,a.SVR_IP,a.SVR_PORT,a.ZONE,a.role,b.TENANT_NAME,b.TENANT_TYPE FROM oceanbase.CDB_OB_LS_LOCATIONS a, oceanbase.DBA_OB_TENANTS b WHERE a.TENANT_ID=b.TENANT_ID; +-----------+-------+----------------+----------+-------+----------+-------------+-------------+ | TENANT_ID | LS_ID | SVR_IP | SVR_PORT | ZONE | role | TENANT_NAME | TENANT_TYPE | +-----------+-------+----------------+----------+-------+----------+-------------+-------------+ | 1 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | sys | SYS | | 1 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | sys | SYS | | 1 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | sys | SYS | | 1001 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | META$1002 | META | | 1001 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | META$1002 | META | | 1001 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | META$1002 | META | | 1002 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | mysql001 | USER | | 1002 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | mysql001 | USER | | 1002 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | mysql001 | USER | | 1002 | 1001 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | mysql001 | USER | | 1002 | 1001 | xx.xx.xx.237 | 2882 | zone1 | LEADER | mysql001 | USER | | 1002 | 1001 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | mysql001 | USER | | 1009 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | META$1010 | META | | 1009 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | META$1010 | META | | 1009 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | META$1010 | META | | 1010 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | oracle001 | USER | | 1010 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | oracle001 | USER | | 1010 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | oracle001 | USER | | 1010 | 1001 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | oracle001 | USER | | 1010 | 1001 | xx.xx.xx.237 | 2882 | zone1 | LEADER | oracle001 | USER | | 1010 | 1001 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | oracle001 | USER | | 1035 | 1 | xx.xx.xx.218 | 2882 | zone3 | FOLLOWER | META$1036 | META | | 1035 | 1 | xx.xx.xx.237 | 2882 | zone1 | LEADER | META$1036 | META | | 1035 | 1 | xx.xx.xx.238 | 2882 | zone2 | FOLLOWER | META$1036 | META | +-----------+-------+----------------+----------+-------+----------+-------------+-------------+ 24 rows in setFor more information about the
CDB_OB_LS_LOCATIONSview, see CDB_OB_LS_LOCATIONS.Execute the following commands to perform leader switchover:
ALTER SYSTEM SWITCH REPLICA role LS [=] ls_id SERVER [=] 'ip:port' TENANT = tenant_name;where
roleindicates the role of the replica. Valid values areLEADERandFOLLOWER.ls_idindicates the ID of the log stream replica to be modified.ip:portindicates the IP address and RPC port of the server where the log stream replica to be modified resides.tenant_nameindicates the tenant to which the log stream replica to be modified belongs.
For example:
obclient [(none)]> ALTER SYSTEM SWITCH REPLICA LEADER LS = 1001 SERVER = 'xx.xx.xx.218:2882' TENANT = oracle001;After the operation is completed, query the view again to check whether the roles of the replicas are modified.
obclient [(none)]> SELECT * FROM oceanbase.CDB_OB_LS_LOCATIONS WHERE TENANT_ID = 1010; +----------------------------+----------------------------+-----------+-------+----------------+----------+----------+-------+----------+-------------------------------------------------------------------+----------------------+--------------+ | CREATE_TIME | MODIFY_TIME | TENANT_ID | LS_ID | SVR_IP | SVR_PORT | SQL_PORT | ZONE | ROLE | MEMBER_LIST | PAXOS_REPLICA_NUMBER | REPLICA_TYPE | +----------------------------+----------------------------+-----------+-------+----------------+----------+----------+-------+----------+-------------------------------------------------------------------+----------------------+--------------+ | 2022-12-26 18:28:49.389751 | 2023-01-11 12:02:38.275763 | 1010 | 1 | xx.xx.xx.218 | 2882 | 2881 | zone3 | FOLLOWER | NULL | NULL | FULL | | 2022-12-26 18:28:49.389601 | 2022-12-26 18:28:54.441143 | 1010 | 1 | xx.xx.xx.237 | 2882 | 2881 | zone1 | LEADER | xx.xx.xx.218:2882:1,xx.xx.xx.237:2882:1,xx.xx.xx.238:2882:1 | 3 | FULL | | 2022-12-26 18:28:49.390567 | 2023-01-11 12:02:37.970540 | 1010 | 1 | xx.xx.xx.238 | 2882 | 2881 | zone2 | FOLLOWER | NULL | NULL | FULL | | 2022-12-26 18:29:01.334732 | 2023-01-11 11:59:31.899581 | 1010 | 1001 | xx.xx.xx.218 | 2882 | 2881 | zone3 | LEADER | xx.xx.xx.218:2882:1,xx.xx.xx.237:2882:1,xx.xx.xx.238:2882:1 | 3 | FULL | | 2022-12-26 18:29:01.335629 | 2023-01-11 12:02:37.709677 | 1010 | 1001 | xx.xx.xx.237 | 2882 | 2881 | zone1 | FOLLOWER | NULL | NULL | FULL | | 2022-12-26 18:29:01.335822 | 2023-01-11 12:02:37.970540 | 1010 | 1001 | xx.xx.xx.238 | 2882 | 2881 | zone2 | FOLLOWER | NULL | NULL | FULL | +----------------------------+----------------------------+-----------+-------+----------------+----------+----------+-------+----------+-------------------------------------------------------------------+----------------------+--------------+ 6 rows in set
Manual leader switchover applies to IDC-level and city-wide disaster recovery. You can switch the leader to a specified zone or region to adapt to the distribution of application traffic.