This topic describes the basic principles and possible errors of log synchronization.
Commit logs (clogs) is a core part of the log service of OceanBase Database. As an essential component of OceanBase Database, the log service supports many core features, such as the atomicity, durability, and isolation of transactions, and the high availability of OceanBase databases. During transaction commit, OceanBase Database uses the Paxos protocol to ensure that the clogs of the majority of nodes are stored in the disk. Eventually, clogs of the minority of nodes are persisted in the disk. OceanBase Database provides an automatic log recycling mechanism to ensure continuous log writing to the log disk.
Possible clog-related errors include full usage of the clog disk, stuck clog replay, clog synchronization delay, full occupation of the clog sliding window, and failed clog reconfirmation. This topic describes how to troubleshoot the full usage of the clog disk.
Troubleshooting methods
The status of the log module is displayed in the GV$OB_LOG_STAT and V$OB_LOG_STAT views. Typically, you can use these two views for the following purposes:
- Query whether a log stream has a leader and identify the replica where the leader is located.
- Query the member list, that is, the number of Paxos replicas, of a log stream.
- Query the replica synchronization status.
- Query the recyclable checkpoint of the logs of a log stream.
- Query the log consumption service scope of a log stream based on the log sequence number (LSN) or system change number (SCN).
- Query the access mode of the log module.
- Locate the causes of other issues.
Key parameters
| Parameter | Default value | Description | Applicable scope | Immediately take effect? |
|---|---|---|---|---|
log_disk_utilization_threshold |
80 | The allocated log disk space of a tenant is not fully used. Only the space within the threshold is used. If the disk usage exceeds the threshold, recyclable log files are recycled. The threshold is specified by this parameter. | Tenant level | Yes |
log_disk_utilization_limit_threshold |
95 | In scenarios such as when a large number of system writes are involved, the earliest log files cannot be recycled due to the checkpoint latency. Therefore, the actual usage of the log disk may exceed the value of the log_disk_utilization_threshold parameter for a short time, but cannot exceed the upper limit, which is specified by the log_disk_utilization_limit_threshold parameter. Once the usage of the log disk reaches the upper limit, the log disk rejects new writes. In this case, you need to contact O&M engineers. |
Tenant level | Yes |
Full usage of the clog disk
In general cases, the usage of the clog disk remains under a threshold. If the usage exceeds the threshold, recyclable log files are recycled, and it means that log file recycling is slow or stuck.
When the data corresponding to all partition logs in a clog file has been stored to SSTables by minor compactions, the clog file can be recycled. Therefore, the cause of a full clog disk is usually that the minor compaction of some partitions is stuck.
Troubleshooting procedure
If you receive an alert on the clog disk usage, perform the following steps for troubleshooting:
Check whether the log disk is full.
You can query the
GV$OB_UNITSview to check whether the log disk usage exceeds the disk space recycling threshold, which is specified by thelog_disk_utilization_thresholdparameter. The default threshold is 80%. To do so, execute the following SQL statement:select tenant_id, svr_ip, svr_port, LOG_DISK_IN_USE/1024/1024/1024 LOG_DISK_IN_USE_G, LOG_DISK_SIZE/1024/1024/1024 LOG_DISK_SIZE_G, LOG_DISK_IN_USE*100/LOG_DISK_SIZE LOG_DISK_USED_PERCENTAGE from gv$ob_units;Confirm abnormal log streams in the tenant.
If you find that the log disk usage of a tenant exceeds 80% of the allocated space, you need to confirm abnormal log streams in the tenant. (END_LSN - BASE_LSN) indicates the amount of logs (in bytes) that cannot be recycled from the corresponding log stream. If the usage exceeds 128 MB and the log disk is full, it means that the log stream has blocked log recycling. To check abnormal log streams, execute the following SQL statement:
select TENANT_ID, LS_ID, SVR_IP, ROLE , (end_lsn-base_lsn)/1024/1024 from gv$ob_log_stat;If the view is inaccessible, you can also query the system logs in the following way:
grep 'ERROR.*Txxxx_PalfGC' observer.log.* [2022-07-19 21:34:37.191950] ERROR [PALF] try_recycle_blocks (palf_env_impl.cpp:637) [147234][T1010_PalfGC][T1010][Y0-0000000000000000-0-0] [lt=21] clog disk space is almost full(total_size(MB)=165888, used_size(MB)=132710, used_percent(%)=80, warn_size(MB)=132710, warn_percent(%)=80, limit_size(MB)=157593, limit_percent(%)=95, maximum_used_size(MB)=74026, maximum_log_stream=1006, oldest_log_stream=1004, oldest_timestamp=1658223124759857784) BACKTRACE:0x434d92e 0xc1a323a 0xc194b93 0x44726d4 0x447239f 0x44721cb 0x4471ffd 0x45b102c 0x45b0339 0x43671aa 0x4365b6d 0x521ebf9 0xc396ec6 0xc391b19 0x7f337d6bfe24 0x7f337cf68f1cThe key fields are described as follows:
maximum_log_stream: the log stream that uses the most log disk space.oldest_log_stream: the log stream that contains the oldest logs.
Troubleshoot the root cause of the full disk usage.
Check for log replay or callback exceptions based on the information about the tenant, log stream, and node with the fully used disk obtained in the previous step.
- If the replica involved acts as the leader, you need to confirm the causes of the log synchronization exceptions.
- If the local disk write checkpoint of the leader is not the latest, it is likely that the disk write is slow or stuck.
- If the local disk write checkpoint of the leader replica is the latest, the local disk write is completed, and you need to check for exceptions of followers.
- It is likely that the log disk on a follower is full and cannot receive new logs. You can search for logs related to full disk usage on the follower that falls behind.
- It may be because of an error in the remote procedure call (RPC) communication between the leader and follower, such as the backlog of the RPC queue on the follower. You can search for logs related to the RPC queue backlog on the follower that falls behind.
- It is also possible that the allocation of memory for processing the RPC requests fails. You can search for logs related to memory allocation failures on the follower that falls behind.
- If replica involved acts as a follower, you need to confirm the causes of log replay exceptions.
- The replay service of OceanBase Database replays logs of a follower, and stops replaying subsequent logs when an unexpected error occurs on the OBServer node. This prevents consistency issues from becoming worse. A log replay exception does not necessarily indicate an issue with the replay service. You need to locate the specific abnormal modules based on the error logs.
- If the replica involved acts as the leader, you need to confirm the causes of the log synchronization exceptions.
Check for freeze exceptions.
Here are some sample system log entries of a freeze:
# A system-level freeze starts. receive minor freeze reques # A system-level freeze is completed. finish minor freeze request # A tenant-level freeze starts. [TenantFreezer] tenant_freeze start # A tenant-level freeze succeeds. [TenantFreezer] succeed to freeze tenant # The freeze of a log stream in the tenant fails. [TenantFreezer] fail to freeze logstream # A partition-level freeze starts. [TenantFreezer] tablet_freeze start # A partition-level freeze succeeds. [TenantFreezer] succeed to freeze tablet # A partition-level freeze fails. [TenantFreezer] fail to freeze tabletCheck for minor compaction exceptions.
Emergency measures
If the full usage of the clog disk has affected the majority of nodes and the transaction commit, the entire system may hang. In this case, if the upper-layer applications are still writing a large number of clogs, we recommend that you suspend the applications and restore the clog service by performing the operations as guided in the following Restore operations section.
If only a minority of nodes are affected, quickly analyze whether the nodes have bottlenecks in the I/O layer and network layer (or infrastructure layer).
- If yes, isolate the nodes and resolve the bottlenecks.
- If no, you can perform the following operations on the nodes.
Restore operations:
Automatic write stops when the usage of the clog disk reaches the default threshold of 95%. This threshold is specified by the
log_disk_utilization_limit_thresholdparameter. You can change the value of this parameter to 98% for the nodes whose clog disk is full. Then, the follower that falls behind will start to synchronize logs from the leader. If a large number of logs need to be synchronized, a rebuild is triggered to copy data and clogs from the leader to the follower.Query the corresponding internal tables to confirm the rebuild operation and the progress of clog synchronization.
If the issue persists after you increase the value of the
log_disk_utilization_limit_thresholdparameter, shut down the OBServer node, copy the clog files to another directory with larger storage space, and then create soft links to the files. Then, restart the OBServer node. If the usage of the clog disk drops below 95%, the OBServer node is recovering and all replicas are going to reach the sync state.Notice
This step is risky. We recommend that you perform this step to recover the system only when the cluster fails. When you copy log files, be careful to avoid improper operations.
Roll back the previous operations and recover the modified parameter values.
Possible causes of a full clog disk
O&M issues
The space for the clog disk is occupied by files of other users. As a result, the available space is insufficient.
Consecutive minor compaction failures
After minor compactions are performed to store the partition data described in a clog file to an SSTable and the partition metadata is persisted, the clog file can be recycled. Therefore, if minor compactions fail, or the OBServer node restarts upon a failure, the start replay timestamp for clogs does not advance, and clog files cannot be recycled. The logic of consecutive minor compaction failures is complex. If you encounter such failures, contact OceanBase Technical Support for diagnostics.