This topic describes how to troubleshoot bootstrap failures in OceanBase Database.
Applicable versions
The solution provided in this topic is applicable to all versions of OceanBase Database.
Troubleshooting
Bootstrap is the process of initializing system tenants after an OBServer node is started. If bootstrap fails, you can perform the following steps for troubleshooting:
Check whether the infrastructure of the OBServer node meets the requirements.
The basic environment of the OceanBase cluster to which the OBServer node belongs must meet the requirements.
Check the clock synchronization status.
Run the chronyc sources -v or ntpq -p command to verify the clock. If the clock is out of synchronization, correct the clock and restart the server.
Check the network.
Make sure that the one-way network latency is within 50 ms, or in the worst cases, within 100 ms. In addition, make sure that the network is stable without retransmission. Then, attempt to restart the server.
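The latency thresholds above can be expressed as a small check. The following Python function is a minimal sketch, not an OceanBase tool: the function name and the three labels are hypothetical, and it assumes that you measured a round-trip time (for example, with ping) and that the path is symmetric, so the one-way latency is roughly half of the round-trip time.

```python
def classify_one_way_latency(rtt_ms: float) -> str:
    """Classify a link from a measured round-trip time in milliseconds."""
    # Assumption: symmetric path, so one-way latency is half the round trip.
    one_way_ms = rtt_ms / 2
    if one_way_ms <= 50:
        return "ok"          # within the 50 ms target
    if one_way_ms <= 100:
        return "marginal"    # tolerated only in the worst cases
    return "too high"        # exceeds the 100 ms worst-case bound


if __name__ == "__main__":
    for rtt in (40, 150, 300):
        print(rtt, "ms round trip:", classify_one_way_latency(rtt))
```

This check covers latency only. Retransmission and general network stability still have to be verified separately.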
Check whether the information about the bootstrap command is as expected.
First, check whether all servers in rs_list are online. In addition, make sure that this is the first time the bootstrap command is run in the cluster. If file directories contain soft links, check whether the content that the soft links point to has been cleared, and clear it if not. Otherwise, the bootstrap fails with a 4015 error. To check the information about the bootstrap command, perform the following steps:
- On the server where the bootstrap command was run, view the observer.log file. Open the observer.log file and search for error logs by the BOOTSTRAP keyword. The key information is: observer is not empty(ret=-4015).
WARN [BOOTSTRAP] bootstrap (ob_service.cpp:2233) [52707][1110][YB426451898A-00059E8E47A3C4DE] [lt=14] [dc=0] observer is not empty(ret=-4015)
- View the error message in the logs.
Bootstrap usually fails because file directories are not empty. Possible causes:
A bootstrap command has been run on the server before. This involves the following two scenarios:
The current server has been registered with RootService in another cluster.
In this case, the current server is already in use and cannot be reinitialized in the current cluster. You need to select another available server.
The current server is not registered with another cluster.
In this case, the system generates the sstable, clog, ilog, slog, and shm files.
Multiple clusters are started in the same directory.
File directories contain soft links.
In this case, you must sort and then initialize the clusters.
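The log search described in this step can be scripted. The following Python sketch is an illustration under stated assumptions, not an OceanBase utility: the function name is hypothetical, and it assumes that bootstrap entries carry the literal [BOOTSTRAP] tag and a ret=<code> field, as in the sample entry shown in this topic.

```python
import re
from collections import Counter

# Matches the "ret=-4015" style error code seen in the sample log entry.
RET_PATTERN = re.compile(r"ret=(-?\d+)")


def bootstrap_error_codes(lines):
    """Count the ret codes found on lines that carry the [BOOTSTRAP] tag."""
    codes = Counter()
    for line in lines:
        if "[BOOTSTRAP]" not in line:
            continue  # ignore entries from other modules
        for match in RET_PATTERN.finditer(line):
            codes[int(match.group(1))] += 1
    return codes


sample = [
    # Entry copied from this topic.
    "WARN [BOOTSTRAP] bootstrap (ob_service.cpp:2233) [52707][1110]"
    "[YB426451898A-00059E8E47A3C4DE] [lt=14] [dc=0] "
    "observer is not empty(ret=-4015)",
    # Placeholder text, not real log output; it only shows the filter.
    "a line without the keyword (ret=-1)",
]

if __name__ == "__main__":
    print(bootstrap_error_codes(sample))
```

To scan a real log, pass an open file object instead of the sample list, for example open(path, errors="replace"), where path is the location of observer.log in your installation.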
Check whether the clog directory of the OBServer node has sufficient space.
When the used space of the clog directory reaches 95% of the total space, the OBServer node stops writing logs to the clog directory by default. The threshold is determined by the log_disk_usage_limit_percentage parameter. If clogs or other files occupy more than 95% of the space of the clog directory, the bootstrap of the OBServer node will fail. For information about how to troubleshoot the issue that clog disk usage exceeds the threshold, see Full usage of OBServer clog disk.
Check whether system resources were insufficient when the bootstrap command was run in the cluster.
Bootstrap in an OceanBase cluster will fail if system resources, such as CPU and memory resources, are insufficient. In this case, the OB_INVALID_RESOURCE_UNIT error message is generated. If this error message is generated when system resources are sufficient, identify the cause of the bootstrap timeout error through further diagnostics.
Check whether all OBServer nodes are started with the same cluster_id value.
When you start an OBServer node, you must specify a valid value for the cluster_id parameter. The value range of the cluster_id parameter is [1, 4294901759].
Check whether bootstrap in OceanBase Cloud Platform (OCP) fails.
If bootstrap in an OceanBase cluster fails in OCP and you can see a task flow being created in OCP, check the detailed error message in the subtask.
Sample scenario:
An error occurs in the at_bootstrap_observer step of the subtask flow. The error message carries the error code 4216. A known cause is that the domain name configured for OCP is invalid when you use OCP to manage the OceanBase cluster. When you run the curl command on an OBServer node to access a configureURL value exposed through OCP, the OBServer node resolves the domain name of OCP. If the domain name of OCP is invalid, a 4216 error is reported.
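Before rerunning the task, you can confirm from the OBServer node that the host name in the URL resolves. The following Python sketch is a hypothetical helper, not part of OCP or OceanBase. It checks name resolution only, through the system resolver, and does not verify that the URL itself is reachable. The URL in the usage lines is a placeholder.

```python
import socket
from urllib.parse import urlparse


def url_host_resolves(url: str) -> bool:
    """Return True if the host part of the URL resolves on this machine."""
    host = urlparse(url).hostname
    if not host:
        return False  # no host could be extracted from the URL
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # the resolver could not map the name to an address
    return True


if __name__ == "__main__":
    # Placeholder URL: replace it with the value configured for your cluster.
    print(url_host_resolves("http://127.0.0.1:8080/services"))
```

If the function returns False for the URL that OCP exposes, correct the domain name configuration before retrying the bootstrap task.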
If all server configurations meet the requirements and the bootstrap failure does not result from the preceding common causes, you must systematically analyze the bootstrap process. The steps of the bootstrap process can be traced in the logs by using the BOOTSTRAP keyword: search for this keyword on the corresponding server to see how far the bootstrap progressed. The progress information provides a reference for further diagnostics. If you still cannot determine the cause after further diagnostics, contact OceanBase Technical Support.
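Two of the checks in this topic, the clog directory usage and the cluster_id value, are simple enough to script before you rerun bootstrap. The following Python sketch is illustrative only, and all function names are hypothetical. It assumes the default threshold of 95% and the cluster_id range [1, 4294901759] stated above. The usage computed from shutil.disk_usage may differ slightly from the figure that df reports, because file systems can reserve blocks.

```python
import shutil

CLUSTER_ID_MIN = 1
CLUSTER_ID_MAX = 4294901759  # upper bound stated in this topic


def usage_percent(used: int, total: int) -> float:
    """Return used space as a percentage of total space."""
    return 100.0 * used / total


def clog_dir_over_limit(path: str, limit_percent: int = 95) -> bool:
    """Return True if the file system holding path has reached the limit."""
    usage = shutil.disk_usage(path)
    return usage_percent(usage.used, usage.total) >= limit_percent


def cluster_ids_consistent(cluster_ids) -> bool:
    """Return True if every node uses the same, in-range cluster_id."""
    ids = set(cluster_ids)
    return len(ids) == 1 and all(
        CLUSTER_ID_MIN <= cid <= CLUSTER_ID_MAX for cid in ids
    )


if __name__ == "__main__":
    # "." stands in for the clog directory of your installation.
    print("clog over limit:", clog_dir_over_limit("."))
    print("cluster_id consistent:", cluster_ids_consistent([1, 1, 1]))
```

Pass the cluster_id value that each OBServer node was started with to cluster_ids_consistent. If the result is False, either the values differ or a value is out of range.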