Troubleshooting logic
When an observer process exits unexpectedly, it no longer appears in the output of the ps -ef command. If the hardware and operating system are working properly, you can attempt to restart the observer process.
A core dump file is generated when an observer process exits unexpectedly. The file can be tens or even hundreds of GB in size. You can analyze the root cause of the unexpected exit based on the file. In general, an observer process exits unexpectedly because of signal 11 (SIGSEGV, invalid memory access) or signal 6 (SIGABRT, process abort). Therefore, we recommend that you check the observer process itself, the operating system, and the underlying system during troubleshooting.
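As a quick reference for the two signal numbers mentioned above, the following Python sketch maps a signal number to its symbolic name by using the standard signal module. The helper name describe_signal is illustrative and is not part of any OceanBase tool. The numeric values assume a Linux host.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number, for example 11 -> 'SIGSEGV'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number does not correspond to a named signal on this platform.
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # Signal numbers discussed in this topic.
    for num in (11, 6):
        print(num, describe_signal(num))
```

The sig= value in a crash log can be passed to this helper to confirm which kind of exit you are dealing with before you look at the call stack.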
Troubleshooting methods
When an observer process exits unexpectedly and a core dump file is generated, you can identify the root cause by using one of the following methods based on the OBServer version:
V1.x and versions earlier than V2.2.30 BP16
Directly debug the core dump file.
V2.2.30 BP16 and later, V2.2.7x, and V3.x and later
Directly analyze the observer.log file, because the diagnostics feature is enhanced in these versions. When an observer process exits unexpectedly, the observer.log file carries the call stack (also known as backtrace) information under the CRASH ERROR!!! keyword. The information contains the direct cause of the unexpected exit and all information about the issue. Sample error log:
    CRASH ERROR!!! sig=11, sig_code=1, sig_addr=0x42, tid=27311, tname=test_context, trace_id=Y0-0000000000000000, extra_info=((null)), lbt=0x58b9f0 0x58c00f 0x7fcb153fe61f 0x4a1342 0x9fc8fb 0x9f66f5 0x9dc80e 0x9dd097 0x9dd727 0x9e3fdf 0x9fdcd9 0x9f74eb 0x9e2c1b 0x4abcc0 0x4a801b 0x7fcb144b6444 0x4a1028
    Segmentation fault (core dumped)
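The fields in a CRASH ERROR!!! line can be pulled apart with a few lines of Python. The following is a minimal sketch that assumes the line layout of the sample error log in this topic. The function name parse_crash_line is hypothetical, and the SAMPLE string is the sample log line with its address list shortened for readability.

```python
import re

# Sample log line from this topic, with the lbt address list shortened.
SAMPLE = (
    "CRASH ERROR!!! sig=11, sig_code=1, sig_addr=0x42, tid=27311, "
    "tname=test_context, trace_id=Y0-0000000000000000, extra_info=((null)), "
    "lbt=0x58b9f0 0x58c00f 0x7fcb153fe61f 0x4a1342"
)


def parse_crash_line(line: str) -> dict:
    """Split a CRASH ERROR!!! log line into its key=value fields."""
    marker = "CRASH ERROR!!!"
    if marker not in line:
        raise ValueError("not a CRASH ERROR!!! line")
    body = line.split(marker, 1)[1]
    # Everything after 'lbt=' is the backtrace: a run of hex addresses.
    head, _, tail = body.partition("lbt=")
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", head))
    fields["lbt"] = re.findall(r"0x[0-9a-fA-F]+", tail)
    return fields


if __name__ == "__main__":
    info = parse_crash_line(SAMPLE)
    print(info["sig"], info["tid"], info["tname"], info["lbt"])
```

The sig value gives the direct cause of the exit, and the lbt list holds the addresses that still need to be resolved into function names.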
Troubleshooting procedure
If an observer process exits unexpectedly for unknown reasons and you cannot recover it yourself, send the related information to an OceanBase technical engineer for root cause analysis. If you cannot immediately send the core file to the technical engineer because of the large file size or restrictions in the production environment, conduct preliminary diagnostics on the observer process based on the information about the issue. During diagnostics, obtain the direct cause of the unexpected exit and the thread stack information of the observer process that exited.
To analyze the root cause, perform the following steps based on the OBServer version:
Versions earlier than V2.2.76
Obtain the debuginfo RPM package for the observer process from an OceanBase technical engineer. The package version must be the same as the OBServer version. Install or decompress the package, and then either resolve the call stack addresses of the observer process into readable function names or debug the core file. Sample command:
    rpm2cpio oceanbase-debuginfo-xxx.el7.x86_64.rpm | cpio -div
Copy the observer.debug file from the decompressed package to the observer process directory. The default directory is /home/admin/oceanbase/bin.
Debug the core file. Sample command:
    gdb $observer $core_file -ex bt
Example:
    gdb /home/admin/oceanbase/bin/observer core-observer-113058-1614065734 -ex bt
Notice
This command debugs the core file. If you directly debug a running observer process, the system availability will be affected. If you are interested in other features of GNU Debugger (GDB), verify them in the test environment.
The following sample output shows the key information.
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `/home/admin/oceanbase/bin/observer'
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007f0798a354eb in raise () from /lib64/libpthread.so.0
    [Current thread is 1 (Thread 0x7f0430228700 (LWP 113058))]
    #0  0x00007f0798a354eb in raise () from /lib64/libpthread.so.0
    #1  0x000000000915878 in oceanbase::common::coredump_cb(int, siginfo_t*) ()
    #2  <signal handler called>
    #3  0x00007f0798204e9d in nanosleep () from /lib64/libc.so.6
    #4  0x00007f0798204d34 in sleep () from /lib64/libc.so.6
    #5  0x0000000009130d9 in oceanbase::observer::ObServer::wait() ()
    #6  0x000000000317da82 in main ()
After you obtain the key information, run the quit command on the command line interface (CLI) to exit GDB, or use the -batch option to run GDB in non-interactive mode and redirect the output to a file.
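The non-interactive variant described above can be scripted. The following Python sketch builds a GDB command line that uses the -batch and -ex bt options and writes the output to a file. The function names and the output path are illustrative assumptions. Run it only against a core file, not against a running observer process.

```python
import subprocess


def gdb_backtrace_command(observer: str, core_file: str) -> list:
    """Build a non-interactive GDB invocation that prints the backtrace."""
    # -batch makes GDB exit after it has run the commands given with -ex.
    return ["gdb", "-batch", "-ex", "bt", observer, core_file]


def save_backtrace(observer: str, core_file: str, out_path: str) -> int:
    """Run GDB in batch mode and write stdout and stderr to out_path."""
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_backtrace_command(observer, core_file),
            stdout=out,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


if __name__ == "__main__":
    # Paths taken from the example in this topic; the output file name is hypothetical.
    save_backtrace(
        "/home/admin/oceanbase/bin/observer",
        "core-observer-113058-1614065734",
        "observer_backtrace.txt",
    )
```

Writing the backtrace to a small text file is useful when the core file itself is too large to send to a technical engineer.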
V2.2.76 and later
Obtain the call stack information from the executable file of the observer process that exits unexpectedly or debug the core file. When an observer process exits unexpectedly, the observer.log file carries the CRASH ERROR!!! keyword, which records important call stack information (the hexadecimal addresses following the lbt= keyword). Run the following command to obtain the call stack information:
    addr2line -pCfe $observer $symbol_addr
Example:
    addr2line -pCfe /home/admin/oceanbase/bin/observer 0x58b9f0 0x58c00f 0x7fcb153fe61f 0x4a1342 0x9fc8fb 0x9f66f5 0x9dc80e 0x9dd097 0x9dd727 0x9e3fdf 0x9fdcd9 0x9f74eb 0x9e2c1b 0x4abcc0 0x4a801b 0x7fcb144b6444 0x4a1028
The example contains the original CRASH ERROR!!! information and generated call stack information.
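Copying the addresses by hand is error-prone, so the addr2line command can be assembled from the log text instead. The following Python sketch assumes that the addresses follow lbt= on the CRASH ERROR!!! line, as in the sample log. The function name addr2line_command is hypothetical, and the sample log text in the usage section is shortened.

```python
import re


def addr2line_command(log_text: str, observer: str) -> list:
    """Build an addr2line command from the last CRASH ERROR!!! line in log_text."""
    crash_lines = [ln for ln in log_text.splitlines() if "CRASH ERROR!!!" in ln]
    if not crash_lines:
        raise ValueError("no CRASH ERROR!!! line found")
    # The hex addresses after 'lbt=' are the call stack to resolve.
    _, _, backtrace = crash_lines[-1].partition("lbt=")
    addresses = re.findall(r"0x[0-9a-fA-F]+", backtrace)
    if not addresses:
        raise ValueError("no lbt addresses found")
    return ["addr2line", "-pCfe", observer] + addresses


if __name__ == "__main__":
    # Shortened sample; a real observer.log line carries the full address list.
    log = "CRASH ERROR!!! sig=11, sig_code=1, lbt=0x58b9f0 0x58c00f 0x4a1342"
    print(" ".join(addr2line_command(log, "/home/admin/oceanbase/bin/observer")))
```

The resulting list can be passed to subprocess.run on a host that has the observer executable of the same version as the process that exited.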
Send the direct cause (signal information) of the unexpected exit, call stack information, and OBServer version to the OceanBase technical engineer.