Arbitration server process startup failed and NIC missing error returned
Symptom
The following error message is returned upon an attempt to start the arbitration server process on the command-line interface (CLI):
[2023-11-08 11:01:54.555369] ERROR issue_dba_error (ob_log.cpp:1841) [2649528][observer][T0][Y0-0000000000000000-0-0] [lt=3][errcode=-4388] Unexpected internal error happen, please checkout the internal errcode(errcode=-4182, file="ob_net_util.cpp", line_no=212, info="can not find ifname by local ip")
[2023-11-08 11:01:54.555407] EDIAG [LIB] get_ifname_by_addr (ob_net_util.cpp:212) [2649528][observer][T0][Y0-0000000000000000-0-0] [lt=37][errcode=-4182] can not find ifname by local ip(local_ip=0.0.0.0) BACKTRACE:0x10a24a3c 0x5e92550 0xb0a0c38 0x5e9228c 0x5e8b32c 0x10f7503c 0x10f74ac8 0x8e3a8bc 0x8e31a30 0x5e8c4c4 0xffffaad3485c 0x453d958
[2023-11-08 11:01:54.555846] ERROR init_config (ob_server.cpp:1840) [2649528][observer][T0][Y0-0000000000000000-0-0] [lt=434][errcode=-4393] observer start process failure(local_ip is not a valid IP for this machine, local_ip="0.0.0.0")
Possible causes
The error message shows that local_ip is 0.0.0.0. A possible cause is as follows:
During the first attempt to start the arbitration server process on the CLI, neither the network interface card (NIC) name nor the IP address was specified by using the -i eth0 or -I 10.xx.xx.xx parameter. The arbitration server tried to guess devname, failed, and fell back to the default value bond0. Because the -I parameter was not specified either, the local_ip parameter defaulted to an empty string, which is treated as 0.0.0.0, and this IP address was written to the configuration file.
On the next attempt to start the arbitration server process, the error is still returned because of the incorrect IP address stored previously, even if you specify the NIC name. This is because the arbitration server always verifies the validity of the local_ip parameter: the verification succeeds only if the server can obtain the NIC name based on this parameter. No NIC name can be obtained based on local_ip=0.0.0.0, so the verification fails and the arbitration server process fails to start.
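The validity check described above amounts to asking whether a NIC on the machine owns the configured IP address. The following is a minimal shell sketch of the same lookup, which may help you choose a value for -i or -I. The function name ifname_for_ip is hypothetical and is not part of the arbitration server; it only assumes the line format printed by `ip -o -4 addr show`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the arbitration server): print the name of
# the NIC that owns the given IPv4 address. Reads lines in the format printed
# by `ip -o -4 addr show` from standard input.
ifname_for_ip() {
  awk -v ip="$1" '
    { split($4, a, "/") }              # $4 looks like 10.0.0.5/24
    a[1] == ip { print $2; found = 1; exit }
    END { if (!found) exit 1 }         # non-zero status when no NIC matches
  '
}

# Example usage on a live machine (replace the address with your own):
#   ip -o -4 addr show | ifname_for_ip 10.0.0.5
```

If the function prints nothing and returns a non-zero status for the address you intend to use, no NIC on the machine owns that address, which matches the situation in which the lookup for 0.0.0.0 fails.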
Solution
Method 1: Delete the persistent file in the /etc directory, then restart the arbitration server process by specifying -i eth0 on the CLI.
Method 2: Without deleting the persistent file, restart the arbitration server process by specifying -I 10.xx.xx.xx on the CLI.
Arbitration server process startup failed and killed error returned
Symptom and possible causes
If the arbitration server process fails to start and the system returns the killed error each time you run the startup command, check the configuration of the vm.min_free_kbytes parameter on the machine. The kernel parameter vm.min_free_kbytes specifies the minimum amount of free memory that the system keeps in reserve to avoid memory fragmentation. If a newly started process causes the remaining memory on the machine to fall below this value, the process may be killed by the operating system.
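As a rough way to see how close the machine is to this threshold, you can compare the MemFree value in /proc/meminfo with the reserved minimum. The sketch below assumes that this simple difference is a useful indicator; the kernel's actual memory reclaim behavior is more involved. The function name free_headroom_kb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print MemFree minus the reserved minimum, in KB.
# $1: path to a meminfo-style file, $2: value of vm.min_free_kbytes.
free_headroom_kb() {
  awk -v min="$2" '$1 == "MemFree:" { print $2 - min; exit }' "$1"
}

# Example usage on a live machine:
#   free_headroom_kb /proc/meminfo "$(cat /proc/sys/vm/min_free_kbytes)"
```

A small or negative result suggests that the machine has little free memory above the reserve before the arbitration server process starts.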
Solution
You can decrease the value of the vm.min_free_kbytes parameter as appropriate. If the arbitration service is deployed on a small machine (a server with no more than 8 GB of physical memory), we recommend that you keep the default value of this parameter.
To view and modify the vm.min_free_kbytes parameter, perform the following steps:
1. Log in to the machine where the arbitration server resides as the root user.
2. View the value of the system minimum free memory. The unit of this parameter is KB.
   [root@xxx admin]# cat /proc/sys/vm/min_free_kbytes
3. Change the value based on your business needs. For example, you can change the value to 262144, which indicates 256 MB.
   [root@xxx admin]# echo 262144 > /proc/sys/vm/min_free_kbytes
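Because the parameter is expressed in KB, a target given in MB must be multiplied by 1024. The sketch below shows that conversion together with the equivalent sysctl command. Both function names are hypothetical, and set_min_free_kb is only defined here, not run, because it requires root privileges.

```shell
#!/bin/sh
# Hypothetical helper: convert megabytes to kilobytes, the unit used by
# vm.min_free_kbytes.
mb_to_kb() {
  echo $(( $1 * 1024 ))
}

# Hypothetical helper: apply a value to the running kernel (requires root).
# Equivalent to writing the value to /proc/sys/vm/min_free_kbytes.
set_min_free_kb() {
  sysctl -w "vm.min_free_kbytes=$1"
}

# Example usage as root:
#   set_min_free_kb "$(mb_to_kb 256)"
```

A value set through /proc or sysctl -w applies to the running kernel only and is lost when the machine restarts.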
References
For more information about O&M operations related to the arbitration service, see Arbitration high availability.