During database operation, various exceptions may occur, such as application errors, database connection errors, database permission issues, database resource issues, and network issues. Among all of these, one situation is application disconnection.
To help you quickly identify the root cause of application disconnection issues and resolve them efficiently, this guide provides a clear and practical troubleshooting process. This process outlines specific steps to take, aiming to improve issue resolution efficiency and minimize the impact on business operations, providing strong support for daily maintenance tasks.
Process overview
When an application is disconnected, you can follow this process to troubleshoot the issue.
First, determine the programming language used for the application.
Java
If the application is developed in Java, you can check the error logs of the Java application to determine the reason for the disconnection. The steps are as follows:
Check whether the
Communications link failureexception is included in the error logs.If the
Communications link failureexception is not included in the error logs, you can try to reproduce the scenario:- If you can reproduce the scenario, you can analyze the error cause by debugging the program code or analyzing the network packets.
- If you cannot reproduce the scenario, you can infer the error cause based on the existing information and the program logic.
If the
Communications link failureexception is included in the error logs, proceed to the next step.
Check the information in the
Caused bysection of the exception log.If the information in the
Caused bysection is not printed, the stack trace is incomplete, and the cause cannot be determined. You need to reproduce the issue after the application prints the complete stack trace.If the information in the
Caused bysection is printed, you can determine the disconnection cause based on the output in theCaused bysection:If the content is
Connection reset,can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost, orunexpected end of stream, read XXX bytes from xxx (socket was closed by server), the application-side connection is disconnected. You need to determine the cause based on the error message. For more information, see Application-side connection is disconnected in this topic.If the content is
read timed outorConnection timed out (Read failed), slow SQL has triggered thesocketTimeoutset for the application, causing the client to disconnect. You need to determine whether the exception SQL is a slow SQL. If it is a slow SQL, you can optimize the SQL as needed. If no slow SQL exists, consider adjusting thesocketTimeout. For more information, see Database connection pool configuration.If the content is
connect timed outorConnection timed out: connect, the TCP handshake fails. The destination address and port are unreachable, which may be due to network issues or incorrect IP and port configurations in the program.If the content is
Connection refused, the connection is refused. The destination address and port are reachable, but the connection is rejected due to a system or network issue.
C
If the application is developed in C, you can check the error logs of the C application to determine the reason for the disconnection.
If the error message is
end-of-file on communication channelorLost connection to MySQL server during query, the application-side connection is disconnected. You need to determine the cause based on the error message. For more information, see Application-side connection is disconnected in this topic.If the error message is
reading initial communication packetorreading authorization packet, the database connection fails. The failure may be due to network issues or issues with the OceanBase Database service. You need to analyze the logs, and packet capture may be required.If the application logs do not contain obvious error messages, you can try to reproduce the scenario.
- If you can reproduce the scenario, you can analyze the error cause by debugging the program code or analyzing the network packets.
- If you cannot reproduce the scenario, you need to infer the error cause based on the existing information and the program logic.
Application-side connection is disconnected
If the application-side connection is disconnected, check whether the log contains conn id or the SQL statement executed when an exception is printed. If neither is present, the critical information is missing, and you need to contact the application side to obtain more information for diagnostics. If the log contains either of the above information, you need to clarify the data source configuration in the current program and follow the steps below.
Clarify the access path based on the data source. The access path includes the following information:
The number and address of OBProxy nodes
The number and address of OBServer hosts
Determine the current access path and check whether the application goes through OBProxy.
The application connects to the database through OBProxy
Use the
sshcommand to log in to the corresponding OBProxy node.Execute the
pscommand to view the start time of the obproxy process.Confirm whether the application was restarted during the error period.
If the application was restarted, the issue is likely caused by the ODP restart.
If the application was not restarted, proceed with the next step.
If the application was not restarted, run the
cdcommand to navigate to the ODP log directory. The ODP logs are stored in the/logdirectory within the installation directory of ODP.Check for
conn_id. If it is not present, check for the SQL statement related to the exception. If neither is present, contact the application side and obtain more information.If
conn_idis present, run the following command, replacingxxxxxxwith the actualconn_id, to filter obproxy logs from the exception period:grep "xxxxxx" obproxy.logReplace
xxxxxwith the actualconn_id.If the SQL statement related to the exception is present, run the following commands, substituting a distinctive substring of the SQL text and the time of the error, to filter obproxy logs from the exception period:
grep "Error text" obproxy.log.xxxx | grep "Error time point"grep "Error text" obproxy_error.log.xxxx | grep "Error time point"
Based on the filtered log entries, find the
trace_idof obproxy during the exception.Run the following command, replacing
xxxxxxwith the actualtrace_id, to filter obproxy logs from the exception period:grep "xxxxxx" obproxy.log.xxxx | grep "Error time point"Replace
xxxxxwith the actualtrace_id.Based on the obproxy log output, determine the cause of the connection disconnection.
If the connection was actively disconnected by obproxy, analyze the cause of the disconnection based on the filtered logs. If necessary, analyze the obproxy source code.
If the connection was disconnected by the observer, leading to the application connection being disconnected, find the OBServer node and observer session_id at the time of disconnection based on the obproxy log entries, and proceed with the diagnostics analysis as described in Log in to the corresponding OBServer node directly via SSH below.
If the connection was disconnected by a network device between the application server and obproxy (such as F5, SLB, or other network devices), you need to troubleshoot the network devices between the application server and obproxy.
Log in to the corresponding OBServer node directly via SSH
Use the
sshcommand to log in to the corresponding OBServer node.Execute the
pscommand to view the start time of the observer process.Confirm whether the application was restarted during the error period.
If the application was not restarted, proceed with the next step.
If the OBServer node was not restarted, proceed to the next step.
If the application was not restarted, run the
cdcommand to navigate to the log directory.The following example assumes that the installation directory of OceanBase Database is
/home/admin/oceanbase. The actual log path may vary.cd /home/admin/oceanbase/logExecute the following command to filter out logs related to
session id.grep "observer session id" observer.loggrep "observer session id" observer.log.xxxBased on the filtered logs, determine the connection status.
- If the connection was actively disconnected by the observer, analyze the cause of the disconnection based on the filtered logs. If necessary, analyze the observer source code.
- If the connection was disconnected by a network device between obproxy and the observer, you need to troubleshoot the usage of the network devices between obproxy and the observer.
Typical cases
In a production environment, a business transaction times out due to an application error
read time out.The application experiences periodic disconnections due to idle connection timeouts.
When an application uses ODP-Sharding to connect to OceanBase Database, the business transaction fails with the error
Server connection execute error: Read timed out.When a business transaction is committed in OceanBase Database, the error
Transaction resolution unknownoccurs.When you connect to an OceanBase cluster or OceanBase Database through obproxy, the connection fails and the error
Access denied for user 'xxxx'@'xxxx' (using password: YES)is returned.When you connect directly to a regular tenant in OceanBase Database, the error
ERROR 5150 (HY000) : Tenant not in this serveris returned.When you connect to an OBServer node of OceanBase Database V4.x, the error
unexpected end of stream, read 0 bytes from 4is returned occasionally.
When a client executes an SQL statement, the error
Lost connection to MySQL server during queryis returned. Usually, this error is caused by an exception on the server side.When you connect to OceanBase Database's MySQL-compatible mode, the connection is disconnected and the error
ERROR 2013is returned.
