During database operation, various exceptions may occur, such as application errors, database connection errors, database permission issues, database resource issues, and network issues. Application disconnection is one such scenario.
To help you quickly identify the root cause of this issue and resolve it efficiently, we have summarized a clear and practical troubleshooting process for application disconnection issues. This process provides specific steps to improve the efficiency of issue resolution, minimize the impact on business operations, and provide strong support for daily operations and maintenance.
Procedure overview
When an application disconnection occurs, follow this procedure to troubleshoot the issue.
First, determine the programming language used by the application.
Java
If you confirm that the application is developed in Java, you can check the error logs of the Java application to determine the reason for the disconnection. Here's how:
Check whether the error log contains the Communications link failure exception.
- If the error log does not contain the Communications link failure exception, try to reproduce the scenario:
  - If you can reproduce the scenario, analyze the cause of the error by debugging the program code or analyzing the network packets.
  - If you cannot reproduce the scenario, try to infer the cause of the error from the existing information and the logic of the program.
- If the error log contains the Communications link failure exception, proceed to the next step.
Check the Caused by section of the exception log.
- If this section is not printed, the stack trace is incomplete and the cause cannot be determined. Make sure the application prints the complete stack trace, reproduce the issue, and then analyze the output.
- If this section is printed, determine the cause of the disconnection from its content:
  - If the content is "Connection reset", "can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost", or "unexpected end of stream, read XXX bytes from xxx (socket was closed by server)", the application-side connection was disconnected. Determine the cause based on the error message. For more information, see Application-side disconnection in this topic.
  - If the content is "read timed out" or "Connection timed out (Read failed)", a slow SQL statement exceeded the socketTimeout value set in the application, and the program disconnected. Determine whether the SQL statement that raised the exception is a slow SQL statement. If it is, optimize it as needed. If there are no slow SQL statements, consider adjusting socketTimeout. For more information, see Database connection pool configuration.
  - If the content is "connect timed out" or "Connection timed out: connect", the TCP handshake failed. The destination address and port are unreachable from the application, which may be due to network issues or an incorrect IP address or port configured in the program.
  - If the content is "Connection refused", the connection failed. The destination address and port are reachable from the application, but the connection was rejected due to a system or network issue.
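The branching on the Caused by text can be sketched as a small shell helper. This is only an illustration: the function name classify_java_cause and the category labels it prints are made up for this example and are not part of any OceanBase or JDBC tool. Matching is case-insensitive because the capitalization of these messages varies between driver versions.

```shell
# Sketch: map the text of a "Caused by" line to a branch of the procedure.
# classify_java_cause and its category labels are hypothetical names.
classify_java_cause() {
  local msg
  # Lowercase the input so that "Read timed out" and "read timed out" both match.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"connection reset"*|*"can not read response from server"*|*"unexpected end of stream"*)
      echo "app-side-disconnect" ;;    # see Application-side disconnection
    *"read timed out"*|*"connection timed out (read failed)"*)
      echo "socket-timeout" ;;         # check for slow SQL, then socketTimeout
    *"connect timed out"*|*"connection timed out: connect"*)
      echo "tcp-handshake-failed" ;;   # address or port unreachable
    *"connection refused"*)
      echo "connection-refused" ;;     # reachable, but the connection was rejected
    *)
      echo "unknown" ;;
  esac
}
```

For example, passing each line that grep finds for "Caused by" in the application log to classify_java_cause gives a quick first pass over a long log before you read the stack traces in detail.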
C
If you confirm that the application is developed in C, you can check the error logs of the C application to determine the reason for the disconnection.
- If the error message is "end-of-file on communication channel" or "Lost connection to MySQL server during query", the application-side connection was disconnected. Determine the cause based on the error message. For more information, see Application-side disconnection in this topic.
- If the error message is "reading initial communication packet" or "reading authorization packet", the connection to the database failed. The failure may be due to network issues or issues with the OceanBase Database service. Analyze the logs; packet capture may be necessary.
- If the application logs do not contain obvious error messages, try to reproduce the scenario.
  - If you can reproduce the scenario, analyze the cause of the error by debugging the program code or analyzing the network packets.
  - If you cannot reproduce the scenario, try to infer the cause of the error from the existing information and the logic of the program.
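When the C application log is large, a filter that tags only the relevant lines can help decide which branch applies. The sketch below reads log lines from standard input and prints a tag followed by the original line. The function name tag_c_client_errors and the two tags are hypothetical; the matched strings are the ones listed in this section.

```shell
# Sketch: tag lines from a C client log that match the messages in this section.
# tag_c_client_errors and the tag names are hypothetical.
tag_c_client_errors() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"end-of-file on communication channel"*|*"Lost connection to MySQL server during query"*)
        # The application-side connection was disconnected.
        printf 'app-side-disconnect\t%s\n' "$line" ;;
      *"reading initial communication packet"*|*"reading authorization packet"*)
        # The connection to the database could not be established.
        printf 'connect-failed\t%s\n' "$line" ;;
    esac
  done
}
```

Lines that match neither group are dropped. If nothing is printed at all, the log contains none of these messages and the reproduce-or-infer branch applies.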
Application-side disconnection
When the application-side connection is disconnected, check the logs for a conn id identifier or the SQL statement executed when the exception occurred. If neither is present, critical information is missing and you need to contact the application team for more details. If either piece of information is present, identify the data source configuration in the application and proceed as follows:
Identify the access path based on the data source. The access path includes the number and addresses of the following components:
OBProxy instances and their addresses
OBServer hosts and their addresses
Determine the current access path and check whether the application connects through OBProxy.
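If the data source is configured with a MySQL-style JDBC URL, the endpoints the application connects to can be pulled out of it with plain shell parameter expansion. The sketch below assumes a URL of the form jdbc:<driver>://host:port[,host:port...]/database?params. The function name endpoints_from_jdbc_url is hypothetical, and the addresses in the usage note are placeholders.

```shell
# Sketch: print the host:port endpoints from a MySQL-style JDBC URL, one per line.
# endpoints_from_jdbc_url is a hypothetical helper; the URL layout is an assumption.
endpoints_from_jdbc_url() {
  local rest hostport
  rest=${1#*://}             # drop the "jdbc:<driver>://" prefix
  hostport=${rest%%[/?]*}    # keep everything before the first "/" or "?"
  # Multi-host URLs separate endpoints with commas.
  printf '%s\n' "$hostport" | tr ',' '\n'
}
```

Calling endpoints_from_jdbc_url with 'jdbc:mysql://10.0.0.1:2883/test?useSSL=false' prints 10.0.0.1:2883. Compare each printed endpoint against the known OBProxy and OBServer addresses to decide which path the application uses. The port alone is only a hint (2883 is a common OBProxy port and 2881 a common OBServer SQL port, but both are configurable), so confirm against your deployment.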
The application connects to the database through OBProxy
Use the ssh command to log in to the corresponding OBProxy node.
Use the ps command to check when the OBProxy process started.
Check whether OBProxy restarted during the exception period.
If OBProxy restarted, the disconnection was likely caused by the OBProxy restart.
If OBProxy did not restart, proceed to the next step.
If OBProxy did not restart, use the cd command to go to the OBProxy log directory. OBProxy logs are stored in the /log directory under the OBProxy installation directory.
Check whether the logs contain a conn_id. If not, check whether they contain SQL text related to the exception. If neither is present, contact the application team for more information.
- If a conn_id is present, run the following command to filter OBProxy logs from the exception period:
    grep "xxxxxx" obproxy.log
  Replace xxxxxx with the actual conn_id.
- If SQL text related to the exception is present, substitute part of the SQL text into the following commands to filter OBProxy logs from the exception period:
    grep "error text" obproxy.log.xxxx | grep "time of error"
    grep "error text" obproxy_error.log.xxxx | grep "time of error"
From the filtered log entries, find the OBProxy trace_id from the exception period.
Substitute the trace_id into the following command to filter OBProxy logs from the exception period:
    grep "xxxxxx" obproxy.log.xxxx | grep "time of error"
Replace xxxxxx with the actual trace_id.
Based on the OBProxy log output, determine how the connection was disconnected.
- If OBProxy actively disconnected the connection, analyze the cause from the filtered logs. If necessary, review the OBProxy source code.
- If the connection was terminated by OBServer, which caused the application disconnection, find the connected OBServer node and the OBServer session_id from the OBProxy log entry, then follow the steps in "Use SSH to directly log in to the corresponding OBServer node" below for further diagnosis.
- If a network device between the application server and OBProxy (such as F5, SLB, or other network devices) disconnected the link, troubleshoot the network devices between the application and OBProxy.
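The grep pipelines in this procedure all follow the same pattern: filter by an identifier, then narrow to the time of the error. A small wrapper makes that pattern reusable for conn_id, SQL text, and trace_id alike. The function name filter_log is hypothetical, and the sketch makes no assumption about the OBProxy log line format beyond the identifier and the timestamp appearing on the same line.

```shell
# Sketch: two-stage filter, first by identifier, then by time of error.
# filter_log is a hypothetical helper wrapping the grep pipeline from the steps above.
# Usage: filter_log <conn_id | trace_id | SQL fragment> <time prefix> <log file>...
filter_log() {
  local key=$1 when=$2
  shift 2
  # -F treats both arguments as literal strings, so SQL fragments containing
  # characters such as * or ( are not interpreted as regular expressions.
  # -h omits file names when several log files are searched at once.
  grep -F -h -- "$key" "$@" | grep -F -- "$when"
}
```

For example, filter_log with the conn_id, a timestamp prefix covering the exception period, and both obproxy.log and the rotated log files returns only the entries for that connection in that window, from which the trace_id can then be read and fed back into the same function.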
Use SSH to directly log in to the corresponding OBServer node
Use the ssh command to log in to the corresponding OBServer node.
Use the ps command to check when the OBServer process started.
Check whether OBServer restarted during the exception period.
If OBServer restarted, the disconnection was likely caused by the OBServer restart.
If OBServer did not restart, proceed to the next step.
If OBServer did not restart, use the cd command to go to the log directory.
The following example assumes that OceanBase Database is installed in /home/admin/oceanbase. Use the actual log path in your environment.
    cd /home/admin/oceanbase/log
Run the following commands to filter logs related to the session id:
    grep "observer session id" observer.log
    grep "observer session id" observer.log.xxx
Based on the filtered logs, determine the connection status.
- If OBServer actively disconnected the connection, analyze the cause from the filtered logs. If necessary, review the OBServer source code.
- If a network device between OBProxy and OBServer disconnected the link, troubleshoot the network devices between OBProxy and OBServer.
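The restart check used for both OBProxy and OBServer comes down to comparing the process start time with the exception window. The sketch below assumes a Linux host whose ps (procps) supports the etimes column, which reports elapsed seconds since the process started. The function names proc_start_epoch and started_between are hypothetical.

```shell
# Sketch: decide whether a process started inside a given time window.
# Assumes Linux procps `ps` with the etimes column; both function names are hypothetical.

# Print the start time of process $1 as epoch seconds.
proc_start_epoch() {
  local elapsed
  elapsed=$(ps -o etimes= -p "$1" | tr -d ' ')
  [ -n "$elapsed" ] || return 1          # no such process, or etimes unsupported
  echo $(( $(date +%s) - elapsed ))
}

# Succeed if start time $1 lies within the window [$2, $3] (all epoch seconds).
started_between() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

On the node, obtain the process ID of the OBServer or OBProxy process (for example from ps output), pass it to proc_start_epoch, and then call started_between with the start and end of the exception period converted to epoch seconds. If the check succeeds, the process restarted during the exception period and the restart is the likely cause of the disconnection.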
Case studies
- In a production environment, a business transaction times out due to the application error "read time out".
- The application periodically disconnects due to idle connection timeout.
- The application uses ODP-Sharding to connect to OceanBase Database and reports the error "Server connection execute error: Read timed out".
- OceanBase Database reports the error "Transaction resolution unknown" during business commit.
- When you connect to an OceanBase cluster or OceanBase Database through OBProxy, the connection fails and the following error message is returned: "Access denied for user 'xxxx'@'xxxx' (using password: YES)".
- When you directly connect to a regular tenant in OceanBase Database, the error "ERROR 5150 (HY000): Tenant not in this server" may be returned.
- When you connect to an OBServer node in OceanBase Database V4.x, intermittent disconnections may occur with the error "unexpected end of stream, read 0 bytes from 4" when the combined protocol is used.
- When a client executes SQL, it may report "Lost connection to MySQL server during query". This error is usually caused by an exception on the server side that interrupts the connection.
