During database operation, various exceptions may occur, such as application errors, database connection errors, database permission issues, database resource issues, and network issues. Among these, one common issue is application disconnection.
To help you quickly identify the root cause of this issue and resolve it efficiently, this section summarizes a clear and practical troubleshooting process for application disconnection issues. This process provides specific steps to enhance issue resolution efficiency, minimize the impact on business operations, and support daily maintenance tasks.
The troubleshooting process for application disconnection issues is shown in the following figure.
Process
When an application is disconnected, you can troubleshoot the issue by following this process.
First, determine the programming language used to develop the application.
Java
If you confirm that the application is developed in Java, you can check the error logs of the Java application to determine the reason for the disconnection. The steps are as follows:
Check whether the error logs contain the
Communications link failureexception.If the error logs do not contain the
Communications link failureexception, you can try to reproduce the scenario:- If you can reproduce the scenario, you can analyze the cause of the error by debugging the program code and analyzing the network packets.
- If you cannot reproduce the scenario, you can infer the cause of the error based on the existing information and the program logic.
If the error logs contain the
Communications link failureexception, proceed to the next step.
Check the information in the
Caused bysection of the exception logs.If the information in the
Caused bysection is not printed, the stack trace is incomplete, and the cause cannot be determined. You need to reproduce the issue with the complete stack trace and then analyze it.If the information in the
Caused bysection is printed, you can determine the cause of the disconnection based on the output in theCaused bysection:If the content is
Connection reset"、"can not read response from server. Expected to read 4 bytes,read 0 bytes before connection was unexpectedly lostorunexpected end of stream, read XXX bytes from xxx (socket was closed by server), the application-side connection was disconnected. You need to determine the cause based on the error message. For more information, see Application-side connection disconnected in this topic.If the content is
read timed out" or "Connection timed out (Read failed), the slow SQL triggered thesocketTimeoutsetting of the application, causing the program-side connection to be disconnected. In this case, you need to determine whether the SQL statement is a slow SQL statement. If it is, you can optimize the SQL statement as needed. If no slow SQL statements exist, you can adjust thesocketTimeoutsetting.If the content is
connect timed outorConnection timed out: connect, the TCP handshake failed. The application cannot access the destination address and port, which may be due to network issues or incorrect IP and port configurations.If the content is
Connection refused, the connection failed. The application can access the destination address and port, but the connection was rejected by a system or network issue.
C
If you confirm that the application is developed in C, you can check the error logs of the C application to determine the reason for the disconnection.
If the error message is
end-of-file on communication channelorLost connection to MySQL server during query, the application-side connection was disconnected. You need to determine the cause based on the error message. For more information, see Application-side connection disconnected in this topic.If the error message is
reading initial communication packetorreading authorization packet, the connection to the database failed. The failure may be due to network issues or issues with the backend OceanBase Database service. You need to analyze the logs and may need to capture packets to determine the cause.If the application logs do not contain obvious error messages, you can try to reproduce the scenario.
- If you can reproduce the scenario, you can analyze the cause of the error by debugging the program code and analyzing the network packets.
- If you cannot reproduce the scenario, you need to infer the cause of the error based on the existing information and the program logic.
Application-side connection is disconnected
When you determine that the application-side connection is disconnected, check whether the log contains conn id or the SQL statement executed when an exception is printed. If neither is found, you need to contact the application side to obtain more information for diagnosis. If the log contains either of the above information, you need to clarify the data source information configured in the current program and proceed with the following steps.
Clarify the access path based on the data source. The access path includes the number and address of the following components:
Number and address of OBProxy nodes
Number and address of OBServer hosts
Determine the current access path and check whether the application goes through OBProxy.
The application connects to the database through OBProxy
Use the
sshcommand to log in to the corresponding OBProxy node.Execute the
pscommand to view the start time of the obproxy process.Check whether the application was restarted during the exception.
If the application was restarted, the exception is caused by the restart of OBProxy.
If the application was not restarted, proceed to the next step.
If the application was not restarted, use the
cdcommand to navigate to the log directory of OBProxy. The logs of OBProxy are stored in the/logdirectory under the installation directory.Check whether
conn_idis present. If not, check whether the log contains the SQL statement related to the exception. If neither is found, you need to contact the application side to obtain more information.If
conn_idis present, you can substitute theconn_idin the following command to filter out the obproxy logs during the exception.grep "xxxxxx" obproxy.logReplace
xxxxxwith the actualconn_id.If the log contains the SQL statement related to the exception, you can substitute the relevant part of the SQL statement in the following command to filter out the obproxy logs during the exception.
grep "Error text" obproxy.log.xxxx | grep "Error time point"grep "Error text" obproxy_error.log.xxxx | grep "Error time point"
Based on the filtered log entries, find the
trace_idof obproxy during the exception.Substitute the
trace_idin the following command to filter out the obproxy logs during the exception.grep "xxxxxx" obproxy.log.xxxx | grep "Error time point"Replace
xxxxxwith the actualtrace_id.Based on the obproxy log output, determine the cause of the connection disconnection.
If the connection was actively disconnected by obproxy, analyze the disconnection cause based on the filtered logs. If necessary, analyze the obproxy source code.
If the connection was disconnected by the observer, causing the application connection to be disconnected, find the OBServer node and observer session_id at the time of disconnection based on the obproxy log entries, and proceed with the following steps for further diagnosis.
If the connection was disconnected by a network device between the application server and obproxy (such as F5, SLB, or other network devices), you need to troubleshoot the network devices between the application and obproxy.
The application directly logs in to the corresponding OBServer node through SSH
Use the
sshcommand to log in to the corresponding OBServer node.Execute the
pscommand to view the start time of the observer process.Check whether the application was restarted during the exception.
If the application was restarted, the exception is caused by the restart of the OBServer node.
If the application was not restarted, proceed to the next step.
If the application was not restarted, use the
cdcommand to navigate to the log directory.The following example assumes that the installation directory of OceanBase Database is
/home/admin/oceanbase. The actual log path may vary.cd /home/admin/oceanbase/logExecute the following command to filter out logs related to
session id.grep "observer session id" observer.loggrep "observer session id" observer.log.xxxBased on the filtered logs, determine the connection status.
- If the connection was actively disconnected by the observer, analyze the disconnection cause based on the filtered logs. If necessary, analyze the observer source code.
- If the connection was disconnected by a network device between obproxy and the observer, you need to troubleshoot the network devices between obproxy and the observer.
Examples
In a production environment, a business transaction times out because the application returns a
read time outerror. For more information, see Troubleshoot the "read time out" error caused by lock conflicts.The application periodically disconnects because of idle connection timeout. For more information, see Troubleshoot the periodic disconnection caused by idle connection timeout.
The application uses ODP-Sharding to connect to OceanBase Database. The business transaction returns an error:
Server connection execute error: Read timed out. For more information, see Troubleshoot the "Read timed out" error caused by disconnection.OceanBase Database returns an error:
Transaction resolution unknownwhen a business transaction is committed. For more information, see Troubleshoot the "Transaction resolution unknown" error.When you connect to an OceanBase cluster or OceanBase Database through obproxy, the connection fails and the following error is returned:
Access denied for user 'xxxx'@'xxxx' (using password: YES). For more information, see Troubleshoot the "Access denied for user 'xxxx'@'xxxx' (using password: YES)" error when you connect to an OceanBase cluster.When you connect to an OceanBase Database tenant directly, the following error is returned:
ERROR 5150 (HY000) : Tenant not in this server. For more information, see Troubleshoot the "Tenant not in this server" error when you connect to a tenant directly.When you connect to an OBServer node of OceanBase Database V4.x, the following error is returned occasionally:
unexpected end of stream, read 0 bytes from 4. For more information, see Troubleshoot the "unexpected end of stream, read 0 bytes from 4" error when you use the dual-protocol feature in OceanBase Database V4.x.
The client returns the
Lost connection to MySQL server during queryerror when it executes an SQL statement. This error is usually caused by a server-side exception. For more information, see Troubleshoot the "Lost connection to MySQL server during query" error.When you connect to OceanBase Database in MySQL mode, the connection is disconnected and the following error is returned:
ERROR 2013. For more information, see Troubleshoot the "ERROR 2013" error when the connection is disconnected.
