During database operations, various exceptions may occur, such as application errors, database connection errors, permission issues, resource problems, and network issues. Among these, one scenario is application disconnection.
To help you quickly identify the root cause and efficiently resolve application disconnection issues, we have summarized a clear and practical troubleshooting process. This process provides specific steps to enhance issue resolution efficiency and minimize business impact, supporting daily operations.
Process description
When encountering application disconnection issues, you can follow this process to troubleshoot the issue.
First, determine the application development language used by the user.
Java
If the application development language is Java, you can identify the cause of the disconnection by reviewing the Java application's error logs as follows:
Check if the error log contains the
Communications link failureexception.If the error log does not contain the
Communications link failureexception, try to reproduce the scenario:- If the scenario can be reproduced, analyze the cause of the error in the program code using methods such as debugging the program code and analyzing network packets.
- If the scenario cannot be reproduced, try to deduce the cause of the error based on the available information and the inherent logic of the program.
If the error log does contain the
Communications link failureexception, proceed to the next step.
Examine the information in the
Caused bysection of the exception log.If the information in the
Caused bysection is not present, it indicates that the stack trace is incomplete, and the cause cannot be determined. In this case, you need to reproduce the issue after the application prints a complete stack trace and then continue the analysis.If the information in the
Caused bysection is present, determine the cause of the connection drop based on the output of this section:If the content is
Connection reset","can not read response from server. Expected to read 4 bytes,read 0 bytes before connection was unexpectedly lost", or"unexpected end of stream, read XXX bytes from xxx (socket was closed by server)", it indicates that the application-side connection was dropped. In this case, further determine the cause based on the error message. For more information, see Application-side disconnected in this topic.If the content is
"read timed out"or"Connection timed out (Read failed)", it indicates that a slow SQL query triggered the application'ssocketTimeoutsetting, causing the program to drop the connection. In this case, determine whether the SQL query is a slow query. If it is, optimize the SQL query as needed. If it is not a slow query, consider adjusting thesocketTimeoutsetting. For more information, see Configure database connection pools.If the content is
"connect timed out"or"Connection timed out: connect", it indicates that the TCP handshake failed. This means that the destination address and port specified by the application are unreachable. This could be due to network issues or incorrect IP address or port configuration in the application.If the content is
"Connection refused", it indicates that the connection failed. The destination address and port specified by the application are reachable, but the connection was rejected for some system or network reason.
C
If the application development language is C, you can identify the cause of the disconnection by reviewing the C application's error logs as follows:
If the error message is
end-of-file on communication channelorLost connection to MySQL server during query, it indicates that the application-side connection was dropped. In this case, determine the cause based on the error message. For more information, see Application-side disconnection in this topic.If the error message is
reading initial communication packetorreading authorization packet, it indicates that the database connection failed. The failure could be due to network issues or problems with the backend OceanBase Database service. You can analyze the logs and possibly use packet capture to determine the cause.If the application logs do not contain obvious error messages, try to reproduce the scenario.
- If the scenario can be reproduced, analyze the cause of the error in the program code using methods such as debugging the program code and analyzing network packets.
- If the scenario cannot be reproduced, try to deduce the cause of the error based on the available information and the inherent logic of the program.
Application-side disconnection
When you determine that the connection was closed on the application side, check the logs for a conn id or the SQL statement executed at the time of the exception. If neither is present, key information is missing and you should contact the application team for more details to continue troubleshooting. If either piece of information is available, confirm the data source configuration in your program and follow these steps:
Based on the data source, identify the access chain. The information required to identify the access chain includes the following:
Number of ODP nodes and their address information
Number of OBServer nodes and their address information
Confirm whether the application uses ODP for access.
The application uses ODP to access the database
Run the
sshcommand to log in to the ODP node.Run the
pscommand to view the startup time of the obproxy process.Confirm whether the application was restarted during the error period.
If the application was restarted, the issue is likely caused by the ODP restart.
If the application was not restarted, proceed with the next step.
If the application was not restarted, run the
cdcommand to navigate to the ODP log directory. The ODP logs are stored in the/logdirectory within the installation directory of ODP.If
conn_idis not available in the logs, check for related SQL text. If neither is available, contact the application side for more information.If
conn_idis available, filter out the obproxy logs during the error period by using the following command:grep "xxxxxx" obproxy.logReplace
xxxxxwith the actualconn_idvalue.If there is related SQL text, filter out the obproxy logs during the error period by using the following command:
grep "Error text" obproxy.log.xxxx | grep "Error time point"grep "Error text" obproxy_error.log.xxxx | grep "Error time point"
From the filtered log entries, identify the
trace_idof the obproxy during the error period.Filter out the obproxy logs during the error period by using the following command:
grep "xxxxxx" obproxy.log.xxxx | grep "Error time point"Replace
xxxxxwith the actualtrace_idvalue.Analyze the obproxy logs to determine the disconnection issue.
If the obproxy actively disconnects the application, analyze the cause of the disconnection based on the filtered logs. If needed, refer to the obproxy source code.
If the application is disconnected by an observer, obtain the OBServer node and observer session ID from the filtered log entries and then proceed with the steps outlined in Log in to the corresponding OBServer node of the application by using SSH.
If the application is disconnected by network equipment (such as F5, SLB, or other network devices) between the application and the ODP node, verify the status of this network equipment.
Log in to the corresponding OBServer node of the application by using SSH
Use the
sshcommand to log in to the OBServer node.Run the
pscommand to view the startup time of the observer process.Confirm whether the application was restarted during the error period.
If the application was restarted, the issue is likely caused by the OBServer node restart.
If the application was not restarted, proceed with the next step.
If the application was not restarted, run the
cdcommand to navigate to the log directory.As an example, if the installation directory of OceanBase Database is
/home/admin/oceanbase, replace the actual path in the code.cd /home/admin/oceanbase/logRun the following commands to filter out the
session idlogs.grep "observer session id" observer.loggrep "observer session id" observer.log.xxxAnalyze the logs to determine the disconnection issue.
- If the observer actively disconnects the application, analyze the cause of the disconnection based on the logs. If needed, refer to the observer source code.
- If the application is disconnected by network equipment between the ODP node and the observer, verify the usage of this network equipment.
Typical cases
In a production environment, business transactions experience timeouts due to
read time outerrors.Applications periodically lose connections due to idle connection timeouts.
When an application uses ODP-Sharding to connect to OceanBase Database, it encounters the error
Server connection execute error: Read timed out.During a business commit in OceanBase Database, the error
Transaction resolution unknownis returned.When connecting to an OceanBase cluster through obproxy or directly to OceanBase Database, the error
Access denied for user 'xxxx'@'xxxx' (using password: YES)is returned.When directly connecting to a regular tenant in OceanBase Database, the error
ERROR 5150 (HY000): Tenant not in this serveris returned.When connecting to an OBServer node of OceanBase Database V4.x, the error
unexpected end of stream, read 0 bytes from 4occasionally appears.
When a client executes an SQL statement, it directly returns the error
Lost connection to MySQL server during query. This error is usually caused by an exception on the server side, leading to the connection being interrupted.When connecting to OceanBase Database in MySQL mode, the connection is disconnected and the error
ERROR 2013is returned.