What does the java.net.SocketInputStream.socketRead0() API do? Why does it show up so frequently in thread dumps? Why do thread dump analysis tools like fastThread.io report it? Is it something you need to be concerned about? What are the potential solutions to this problem? Let’s find answers to these questions.
What does the SocketInputStream.socketRead0() API do?
It’s always easier to remember new concepts through real-life analogies. Suppose you are calling your wife or girlfriend on the phone. Once the call is connected, if she is in a good mood you will immediately get a response: “Hello Honey (or darling or sweetie), how are you?” :-) If the call connects while she is in the middle of something (say she is at the office, picking up the kids, at the gym…), there might be a delay before she says “Hello Honey (or darling or sweetie)…”. If the call connects while she is in a bad mood, the response can be unpredictable. God only knows. You might get a response after several seconds or minutes (or the call might even get hung up :-). The time you spend waiting, from the moment the call is connected until you get a response or the call ends, is basically the socketRead0() API. (Thanks to Douglas Spath from IBM for this beautiful example explaining the socketRead0() API.)
Your application might be interfacing with multiple remote applications through various protocols such as SOAP, REST, HTTP, HTTPS, JDBC, RMI… All of these connections go through the JDK’s java.net layer to perform the lower-level TCP/IP socket operations. In this layer, the SocketInputStream.socketRead0() API is used to read the data sent by the remote application. Some remote applications respond immediately, some take time to respond, and some may not respond at all. Until your application reads the response data completely (or the connection is closed or times out), your application thread will be stuck in the java.net.SocketInputStream.socketRead0() API.
Sample Thread dump stacktrace
Below are some sample stack traces showing threads that are stuck in the SocketInputStream.socketRead0() API. Notice that, irrespective of the protocol, the threads get stuck in the SocketInputStream.socketRead0() API.

    "RMI TCP Connection(2)-192.xxx.xx.xx" daemon prio=6 tid=0x000000000a3e8800 nid=0x158e50 runnable [0x000000000adbe000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        - locked (0x00000007ad784010) (a java.io.BufferedInputStream)
        at java.io.FilterInputStream.read(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Fig: RMI thread stuck in SocketInputStream.socketRead0() API

    "Thread-18" id=48 idx=0x9c tid=11696 prio=5 alive, in native, daemon
        at jrockit/net/SocketNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BIII)I(Native Method)
        at jrockit/net/SocketNativeIO.socketRead(SocketNativeIO.java:32)
        at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(SocketInputStream.java)
        at java/net/SocketInputStream.read(SocketInputStream.java:129)
        at java/net/ManagedSocketInputStreamHighPerformanceNew.read(ManagedSocketInputStreamHighPerformanceNew.java:100)
        at java/net/SocketInputStream.read(SocketInputStream.java:182)
        at java/net/ManagedSocketInputStreamHighPerformanceNew.read(ManagedSocketInputStreamHighPerformanceNew.java:55)
        at oracle/ons/InputBuffer.getNextString(InputBuffer.java:137)
        at oracle/ons/ReceiverThread.run(ReceiverThread.java:295)
        at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)

Fig: Oracle Database connection stuck in SocketInputStream.socketRead0() API

    "AMQP Connection 192.xx.xxx.xxx:5672" prio=5 RUNNABLE
        java.net.SocketInputStream.socketRead0(Native Method)
        java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        java.net.SocketInputStream.read(SocketInputStream.java:170)
        java.net.SocketInputStream.read(SocketInputStream.java:141)
        java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
        com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
        com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
        com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:536)
        java.lang.Thread.run(Thread.java:745)

Fig: RabbitMQ stuck in SocketInputStream.socketRead0() API

    "Thread-2012" id=218 idx=0x09c tid=196 prio=10 alive, in native, daemon
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:140)
        at com.ibm.db2.jcc.t4.z.b(z.java:199)
        at com.ibm.db2.jcc.t4.z.c(z.java:289)
        at com.ibm.db2.jcc.t4.z.c(z.java:402)
        at com.ibm.db2.jcc.t4.z.v(z.java:1170)
        at com.ibm.db2.jcc.t4.cb.b(cb.java:40)
        at com.ibm.db2.jcc.t4.q.a(q.java:32)
        at com.ibm.db2.jcc.t4.sb.i(sb.java:135)
        at com.ibm.db2.jcc.am.yn.gb(yn.java:2066)
        at com.ibm.db2.jcc.am.zn.pc(zn.java:3446)
        at com.ibm.db2.jcc.am.zn.b(zn.java:4236)
        at com.ibm.db2.jcc.am.zn.fc(zn.java:2670)
        at com.ibm.db2.jcc.am.zn.execute(zn.java:2654)
        at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.execute(WSJdbcPreparedStatement.java:618)
        at com.mycompany.myapp.MyClass.executeDatabaseQuery(MyClass.java:123)

Fig: IBM DB2 statement execution stuck in SocketInputStream.socketRead0() API
If a thread gets stuck in the SocketInputStream.socketRead0() API and doesn’t recover for a long period, the customer who originated the transaction will not see any response on their screen, which can puzzle and confuse the user. If multiple threads get stuck in the SocketInputStream.socketRead0() API and don’t recover for a long period, it can pose serious availability concerns for your application.
Here we outline a few potential solutions to address this problem:
1. Instrument timeout settings
1.1. JVM Network settings
1.4. Oracle JDBC
2. Validate Network connectivity
3. Work with remote application
4. Non-blocking HTTP client
# 1. Instrument timeout settings
Most applications don’t set appropriate timeouts to recover from SocketInputStream.socketRead0(), so they end up stuck in this API for a prolonged period. Setting appropriate timeouts is a great self-defense mechanism that every application should have. Here are a few timeout settings you can apply to your application as you see fit:
1.1. JVM Network settings
You can pass these two powerful networking timeout properties to the JVM; they apply globally to all protocol handlers that use java.net.URLConnection:
sun.net.client.defaultConnectTimeout specifies the timeout (in milliseconds) to establish the connection to the host. For example, for http connections it is the timeout when establishing the connection to the http server. For ftp connection it is the timeout when establishing the connection to ftp servers.
sun.net.client.defaultReadTimeout specifies the timeout (in milliseconds) when reading from input stream when a connection is established to a resource.
More details about JVM network settings can be found here.
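As a rough illustration of these two properties, here is a minimal sketch. On the command line they would be passed as `-Dsun.net.client.defaultConnectTimeout=...` and `-Dsun.net.client.defaultReadTimeout=...`; the helper `setJvmWideDefaults` below is a hypothetical programmatic equivalent and, as far as we can tell, it only has an effect if it runs before the JDK networking classes are first used. The class name, the silent loopback server and the timeout values are all assumptions made for the demo. The second method uses the per-connection setters `setConnectTimeout()` and `setReadTimeout()` on `URLConnection`, which cover the same two phases for a single connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

class UrlTimeoutSketch {

    // Programmatic counterpart of the two -D flags (hypothetical helper).
    // Assumption: call this at startup, before any URLConnection is opened.
    static void setJvmWideDefaults(int connectMillis, int readMillis) {
        System.setProperty("sun.net.client.defaultConnectTimeout", Integer.toString(connectMillis));
        System.setProperty("sun.net.client.defaultReadTimeout", Integer.toString(readMillis));
    }

    // Connects to a local server that never answers and reports whether
    // the read gave up after readMillis instead of blocking forever.
    static boolean readTimesOut(int readMillis) {
        // The server socket is bound but never calls accept() or writes anything,
        // so the HTTP request is sent and the response never arrives.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            URI uri = URI.create("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection connection =
                    (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            connection.setConnectTimeout(1000);      // per-connection connect timeout (ms)
            connection.setReadTimeout(readMillis);   // per-connection read timeout (ms)
            try {
                connection.getResponseCode();        // blocks in the socket read
                return false;
            } catch (SocketTimeoutException expected) {
                return true;                         // the thread was released by the timeout
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        setJvmWideDefaults(5000, 10000);
        System.out.println("read timed out: " + readTimesOut(300));
    }
}
```

The JVM-wide properties are convenient when you cannot change the code that opens the connections; the per-connection setters give finer control when you can.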
If you are programming directly with sockets, consider setting a timeout on the socket by invoking the setSoTimeout() API.
This API takes the timeout value in milliseconds. If the remote application doesn’t respond within the specified timeout period, a java.net.SocketTimeoutException is thrown from the blocked read. This exception frees up the thread, allowing it to work on other calls. Note: a timeout value of 0 is interpreted as an infinite timeout, meaning the thread will never time out.
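The behavior described above can be sketched as follows. This is a minimal demo, not production code: the class name `SoTimeoutSketch`, the silent loopback server and the 300 ms value are assumptions chosen for illustration. The server socket never writes anything, so without `setSoTimeout()` the `read()` call would block indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SoTimeoutSketch {

    // Returns true if the blocked read was released by SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {

            // Must be set before the blocking read; 0 would mean "wait forever".
            client.setSoTimeout(timeoutMillis);

            InputStream in = client.getInputStream();
            try {
                in.read();       // the server never sends data, so this blocks
                return false;
            } catch (SocketTimeoutException expected) {
                return true;     // the thread is free to do other work
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(300));
    }
}
```

After a SocketTimeoutException the socket itself is still valid, so the application can decide whether to retry the read or close the connection.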
If you are using JDBC (Java Database Connectivity) to connect to a database, consider setting the timeout value using the setQueryTimeout() API.
This API sets the number of seconds the JDBC driver will wait for a statement to execute. If the limit is exceeded, an SQLTimeoutException is thrown. The JDBC driver applies this limit to the execute(), executeQuery() and executeUpdate() methods. By default, there is no limit on the amount of time allowed for a running statement to complete.
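A minimal sketch of how this might look in application code is shown below. The class `QueryTimeoutSketch` and the helper `executeWithTimeout` are hypothetical names, and the caller is assumed to supply an open `java.sql.Connection` from whatever driver or pool the application uses. How precisely the limit is enforced depends on the JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLTimeoutException;

class QueryTimeoutSketch {

    // Runs the statement with a time limit.
    // Returns true if it completed, false if the driver gave up after timeoutSeconds.
    static boolean executeWithTimeout(Connection connection, String sql, int timeoutSeconds)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            // Note the unit: seconds, not milliseconds. 0 means no limit.
            statement.setQueryTimeout(timeoutSeconds);
            statement.execute();
            return true;
        } catch (SQLTimeoutException timedOut) {
            // The thread is released here instead of waiting on the socket read.
            return false;
        }
    }
}
```

A caller could use it as `executeWithTimeout(connection, "UPDATE ...", 30)` and treat a `false` result as a signal to retry later or report an error to the user.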
1.4. Oracle JDBC
If you are connecting to an Oracle database and seeing a lot of threads stuck in the SocketInputStream.socketRead0() API, consider passing the -Doracle.jdbc.ReadTimeout system property.
You need to pass this argument at application startup, and the value is specified in milliseconds (for example, -Doracle.jdbc.ReadTimeout=30000 for a 30-second read timeout).
If your application runs on IBM WebSphere, consider setting the following properties:
a. An administrator can set the webSphereDefaultQueryTimeout data source custom property.
b. A second property, syncQueryTimeoutWithTransactionTimeout, can also be set as a data source custom property. With this set, WebSphere will calculate the time remaining before the transaction times out (if running within a global transaction) and set the query timeout to this value automatically.
You can also set the “readTimeout” property in the HTTP Transport Policy Set for the Web Service client or set “timeout” on the org.apache.axis2.context.MessageContext in the application code.
# 2. Validate Network connectivity
Threads failing to recover from the SocketInputStream.socketRead0() API can also be caused by issues in network connectivity or load balancers. We have seen cases in the past where the remote application did not send the appropriate ACK or FIN packets. You might have to engage network engineers or your cloud hosting provider’s support team to troubleshoot the issue.
On your end, you can use TCP/IP tracing tools such as Wireshark to see the packets exchanged between you and the remote application. This can help you narrow down whether the problem is on your side of the network or on the other side.
# 3. Work with remote application
Sometimes transactions slow down because of performance problems in the remote application. In those circumstances, you need to make the remote application’s team aware of the slowdown and work with them to fix the problem.
# 4. Non-blocking HTTP client
You can also consider using non-blocking HTTP client libraries like Grizzly or Netty, which do not have blocking operations that can hang a thread. However, this is more of a strategic solution, which involves code changes and thorough testing.
Note: this is a comprehensive list, but it may not be a complete list of potential solutions. If you have additional solutions and timeout settings that you would like to add to this blog, please drop us a note in the feedback section below. We will be glad to update this blog with your recommendation(s).
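To give a feel for the non-blocking style without pulling in a third-party library, the sketch below uses the JDK’s built-in `java.net.http.HttpClient` (Java 11 and later) rather than Grizzly or Netty, so it is a stand-in for the idea and not an example of those libraries. The class name, the `/ping` endpoint, the embedded test server and the timeout values are all assumptions made for the demo. `sendAsync()` returns a `CompletableFuture` immediately, so the calling thread is not parked in a socket read while the remote side is thinking.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncHttpSketch {

    // Starts the request and returns at once; the body arrives via the future.
    static CompletableFuture<String> fetchAsync(HttpClient client, URI uri, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout)   // the future fails if no response arrives in time
                .GET()
                .build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    // Demo only: spins up a tiny local server and fetches from it.
    static String fetchFromLocalServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/ping", exchange -> {
                byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                HttpClient client = HttpClient.newBuilder()
                        .connectTimeout(Duration.ofSeconds(2))
                        .build();
                URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ping");
                CompletableFuture<String> pending = fetchAsync(client, uri, Duration.ofSeconds(2));
                // The calling thread could do other work here; the demo simply waits,
                // with an upper bound so it can never hang.
                return pending.get(5, TimeUnit.SECONDS);
            } finally {
                server.stop(0);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromLocalServer());
    }
}
```

In a real application you would typically attach callbacks with `thenAccept()` or `exceptionally()` instead of calling `get()`, so that no application thread waits on the response at all.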