THREAD DUMP ANALYSIS PATTERN – ALL ROADS LEAD TO ROME

Description

If several threads in a thread dump end up in the same method, that is a cause for concern. Most of the time, when there is a problem (say, a poorly responding data source, an unreleased lock, or infinitely looping threads), a significant number of threads end up in one single method. That method has to be analyzed in detail.
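
One way to find such a method is to take the top stack frame of every thread in a dump, count how often each frame occurs, and sort the result. The Java sketch below is only an illustration and was not part of the original analysis. It assumes a HotSpot-style text dump in which each thread entry starts with a quoted thread name and each frame line starts with "at". The class name TopFrameCounter and its method are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Counts how many threads in a text thread dump share the same top stack frame. */
final class TopFrameCounter {

    /** Returns a map of top frame to thread count, most frequent frame first. */
    static Map<String, Integer> countTopFrames(List<String> dumpLines) {
        Map<String, Integer> counts = new HashMap<>();
        boolean awaitingTopFrame = false;
        for (String raw : dumpLines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Assumption: a line starting with a quoted name begins a new thread entry.
                awaitingTopFrame = true;
            } else if (awaitingTopFrame && line.startsWith("at ")) {
                // The first "at ..." line after the header is the thread's top frame.
                counts.merge(line.substring(3), 1, Integer::sum);
                awaitingTopFrame = false;
            }
        }
        // Sort by count, descending, and keep that order in a LinkedHashMap.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: TopFrameCounter <thread-dump-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countTopFrames(lines).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

A frame shared by a large number of threads is only a hint. Idle pool threads also legitimately share a top frame, so the surrounding frames and the thread names still need to be read before drawing a conclusion.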

Example

This application connected to an Apache Cassandra NoSQL database using the DataStax Java driver, which depends on the Netty library. To be specific, the application used the following libraries:

cassandra-driver-core-2.0.1.jar
netty-3.9.0.Final.jar

All of a sudden, this application ran into ‘java.lang.OutOfMemoryError: unable to create new native thread’. When a thread dump was taken on the application, around 2460 threads were in the ‘runnable’ state, stuck in the method sun.nio.ch.EPollArrayWrapper.epollWait(Native Method). Below is the stack trace of one of those threads:

"New I/O worker #211" prio=10 tid=0x00007fa06424d000 nid=0x1a58 runnable [0x00007f9f832f6000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
	- locked  (a sun.nio.ch.Util$2)
	- locked  (a java.util.Collections$UnmodifiableSet)
	- locked  (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
	at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)

Oh boy! This is way too many threads, and all of them are Netty library threads. The root cause turned out to be that the Cassandra NoSQL DB had run out of space. The issue cascaded into the application: threads piled up waiting on Cassandra until no new native thread could be created, which surfaced as the OutOfMemoryError. When more space was allocated to the Cassandra DB, the problem went away.
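
Thread growth of this kind can be noticed before the JVM fails to create a native thread by watching the live thread count from inside the application. Below is a minimal sketch using the standard ThreadMXBean. The class name ThreadCountWatch, the threshold of 1000, and the check by thread-name prefix are assumptions for illustration, not something the original application did. The prefix "New I/O worker" is taken from the stack trace above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Reports the live thread count and how many threads share a name prefix. */
final class ThreadCountWatch {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    /** Number of live threads whose name starts with the given prefix. */
    static long countByNamePrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        int threshold = 1000; // hypothetical limit; tune it for the actual application
        int live = liveThreadCount();
        long workers = countByNamePrefix("New I/O worker");
        System.out.println("live threads: " + live
                + ", 'New I/O worker' threads: " + workers);
        if (live > threshold) {
            System.err.println("WARNING: live thread count " + live
                    + " exceeds " + threshold + "; consider taking a thread dump");
        }
    }
}
```

In a real service this check would typically run on a schedule or be exported as a metric, so that a steadily climbing count triggers a thread dump while the JVM can still produce one.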

Thus, always look out for the method(s) where most of the threads are working.

Why is it named ‘All roads lead to Rome’?

‘All roads lead to Rome’ is a famous proverb indicating that different paths can lead to the same end. Similarly, when there is a problem, there is a high chance that several threads will end up in one problematic method.

5 thoughts on “THREAD DUMP ANALYSIS PATTERN – ALL ROADS LEAD TO ROME”

DoodleDo
January 19, 2021 at 2:53 am
How did you connect the dots from the netty library threads to the Cassandra DB?
- Ram Lakshmanan
  January 19, 2021 at 7:29 pm
  Hello @Doodle! Indeed a very good question. In several cases the stack trace itself will indicate which database or external system the thread is attempting to connect to. In this case, as you have rightly pointed out, we only see the netty stack trace. When this stack trace was presented to the application development team, they confirmed that they were using the netty library only to connect to the Cassandra DB.
kaustubh
May 25, 2021 at 12:43 pm
How did you connect the dots from the netty library threads to the Cassandra DB?
I did not understand your answer. Could you please elaborate?
- Ram Lakshmanan
  May 29, 2021 at 11:45 pm
  Hello kaustubh!
  Apparently, in this application the netty library was used in only one place: to connect to the Cassandra database. Thus, when the netty library threads were stuck, it became clear that they were stuck waiting for a response from the Cassandra database. In this case you need some application-specific knowledge to identify the root cause.
