Have you ever encountered circumstance where your applications CPU maxes out and never goes down even if traffoc volume goes down? Did you had to recycle to JVM to remediate the problem? Even if you recycle the JVM, does your CPU start to spike up after some time?

This type of problem surfaces because of one of the following reasons:

  • Repeated Full GC
  • Non-terminating Loops
  • non-synchronized access to java.util.HashMap

Let’s see how to diagnose these scenarios and address them.

Scenario 1: Repeated Full GC

Full GC is an important phase of Garbage Collection process. During this phase, entire JVM is frozen, every single object in the memory is evaluated for garbage collection, naturally, it turns out to be a CPU intensive operation. If application happens to have memory leak, then “Full GC” will start to run repeatedly without reclaiming any memory. When ‘Full GC’ runs repeatedly, CPU will start to spike up and never come down.

repeatedfullgc (1)

Figure: HP Jmeter tool showing repeated runs of Full GC process

Tactical Solution: To resolve the problem completely, memory leak in the application has to be fixed. Resolving memory leaks might take some time. (Of course it goes without saying, you can engage experts like me.to resolve it quickly). Until then below mentioned tactical solution can be implemented to keep the application functioning in production. You need to instrument a script which would monitor garbage collection log file of the application for every 2 minutes. If the script notices more than 3 ‘Full GC’ runs in a 10-minute window, then that particular JVM should be decommissioned from taking production traffic. JVM should be recycled after capturing thread dump and heap dump.  After recycling JVM should be placed back to take active traffic.

Strategic Solution: Using the Heap Dump/Thread Dump root cause  of the problem should be identified & fixed.

Scenario 2: non-terminating loops

Sometimes due to bug in your code or in the 3rd party library that you use – loop constructs (while, for, do..while) may run forever. Consider the scenario below:

1
2
3
4
5
6
7
8
while (myCondition) {
   statement-1;
   statement-1;
   :
   :
   statement-n;
}

Due to certain data condition or bug in the code, ‘myCondition’ may never get satisfied. In such scenario, thread would be spinning infinitely in the while loop. This would cause the CPU to spike up. Unless JVM is restarted, CPU maxing out wouldn’t stop at all.

Solution: When you observe CPU maxing out and utilization not coming go down, you should take 2 thread dumps in a gap of 10 seconds between each thread dump – right when problem is happening. Every thread in “runnable” state in the first taken thread dump should be noted down. Same threads state in the second thread dump should be compared. If in the second thread dump also those threads remain the “runnable” state within the same method, then it would indicate in which part of the code thread(s) are looping infinitely. Once you know which part of the code is looping infinitely then it should be trivial to address the problem.

Example: Below is the excerpt from the thread dump in which ‘Thread-0’ is looping infinitely in the sample application:

1
2
3
4
5
6
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.loopForever(NonTerminatingLooper.java:32)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
Fig: Stack Trace of “Thread-0” in 1st thread dump
1
2
3
4
5
6
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.loopForever(NonTerminatingLooper.java:32)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
Fig: Stack Trace of “Thread-0” in 2nd thread dump which was taken 10 seconds after 1st thread dump.

From the stack trace you could infer “Thread-0” is spinning infinitely in loopForever() method. Below is the source code of the loopForever() method. Here you can notice the non-terminating condition i.e. ‘while (true)’

1
2
3
4
5
6
7
public void loopForever() {
            
   while (true) {
      
      new String("Loop fore ever");
   }
}

Scenario 3: non-synchronized access of java.util.HashMap

When multiple threads tries to access HashMap’s get() and put() APIs concurrently it would cause threads go into infinite looping. This problem doesn’t happen always, but rarely it does happens. Here is a detailed blog which describes about this problem.

Solution: When you observe CPU maxing out and utilization not coming go down, you should take a thread dump – right when problem is happening. You need to see which are threads that are in “runnable” state. If that thread happens to be working on HashMap’s get() or put() API, then it’s indicative that HashMap is causing CPU spike. Now you can replace that HashMap with ConcurrentHashMap.

Example: Below is an excerpt from the thread dump which indicates the infinite looping that is happening in HashMap:

1
2
3
4
5
6
7
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.put(HashMap.java:374)
    at com.tier1app.HashMapLooper$AddForeverThread.AddForever(NonTerminatingLooper.java:32)
    at com.tier1app.HashMapLooper$AddForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.HashMapLooper$AddForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
Advertisements