Everyone knows about what sorting is. There are so many computer algorithms that have emerged to support sorting. Some of the well-known algorithms are quick sort, heap sort, merge sort etc.

All these sorting algorithms are working based on sequential sorting. Means, a single thread is used to perform the complete sorting operation. However, this approach is quite useful when the requirement is to handle low to medium level of data.For a huge dataset (in billions), sequential sorting does not scale well. This is where we need to use ‘Parallel sort’.

Java 8 Support

Since Java 8 onwards, the collections API implemented parallel sorting with the help of Streams. Arrays and List are the popular data structures that benefited from parallel sorting and they directly started supporting it. But, prior to Java 8, parallelism is not directly supported.

Streams and Parallelism

Java has provided parallelism support in the Stream API. For example, java.util.List has a stream() method that will support parallelism.

You can create a parallel stream by calling Stream.parallel()

Listlist = new ArrayList<>();


The above code will prepare a parallel stream for further processing. Similarly, Arrays can also be used this way. We can make parallel streams from the arrays.

Arrays and parallel sort

Java 8 onwards, there are a few  ways to sort an array in a concurrent manner. They are listed below:

  1. Arrays.parallelSort(array object)
  2. Arrays.stream(array object).parallel().sorted()
  3. Arrays.stream(array object).parallel().sorted(Comparator.comparing(int ->{

Similarly, there is a method for sorting a list as well:

  1. list.stream().parallel().sorted()
  2. list.stream().parallel().sorted(Comparator.comparing(int->{

However, there are some differences when the above methods are used. The Arrays.parallelSort() and Arrays.stream(..).parallel().sorted() can only be used to sort in ascending order. But, Arrays.stream(..).parallel().sorted(Comparator.reverseOrder())  can be used to sort an array reverse.

How does parallel sort work?

The parallelSort() method uses the fork and join approach in which the arrays are divided into smaller units until each unit is easily manageable and then sorted individually. Then the smaller units are joined together and this entire operation happens in parallel.This approach uses multithreading which makes it more efficient and fast.

Parallelism Value

By default, when a parallel sorting is performed, the ForkJoinPool creates a default thread pool. The size of the thread pool is determined is based on the number of CPUs available in the current machine.In most of the circumstances, this value is sufficient.However, we can fine tune this value based on the need of the application.In that case, it is possible to control the size of the thread pool by passing “java.util.concurrent.ForkJoinPool.common.parallelismproperty with a value with the help of System.setProperty() method.

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "10");

The above code will set the parallelism property to 10 and the ForkJoinPool framework will generate 10 worker threads. You will get to know this behavior later in the article.

Benchmark for parallel sort

Let’s develop a small piece of code that will demonstrate the performance of parallel sort. In this code, we will populate a list with 10 million entries and will sort it without using parallelism. Next, we will sort it using parallelism and compare the results.

First, sort the list using sequentially (without using parallelism):

Fig 1: Sorting the list without using parallelism

It takes 14389 milliseconds to perform the operation.

Next, sort the list using parallel stream.

Fig 2: Sorting the list using parallelism

The time taken for this operation is 3906 milliseconds.

It is just taking half of the time required for sequential sorting.

Why this is taking less amount of time because parallel sort employs multiple threads for the sorting operation as compared to sequential sorting. Remember in case of sequential sort, only ONE thread is used for entire sorting operation. This is the main difference why parallel sort is ahead of sequential sort.

Note: The time will vary on different systems.

What happens under the JVM?

Now we are going to perform thread dump analysis to understand what happens inside the JVM. This process will give you a complete picture about the threads that are used to perform both sequential and parallel operations. In case of parallel sort, more threads will be used.

We will analyze the thread dumps for the sequential and parallel sorts to find out this difference.

Thread Dump analysis in sequential sorting

Fig 3: fastThread tool highlighting 32 RUNNABLE threads

Fig 4: Thread group section in fastThread report

Above is the thread dump analysis report that was captured from the sequential sorting program. In the above report, you can see that there are 32 threads that are in RUNNABLE state. In case of sequential sort, Java is NOT using the ForkJoinPool API. Please take a look at the Thread Group section as well.

You can access the full report from here.

Thread Dumps in Parallel sorting

In case of parallelism, Java uses the ForkJoinPool API that will split the tasks into smaller chunks by an operation called “forking” and join the results. This brings a lot of performance boost to the complex operations. In parallel sorting, the ForkJoinPool API internally maintains a thread pool which is used in the case of parallel sorting.

But we can control this thread creation process by setting a system property known as “java.util.concurrent.ForkJoinPool.common.parallelism” which was mentioned in the previous section of the article. Based on the value of this property, the ForkJoinPool API will generate as many threads. In the above code, you can see the value is 10, which means the API will generate a pool of ten working threads.

Fig 5: fastThread tool highlighting 42 RUNNABLE threads

Fig 6: fastThread tool highlighting newly created ‘ForkJoinPool.commonPool’

The Thread Group section of the report reveals the number of worker threads that are generated by the ForkJoinPool API. There are 10 threads in a RUNNABLE state. You can see that in case of parallel sorting, there are 42 threads used as opposed to 32 for sequential sort.

Let’s talk about few words about ForkJoinPool API. This framework is used to perform the parallel operations in Java. This is the default implementation of Java when a parallel operation is performed. However,  this API is available from Java 8 onwards. You can see from the thread group image that it is generating around 10 threads for this operation.

You will get a complete overview of the report here.


Parallelism makes the code more scalable and efficient. In the above code with parallel sort, there was no need for  developing complex algorithms and still achieves the same result. This will remove the overhead of developing complicated logic for parallelism.