Introduction
Sorting is one of the fundamental techniques used in solving problems, especially in those related to writing and implementing efficient algorithms.
Usually, sorting is paired with searching - meaning we first sort elements in the given collection, then search for something within it, as it is generally easier to search for something in a sorted, rather than an unsorted collection, as we can make educated guesses and impose assumptions on the data.
There are many algorithms that can efficiently sort elements, but in this guide we'll be taking a look at how to implement Heap Sort in Java.
In order to understand how Heap Sort works, we first need to understand the structure it's based on - the heap. In this article we'll be talking in terms of a binary heap specifically, but with minor adjustments the same principals can be generalized to other heap structures as well.
We'll be doing another implementation without heaps - but rather, PriorityQueue
s, which boil the algorithm down to a single line.
Heap as a Data Structure
A heap is a specialized tree-based data structure which is a complete binary tree that satisfies the heap property, that is, for each node all of its children are in a relation to it. In a max heap, for a given parent P and a child C, the value of P is greater and or equal to the value of the child C.
Analogously, in a min heap, the value of P is lesser than or equal to the value of it's child C. The node at the "top" of the heap (i.e. the node that has no parents) is called the root.
Here's an example of a min heap (left) and a max heap (right):
As we mentioned earlier, we see the heap as a tree-based data structure. However, we'll represent it with a simple array and just define how each node (child) relates to its parent. Assuming our array begins from an index 0
, we can represent the max heap from the illustration above with the following array:
53, 25, 41, 12, 6, 31, 18
We can also explain this representation as reading the graph level by level, from left to right. Essentially, we have defined some kind of a relation between a parent node and a child node.
For the k-th
element of the array, we can find it's children on the positions 2*k+1
and 2*k+2
, assuming the indexing starts from 0
. Similarly, we can find the parent of the k-th
element on the position (k-1)/2
.
Earlier we mentioned that heap is a complete binary tree. A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled and all nodes are left-aligned.
Note: A complete binary tree can be the same as a full binary tree, but at its core is a different concept, where a full binary tree represents a tree in which every node other than the leaves has exactly two children.
To explain the concept of a complete binary tree a bit further let's look at an example of the max heap from the illustration earlier. If we remove the nodes 12
and 6
we get the following binary tree:
This tree will be represented in an array as:
53, 25, 41, -, -, 31, 18
We can see that this isn't a complete binary tree, since the nodes on level 2
(if the root node is at level 0
), aren't left-aligned. While on the other hand, the following binary tree would represent a complete binary tree:
The array for this tree would be:
53, 25, 41, 12, 6
From the short example above, we can see that intuitively a complete binary tree is represented with an array that has no "gaps" in it, that is, the positions we represented in the first array above as -
.
Continuing on our explanation of the heap - the process of inserting and deleting elements from it is a crucial step in Heap Sort.
Note: We'll be focusing on a max heap, but keep in mind that everything that applies to the max heap also applies to the min heap as well.
Inserting an Element Into the Max Heap
Using the same max heap we previously had, let's say we want to add element 60
. On first look, it's evident that 60
would be the largest element in our heap, so it should become the root element. But that raises another question: how do we simultaneously keep the form of a complete binary tree and add 60
at the same time?
Let's begin by placing the element at the last position in our heap array, and get something like this:
// 0 1 2 3 4 5 6 7
53, 25, 41, 12, 6, 31, 18, 60
The numbers in the row above represent the index positions of the array
As discussed earlier, children of the k-th
node are located at positions 2*k+1
and 2*k+2
, while the parent of every node is at (k-1)/2
. Following the same pattern, 60
would be a child of 12
.
Now, this disturbs the form of our max heap, as comparing and checking if 60
is lesser than or equal than 12
yields a negative answer. What we'll do is swap these two, as we're sure that there are no lesser numbers than 60
down the binary tree, as 60
was a leaf.
After the swap, we get the following:
// 0 1 2 3 4 5 6 7
53, 25, 41, 60, 6, 31, 18, 12
We repeat the same step as earlier until 60
is at its right spot. The parent element of 60
would now be 25
. We swap these two, after which the parent element of 60
is 53
, after which we swap them as well, ending up with a max heap:
// 0 1 2 3 4 5 6 7
60, 53, 41, 25, 6, 31, 18, 12
Deleting an Element From the Max Heap
Now, let us discuss removing an element. We'll be using the same max heap as earlier (without the addition of 60
). When talking about removing an element from the heap, the standard deletion operation implies that we should be only removing the root element. In the case of
max heap, this is the largest element, and in the case of min heap the smallest.
Removing an element from the heap is as simple as removing it from the array. However, this creates a new problem as the removal creates a "gap" in our binary tree, making it not complete.
Luckily for us, the solution is just as simple - we replace the deleted root element with the element that's farthest-right on the lowest level in the heap. Doing this guarantees us that we'll have a complete binary tree once again, but yet again creates a new potential problem: whilst our binary tree is now complete, it may not be a heap. So how do we go about solving this?
Let's discuss removing an element on the same max heap as earlier (before adding 60
). After we remove our root, and we move our farthest-right element in its spot, we have the following:
// 0 1 2 3 4 5 6
18, 25, 41, 12, 6, 31
Note: The element at the position 6 is left empty on purpose - this will be important later.
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
Represented like this, our array isn't a max heap. What we should do next is compare 18
to it's children, specifically to the bigger of the two, and in this case that is 41
. If the bigger of the two children is larger than the parent, we swap the two.
After doing this, we get the following array:
// 0 1 2 3 4 5 6
41, 25, 18, 12, 6, 31
As 18
is now at the position 2
, it's only child is 31
, and since the child is yet again bigger than the parent, we swap them:
// 0 1 2 3 4 5 6 41, 25, 31, 12, 6, 18
And just like that we have a max heap again!
Time Complexity of Insertion and Deletion
Let's take a look at the time complexity of inserting and deleting elements from a heap before implementing the algorithm. Since we're working with a binary tree-like structure it's natural that the time complexity of both the insertion and deletion is O(log n)
, where n
represents the size of our array.
This is because for a binary tree of height h
, given the binary nature of the heap - when traversing down the tree, you'll only even get to choose between two options, cutting down the possible paths by two on each step. In the worst-case, when traversing down to the bottom of the tree - the height of the tree, h
, will be logn
.
With this we wrap up the explanation about heap as a data structure and move on to the main topic of the article - Heap Sort.
Heap Sort in Java
By taking advantage of the heap and its properties, we've expressed it as an array. We can just as easily max heapify any array. Max heapify-ing is a process of arranging the elements in a correct order so they follow the max heap property. Similarly, you can min heapify an array.
For each element, we need to check if any of its children are smaller than itself. If they are, swap one of them with the parent, and recursively repeat this step with the parent (because the new large element might still be bigger than its other child). Leaves have no children, so they're already max heaps on their own.
Let's look at the following array:
// 0 1 2 3 4 5 6
25, 12, 6, 41, 18, 31, 53
Let's quickly run the 'heapify' algorithm through it and make a heap out of this array, manually, and then implement the code in Java to do that for us. We start from the right and go all the way to the left:
25 12 *6* 41 18 **31** **53**
Since both 31 > 6
and 53 > 6
, we take the bigger of the two (in this case 53
) and swap it with their parent, and we get the following: 25 12 53 41 18 31 6.
25 *12* 6 **41** **18** 31 6
Once again, 18 > 12
and 41 > 12
, and since 41 > 18
, we swap 42
and 12
.
*25*, **41**, **53** 12, 18, 31, 6
In this last step of the way, we see that 41 > 25
and 53 > 25
, and since 53 > 41
, we swap 53
and 25
. After that, we recursively 'heapify' for 25
.
53, 41, *25*, 12, 18, **31**, **6**
31 > 25
, so we swap them.
53, 41, 31, 12, 18, 25, 6
We got a max heap! This process may seem daunting, though - when implemented in code, it's actually fairly simple. The process of "heapify-ing" is crucial to Heap Sort, which follows three steps:
1. Build a max heap array using the input array.
2. Since the max heap stores the largest element of the array at the top (that is, the beginning of the array), we need to swap it with the last element within the array, followed by reducing the size of the array (heap) by 1
. After that, we 'heapify' the root.
3. We repeat step 2 as long as the size of our heap is bigger than 1.
With a good intuition of how the algorithm works, we can get to implementing it. Generally, since we'll be calling a heapify()
method multiple times - we implement it separately from the heapsort()
method, and call it within it.
This makes the implementation cleaner and easier to read. Let's start out with the heapify()
method:
public static void heapify(int[] array, int length, int i) {
int left = 2 * i + 1;
int right = 2 * i + 2;
int largest = i;
if (left < length && array[left] > array[largest]) {
largest = left;
}
if (right < length && array[right] > array[largest]) {
largest = right;
}
if (largest != i) {
int tmp = array[i];
array[i] = array[largest];
array[largest] = tmp;
heapify(array, length, largest);
}
}
The heapify()
method is what does most of the heavy lifting, and it just consists of three if
statements. The flow of the Heap Sort algorithm itself is fairly simple as well, and relies mainly on heapify()
:
public static void heapSort(int[] array) {
if (array.length == 0) {
return;
}
int length = array.length;
// Moving from the first element that isn't a leaf towards the root
for (int i = length / 2 - 1; i >= 0; i--) {
heapify(array, length, i);
}
for (int i = length - 1; i >= 0; i--) {
int tmp = array[0];
array[0] = array[i];
array[i] = tmp;
heapify(array, i, 0);
}
}
That's about it! We can now supply an array to the heapSort()
method, which sorts it in-place:
public static void main(String[] args){
int[] array = {25, 12, 6, 41, 18, 31, 53};
heapSort(array);
System.out.println(Arrays.toString(array));
}
This results in:
[6, 12, 18, 25, 31, 41, 53]
Implementing Heap Sort with a Priority Queue
A Priority Queue is a data structure that is actually a specific type of a queue, in which elements are added with a priority one by one, hence the name. The removal of elements begins with the one with the highest priority. The definition itself is really similar to that of a heap, so it's only natural that you can also implement Heap Sort using this very convenient data structure.
Java has a built-in PriorityQueue
residing in the util
package:
import java.util.PriorityQueue;
The PriorityQueue
has quite a few of it's own and inherited methods from the Queue
interface, but for our purposes we'll only need to use a few:
boolean add(E e)
- inserts the elemente
into the priority queue.E poll()
- retrieves and removes the head of the priority queue, or returnsnull
if it's empty.int size()
- returns the number of elements in the priority queue.
With these, we can really implement Heap Sort through a single
while()
loop.
First of all, we'll create and add the elements to the priority queue, after which we simply run a while
loop as long as our priority queue pq
has at least 1
element within it. In every single iteration, we use the poll()
method to retrieve and remove the head of the queue, after which we print it out and produce the same output as earlier:
Queue<Integer> pq = new PriorityQueue<>();
int[] array = new int[]{25, 12, 6, 41, 18, 31, 53};
Arrays.stream(array).forEach(element -> pq.add(element));
while(pq.size() > 0){
System.out.print(pq.poll() + " ");
}
This results in:
6 12 18 25 31 41 53
Time complexity of Heapsort
Let's discuss the time complexity of both approaches we've covered.
We've discussed earlier that adding and removing elements from a heap requires O(log n)
time, and since our for loop runs n
times where n
is the number of the elements in the array, the total time complexity of 'Heapsort' implemented like this is O(n log n)
. On the other hand, both adding and removing the elements from a priority queue takes up O(log n)
as well, and doing this n
times also produces O(n log n)
time complexity.
What about space complexity? Well, since in both approaches we're only using the starting array to sort the array, that means the additional space required for Heap Sort is O(1)
, making Heap Sort an in-place algorithm.
Conclusion
In conclusion, this article has covered both the theory and the implementation behind the Heap Sort algorithm. We've started out with an explanation of how it works, with an intuitive manual iteration, followed by two implementations.
While not as fast as compared to something like Quick Sort or Merge Sort, Heap Sort is often used when the data is partially sorted or when there's a need for a stable algorithm. The in-place aspect of Heap Sort also allows us for better memory usage, when memory is of concern.