## Introduction

*Heap Sort* is another example of an efficient sorting algorithm. Its main advantage is that it has a great worst-case runtime of *O(n*logn)* regardless of the input data.

As the name suggests, Heap Sort relies heavily on the *heap* data structure - a common implementation of a *Priority Queue*.

Without a doubt, Heap Sort is one of the simplest sorting algorithms to implement and coupled with the fact that it's a fairly efficient algorithm compared to other simple implementations, it's a common one to encounter.

## Heap Sort

Heap Sort works by "removing" elements from the heap part of the array one-by-one and adding them to the sorted part of the array. Before we get further into the explanation and revisit the heap data structure, we should mention a few attributes of Heap Sort itself.

It is an *in-place algorithm*, meaning that it requires a constant amount of additional memory, i.e. the memory needed doesn't depend on the size of the initial array itself, other than the memory needed to store that array.

For example, no copies of the original array are necessary, and there is no recursion and recursive call stacks. The simplest implementation of Heap Sort usually uses a second array to store the sorted values. We will be using this approach since it's a lot more intuitive and easy to follow in code, but it can be implemented completely *in-place*.

Heap Sort is *unstable*, meaning that it does not maintain the relative order of elements with equal values. This isn't an issue with primitive types (like integers and characters...) but it can be a problem when we sort complex types, like objects.

For example, imagine we have a custom class `Person`

with the `age`

and `name`

fields, and several objects of that class in an array, including a person called "Mike" aged 19 and "David", also aged 19 - appearing in that order.

If we decided to sort that array of people by age, there would be no guarantee that "Mike" would appear before "David" in the sorted array, even though they appeared in that order in the initial array. It *can* happen, but it's not guaranteed.

Fun fact: Heap Sort is the sorting algorithm of choice in the Linux Kernel

## The Heap Data Structure

Heaps are one of the most popular and heavily used data structures in computer science - not to mention very popular during Software Engineering interviews.

We'll talk of heaps keeping track of the smallest element (min-heap), but they can just as easily be implemented to keep track of the largest element (max-heap).

Simply put, a min-heap is a tree-based data structure in which every node is smaller that all of its children. Most often a binary tree is used. Heaps have three supported operations - `delete_minimum()`

, `get_minimum()`

, and `add()`

.

You can *only* delete the first element in the heap, after which it's "re-sorted". Heaps "re-sort" themselves after an element is added or removed, so that the the smallest element is always in the first position.

**Note:** This in no way means that heaps are sorted arrays. The fact that every node is smaller than its children isn't enough to guarantee that the *whole heap* is in ascending order.

Let's look at an example of a heap:

As we can see, the above example does fit the description of a heap but is not sorted. We won't go into details of the heap implementation since that is not the focus of this article. The crucial advantage of the heap data structure we leverage when using it in Heap Sort is that the next *smallest element is always the first element in the heap*.

Note: Thanks to the way heaps sort elements after an element is removed, the complexity of the next smallest element moving to the first position, while keeping the array a heap still, takes

O(logn)time, which is a highly efficient operation.

## Implementation

### Sorting Arrays

Python provides methods for creating and using heaps so we don't have to implement them ourselves:

`heappush(list, item)`

: Adds an element to the heap, and re-sorts it afterward so that it remains a heap. Can be used on an empty list.`heappop(list)`

: Pops (removes) the first (smallest) element and returns that element. The heap remains a heap after this operation, so we don't have to call`heapify()`

.`heapify(list)`

: Turns the given list into a heap. It is worth noting that this method exists even though we won't be using this since we don't want to change our original array.

Now that we know this, the implementation for Heap Sort is fairly straight-forward:

```
from heapq import heappop, heappush
def heap_sort(array):
heap = []
for element in array:
heappush(heap, element)
ordered = []
# While we have elements left in the heap
while heap:
ordered.append(heappop(heap))
return ordered
array = [13, 21, 15, 5, 26, 4, 17, 18, 24, 2]
print(heap_sort(array))
```

Output:

```
[2, 4, 5, 13, 15, 17, 18, 21, 24, 26]
```

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually *learn* it!

As we can see, the heavy lifting is done with the heap data structure, all we have to do is add all the elements we need and remove them one by one. It's almost like a coin counting machine that sorts the inputted coins by their value and we can take them out afterwards.

### Sorting Custom Objects

Things get a little more complicated when using custom classes. Usually, we advise against overriding comparison operators in classes for the purpose of using our sorting algorithms for them, and instead suggest rewriting the algorithm so that it takes a lambda function comparator instead.

However, since our implementation relies on the built-in heap methods, we can't do that here.

Python does provide the following methods:

`heapq.nlargest(*n*, *iterable*, *key=None*)`

: Returns a list with the*n*largest elements from the dataset defined by`iterable`

.`heapq.nsmallest(*n*, *iterable*, *key=None*)`

: Returns a list with the*n*smallest elements from the dataset defined by`iterable`

.

Which we could use to simply get `n = len(array)`

largest/smallest elements but the methods themselves do not use Heap Sort and are essentially equivalent to just calling the `sorted()`

method.

The only solution we have left for custom classes is to actually override the comparison operators. This sadly limits us to only one type of comparison per class. In our example it limits us to sorting `Movie`

objects by year.

However, it does let us demonstrate using Heap Sort on custom classes. Let's go ahead and define the `Movie`

class:

```
from heapq import heappop, heappush
class Movie:
def __init__(self, title, year):
self.title = title
self.year = year
def __str__(self):
return str.format("Title: {}, Year: {}", self.title, self.year)
def __lt__(self, other):
return self.year < other.year
def __gt__(self, other):
return other.__lt__(self)
def __eq__(self, other):
return self.year == other.year
def __ne__(self, other):
return not self.__eq__(other)
```

And now, let's slightly modify our `heap_sort()`

function:

```
def heap_sort(array):
heap = []
for element in array:
heappush(heap, element)
ordered = []
while heap:
ordered.append(heappop(heap))
return ordered
```

And finally, let's instantiate a few movies, put them in an array, and then sort them:

```
movie1 = Movie("Citizen Kane", 1941)
movie2 = Movie("Back to the Future", 1985)
movie3 = Movie("Forrest Gump", 1994)
movie4 = Movie("The Silence of the Lambs", 1991);
movie5 = Movie("Gia", 1998)
array = [movie1, movie2, movie3, movie4, movie5]
for movie in heap_sort(array):
print(movie)
```

Output:

```
Title: Citizen Kane, Year: 1941
Title: Back to the Future, Year: 1985
Title: The Silence of the Lambs, Year: 1991
Title: Forrest Gump, Year: 1994
Title: Gia, Year: 1998
```

## Comparison to Other Sorting Algorithms

One of the main reasons Heap Sort is still used fairly often, even though it's often outperformed by a well-implemented Quick Sort, is its reliability.

Heap Sort's main advantage here are the *O(n*logn)* upper bound as far as time complexity is concerned, and security concerns. Linux kernel developers give the following reasoning to using Heap Sort over Quick Sort:

Sorting time of Heap Sort is O(n*logn) both on average and worst-case. While qsort is about 20% faster on average, it suffers from an exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.

Furthermore, Quick Sort behaves poorly in predictable situations, and given enough knowledge of the internal implementation, it could create a security risk (mainly DDoS attacks) since the bad *O(n ^{2})* behavior could easily be triggered.

Another algorithm that Heap Sort is often compared to is Merge Sort, which has the same time complexity.

Merge Sort has the advantage of being *stable* and intuitively *parallelizable*, while Heap Sort is neither.

Another note is that Heap Sort is slower than Merge Sort in most cases, even though they have the same complexity, since Heap Sort has larger constant factors.

Heap Sort can, however, be implemented much more easily *in-place* than Merge Sort can, so it's preferred when memory is a more important factor than speed.

## Conclusion

As we saw, Heap Sort isn't as popular as other efficient, general-purpose algorithms but its predictable behavior (other than being unstable) make it a great algorithm to use where memory and security are more important than slightly faster run-time.

It's really intuitive to implement and leveraging the built-in functionality provided with Python, all we essentially have to do is put the items in a heap and take them out - similar to a coin counter.